Monday, July 6, 2020

Regular Expressions: Anchors and Wildcards (why and when should we use them)




Generally, in regular expressions:

1. Anchors are used to specify the start and end of the string. There are two anchors in re:


‘^’ and ‘$’.


Ex1: 

The regular expression pattern ‘^01*0$’ matches any string that starts with a zero and ends with a zero, with any number of 1s (including none) between them.
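
Ex1 can be tried out with a minimal sketch in Python's built-in re module. The sample strings below are made up for illustration and are not from the original example. The last line shows why the anchors matter: without them, the pattern is allowed to match in the middle of a string.

```python
import re

# Pattern from Ex1: a '0', then any number of '1's, then a final '0'.
pattern = r'^01*0$'

# Illustrative inputs (assumed for this sketch).
samples = ['00', '010', '01110', '0110x', '1010', '0']

# Map each sample to True/False depending on whether the pattern matches.
anchor_results = {s: re.search(pattern, s) is not None for s in samples}

for s, matched in anchor_results.items():
    print(s, matched)

# Without anchors, the same pattern finds '010' inside '1010'.
unanchored = re.search(r'01*0', '1010') is not None
print('unanchored match on 1010:', unanchored)
```

Only '00', '010' and '01110' satisfy the anchored pattern, because the whole string has to fit between ‘^’ and ‘$’.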



2. The wildcard is a special character in regular expressions that acts as a placeholder and can match any character in the given input string (by default, any character except a newline). It is the ‘.’ (dot) character.
 

' . '

Ex2: 

For example, the pattern ‘hap{1,}y’ matches ‘hapy’, ‘happy’, ‘happpy’ and so on. Here, we have specified that the letter ‘p’ should be present one or more times. But sometimes you don’t know which letter will be repeated in the string. In such situations, you’ll need to use the wildcard.
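
A small sketch of the quantifier on ‘p’, assuming Python's re module. The word list is invented for illustration, and re.fullmatch is used so that the whole word must fit the pattern.

```python
import re

# 'p' must appear one or more times between 'ha' and 'y'.
pattern = r'hap{1,}y'

# Illustrative words (assumed for this sketch).
words = ['hay', 'hapy', 'happy', 'happpy', 'habby']

# Keep only the words that match the pattern in full.
quantifier_results = [w for w in words if re.fullmatch(pattern, w)]
print(quantifier_results)
```

‘hay’ is rejected because it has no ‘p’ at all, and ‘habby’ is rejected because the quantifier applies only to the literal letter ‘p’.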

Suppose you’re asked to write a regex pattern that matches a string that starts with any four characters, followed by three 0s and two 1s, followed by any two characters.


Valid strings include abcd00011ft, jkds00011hf, etc.


Either of the following patterns satisfies this condition:


1. ‘.{4}0{3}1{2}.{2}’


2. ‘....00011..’


where the dot acts as a placeholder, which means any character can sit in the place of the dot.
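
The two patterns can be compared side by side with a short sketch in Python's re module. The two valid strings come from the text above; the invalid ones and the string containing a newline are assumptions added for illustration. re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

pattern_a = r'.{4}0{3}1{2}.{2}'   # quantifier form
pattern_b = r'....00011..'        # spelled-out form

# First two strings are from the text; the last two are deliberately invalid.
samples = ['abcd00011ft', 'jkds00011hf', 'abc00011ft', 'abcd0011ft']

wildcard_results = {}
for s in samples:
    a = re.fullmatch(pattern_a, s) is not None
    b = re.fullmatch(pattern_b, s) is not None
    wildcard_results[s] = (a, b)
    print(s, a, b)

# By default the dot does not match a newline; re.DOTALL changes that.
with_newline = 'ab\nd00011ft'
default_match = re.fullmatch(pattern_a, with_newline) is not None
dotall_match = re.fullmatch(pattern_a, with_newline, flags=re.DOTALL) is not None
print(default_match, dotall_match)
```

Both patterns agree on every sample, so the choice between them is mostly a matter of readability: the quantifier form scales better when the counts get large.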








Regular Expression

Write a regular expression that matches any string that starts with one or more ‘1’s, followed by three or more ‘0’s, followed by any number of ‘1’s (zero or more), followed by one to seven ‘0’s, and then ends with either two or three ‘1’s.

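Code snippet:

import re

# Test case 1: expected to match
# Test case 2: expected not to match (it starts with '0')
test_strings = ['11000011000111', '00001100011111']

# This commented-out variant does not work as intended: the spaces inside
# the braces stop Python from reading them as quantifiers, and 1(2|3)
# matches '12' or '13' rather than two or three '1's.
# pattern = '^1{1, }0{3, }1{0, }0{1,7}1(2|3)$'

# regex pattern
pattern = r'^1+0{3,}1*0{1,7}1{2,3}$'

for string in test_strings:
    # check whether pattern is present in string or not
    result = re.search(pattern, string)

    # evaluate result
    if result is not None:
        print(True)
    else:
        print(False)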


Explanation:



The ‘^’ specifies the start of the string. 

‘$’ specifies the end of the string.

| - Means OR (matches either of the alternatives it separates).

* - Any number of occurrences (including 0 occurrences).

+ - One or more occurrences.

{} - Indicates the number of occurrences of the preceding RE to match.
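
The operators listed above can be checked one at a time with a minimal sketch. The helper name matches and all the sample patterns are hypothetical, chosen only to illustrate each symbol. The last two entries show why ‘1{2,3}’ and ‘1(2|3)’ are not interchangeable.

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True if the whole of `text` fits `pattern`."""
    return re.fullmatch(pattern, text) is not None

operator_results = {
    'alternation':      matches(r'cat|dog', 'dog'),   # '|' picks either side
    'star_zero':        matches(r'10*', '1'),         # '*' allows zero '0's
    'plus_zero':        matches(r'10+', '1'),         # '+' needs at least one '0'
    'plus_many':        matches(r'10+', '1000'),
    'braces_in_range':  matches(r'1{2,3}', '111'),    # two or three '1's
    'braces_too_many':  matches(r'1{2,3}', '1111'),   # four is out of range
    'group_not_repeat': matches(r'1(2|3)', '12'),     # '1' then '2' or '3'
    'group_vs_ones':    matches(r'1(2|3)', '11'),     # not a repeat count
}

for name, value in operator_results.items():
    print(name, value)
```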





Element wise operation on LIST vs ARRAY

The use of arrays over lists: You can write vectorised code on numpy arrays, not on lists, which is convenient to read and write, and con...