Monday, January 25, 2021

Regular Expressions | Grouping

Regex: Grouping::

Suppose we have textual data with dates in it and we want to extract only the year from the dates. This can be done using regular expression pattern with grouping to match dates and then we can extract the component elements such as the day, month or the year from the date.


Example:

url = "http://www.telegraph.co.uk/formula-1/2017/10/28/mexican-grand-prix-2017-time-does-start-tv-channel-odds-lewisl/2017/05/12"

date_regex = '/(\d{4})/(\d{1,2})/(\d{1,2})/'

print(re.findall(date_regex, url))




Sunday, January 24, 2021

Time | Memory Taken to execute a piece of code in Python

 Time Taken to run the code:

Generally, it is very hard to find which part of code is taking more time when you run the whole application. The below code snippet tells us time taken to run the function.

Memory used by Objects:

In Python we use the sys.getsizeof function to check the memory used by an object.




Friday, January 22, 2021

Wildcard | Character Sets | Meta Sequences

 

Wildcard: It matches any characters from characters to numbers and alphanumeric variables.




Character Sets:

For example, say we want to match phone numbers in a large document. we know that the numbers may contain hyphens, plus symbol etc. (e.g. +91-9925417854) , but it will not have any alphabet. we need to specify that we need only for numerics and some other symbols, but avoid alphabets.

 To handle such situations, we can use character sets in regular expressions


Meta Sequences:

We commonly use sets to match only digits, only alphabets, only alphanumeric characters, only whitespaces, etc. Therefore, there is a shorthand way to write commonly used character sets in regular expressions. These are called as meta-sequences




Greedy vs Non Greedy search

 When we  use a regular expression to match a string, the regex greedily tries to look for the longest pattern possible in the string.
 Examaple:
when you specify the pattern 'ab{2,7}' to match the string 'abbbbbbb', it will look for the maximum number of occurrences of 'b' (in this case 7).This is called a 'greedy approach'. By default, a regular expression is greedy in nature.
There is another approach called the non-greedy approach, also called the lazy approach, where the regex stops looking for the pattern once a particular condition is satisfied.

Code Example: 




Element wise operation on LIST vs ARRAY

The use of arrays over lists: You can write  vectorised  code on numpy arrays, not on lists, which is  convenient to read and write, and con...