Published Sept. 7, 2023, 3:26 p.m.
OUTLINE:
re Module
compile()
finditer()
match()
fullmatch()
search()
findall()
finditer()
sub()
subn()
split()
Are you ready to dive into the fascinating world of Regular Expressions in Python? 🚀 Let's explore this powerful tool that helps you represent and manipulate text patterns effortlessly.
Regular Expressions, often referred to as regex or regexp, are declarative mechanisms for representing and manipulating text patterns. 📜 They're like magic spells for text, allowing you to find, match, and manipulate strings based on specific patterns.
1️⃣ You can write a regular expression to represent all mobile numbers. 2️⃣ You can write a regular expression to represent all email addresses.
Pattern-matching applications you may already use include ctrl-f in Windows and grep in UNIX.
re Module 🐍
Python provides the re module, which offers several built-in functions to work with regular expressions effortlessly.
compile(): Compiles a pattern into a pattern object (re.Pattern).
finditer(): Returns an iterator yielding Match objects for every match.
start(): Returns the start index of the match.
end(): Returns the end index + 1 of the match.
group(): Returns the matched string.
import re
pattern = re.compile("ab")
matcher = pattern.finditer("abaababa")
count = 0
for match in matcher:
    count += 1
    print(match.start(), "...", match.end(), "...", match.group())
print("The number of occurrences:", count)
Output:
0 ... 2 ... ab
3 ... 5 ... ab
5 ... 7 ... ab
The number of occurrences: 3
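Because end() is one past the last matched index, start() and end() line up with Python's slicing convention. A minimal sketch of that relationship, reusing the same pattern and target string (the names spans and slices are illustrative, not part of the re module):

```python
import re

target = "abaababa"
pattern = re.compile("ab")

# Collect (start, end) pairs; end is one past the last matched index.
spans = [(m.start(), m.end()) for m in pattern.finditer(target)]

# Slicing the target with each pair gives back the matched text.
slices = [target[start:end] for start, end in spans]

print(spans)   # [(0, 2), (3, 5), (5, 7)]
print(slices)  # ['ab', 'ab', 'ab']
```

This is why target[m.start():m.end()] and m.group() give the same string for a match m.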
Character classes allow you to search for groups of characters:
[abc]: Matches either 'a', 'b', or 'c'.
[^abc]: Matches any character except 'a', 'b', or 'c'.
[a-z]: Matches any lowercase letter.
[A-Z]: Matches any uppercase letter.
[a-zA-Z]: Matches any letter.
[0-9]: Matches any digit from 0 to 9.
[a-zA-Z0-9]: Matches any alphanumeric character.
[^a-zA-Z0-9]: Matches any non-alphanumeric character (special characters, including spaces).
Character Classes 🧩
[abc]: Matches either 'a', 'b', or 'c'.
Example:
import re
pattern = re.compile("[abc]")
result = pattern.search("The apple is on the table.")
if result:
    print(result.group())  # Output: 'a'
[^abc]: Matches any character except 'a', 'b', or 'c'.
Example:
import re
pattern = re.compile("[^abc]")
result = pattern.search("The apple is on the table.")
if result:
    print(result.group())  # Output: 'T' (matches the first non-'abc' character)
[a-z]: Matches any lowercase letter.
Example:
import re
pattern = re.compile("[a-z]")
result = pattern.search("The Quick Brown Fox")
if result:
    print(result.group())  # Output: 'h' (matches the first lowercase letter)
[A-Z]: Matches any uppercase letter.
Example:
import re
pattern = re.compile("[A-Z]")
result = pattern.search("The Quick Brown Fox")
if result:
    print(result.group())  # Output: 'T' (matches the first uppercase letter)
[a-zA-Z]: Matches any letter.
Example:
import re
pattern = re.compile("[a-zA-Z]")
result = pattern.search("12345 Hello World!")
if result:
    print(result.group())  # Output: 'H' (matches the first letter)
[0-9]: Matches any digit from 0 to 9.
Example:
import re
pattern = re.compile("[0-9]")
result = pattern.search("The price is $25.99")
if result:
    print(result.group())  # Output: '2' (matches the first digit)
[a-zA-Z0-9]: Matches any alphanumeric character.
Example:
import re
pattern = re.compile("[a-zA-Z0-9]")
result = pattern.search("User123 is online!")
if result:
    print(result.group())  # Output: 'U' (matches the first alphanumeric character)
[^a-zA-Z0-9]: Matches any non-alphanumeric character.
Example:
import re
pattern = re.compile("[^a-zA-Z0-9]")
result = pattern.search("Hello World!")
if result:
    print(result.group())  # Output: ' ' (matches the first non-alphanumeric character, which is a space)
Explore these character classes in your regex patterns to customize your searches. 🕵️‍♀️
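The examples above use search(), which stops at the first hit. A minimal sketch, assuming a made-up sample string, of the same kinds of character classes with findall() to collect every matching character instead:

```python
import re

text = "Hello World 2023!"

# Lowercase vowels only: the class lists the allowed characters explicitly.
vowels = re.findall("[aeiou]", text)

# Digits, written as a range.
digits = re.findall("[0-9]", text)

# Everything that is neither a letter nor a digit (spaces count here).
specials = re.findall("[^a-zA-Z0-9]", text)

print(vowels)    # ['e', 'o', 'o']
print(digits)    # ['2', '0', '2', '3']
print(specials)  # [' ', ' ', '!']
```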
Use predefined character classes to simplify your patterns:
\s: Matches any whitespace character (space, tab, newline, and so on).
\S: Matches any character except whitespace.
\d: Matches any digit from 0 to 9.
\D: Matches any character except a digit.
\w: Matches any word character (letters, digits, or underscore).
\W: Matches any character except a word character (special characters).
.: Matches any character, including special characters, except a newline by default.
These shortcuts are incredibly useful for common patterns like spaces, digits, and more! 🧙‍♂️
Here are examples for each of the predefined character classes:
\s: Matches any whitespace character.
import re
text = "Hello World"
matches = re.findall(r"\s", text)
print(matches)  # Output: [' ']
\S: Matches any character except whitespace.
import re
text = "Hello World"
matches = re.findall(r"\S", text)
print(matches)  # Output: ['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']
\d: Matches any digit from 0 to 9.
import re
text = "The price is $25.99"
matches = re.findall(r"\d", text)
print(matches)  # Output: ['2', '5', '9', '9']
\D: Matches any character except a digit.
import re
text = "The price is $25.99"
matches = re.findall(r"\D", text)
print(matches)  # Output: ['T', 'h', 'e', ' ', 'p', 'r', 'i', 'c', 'e', ' ', 'i', 's', ' ', '$', '.']
\w: Matches any word character (letters, digits, or underscore).
import re
text = "User123 is online!"
matches = re.findall(r"\w", text)
print(matches)  # Output: ['U', 's', 'e', 'r', '1', '2', '3', 'i', 's', 'o', 'n', 'l', 'i', 'n', 'e']
\W: Matches any character except a word character (special characters).
import re
text = "User123 is online!"
matches = re.findall(r"\W", text)
print(matches)  # Output: [' ', ' ', '!']
.: Matches any character, including special characters (but not a newline by default).
import re
text = "Hello World!"
matches = re.findall(r".", text)
print(matches)  # Output: ['H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '!']
These examples demonstrate how to use predefined character classes in regular expressions to match specific types of characters or character groups in text strings.
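Two details are easy to miss with these shortcuts: \s covers tabs and newlines as well as plain spaces, and . skips newlines unless the re.DOTALL flag is passed. A minimal sketch, assuming a short made-up string that contains a space, a tab, and a newline:

```python
import re

text = "a b\tc\nd"

# \s matches the space, the tab, and the newline.
whitespace = re.findall(r"\s", text)

# By default '.' matches every character except the newline.
dots = re.findall(r".", text)

# With re.DOTALL, '.' matches the newline too.
dots_all = re.findall(r".", text, re.DOTALL)

print(whitespace)  # [' ', '\t', '\n']
print(dots)        # ['a', ' ', 'b', '\t', 'c', 'd']
print(dots_all)    # ['a', ' ', 'b', '\t', 'c', '\n', 'd']
```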
Quantifiers help you specify the number of occurrences to match:
a: Matches exactly one 'a'.
a+: Matches at least one 'a'.
a*: Matches any number of 'a's (including zero).
a?: Matches at most one 'a' (zero or one occurrence).
a{m}: Matches exactly 'm' occurrences of 'a'.
a{m,n}: Matches between 'm' and 'n' occurrences of 'a'.
With quantifiers, you can fine-tune your regex patterns for precise matches. 🧰 Here are examples for each of the quantifiers:
a: Matches exactly one 'a'.
import re
text = "An apple a day keeps the doctor away."
matches = re.findall(r"a", text)
print(matches)  # Output: ['a', 'a', 'a', 'a', 'a'] (the capital 'A' in "An" is not matched)
a+: Matches at least one 'a'.
import re
text = "She saw a beautiful sunset."
matches = re.findall(r"a+", text)
print(matches)  # Output: ['a', 'a', 'a']
a*: Matches any number of 'a's (including zero), so it also produces an empty match at every position where no 'a' starts.
import re
text = "The cat sat on the mat."
matches = re.findall(r"a*", text)
print(matches)  # Output: ['', '', '', '', '', 'a', '', '', '', 'a', '', '', '', '', '', '', '', '', '', '', 'a', '', '', '']
a?: Matches at most one 'a' (zero or one occurrence). The example applies ? to 'u' so that both spellings match.
import re
text = "color or colour, choose your favorite."
matches = re.findall(r"colou?r", text)
print(matches)  # Output: ['color', 'colour']
a{m}: Matches exactly 'm' occurrences of 'a'.
import re
text = "An aardvark walked along the road."
matches = re.findall(r"a{2}", text)
print(matches)  # Output: ['aa']
a{m,n}: Matches between 'm' and 'n' occurrences of 'a'.
import re
text = "The meeting is scheduled for aaamorningaaa."
matches = re.findall(r"a{2,3}", text)
print(matches)  # Output: ['aaa', 'aaa']
These examples illustrate how to use quantifiers in regular expressions to match specific numbers of occurrences of a character or a group of characters within a text string.
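Quantifiers are not limited to a single literal character; they apply equally to character classes and predefined classes. A minimal sketch, assuming made-up sample strings, that combines the two:

```python
import re

text = "Order 66 shipped 1200 units in 3 days"

# \d+ : one or more digits, so each number comes back as one string.
numbers = re.findall(r"\d+", text)

# \d{4} with fullmatch(): the whole string must be exactly four digits.
four_digits_ok = re.fullmatch(r"\d{4}", "1200") is not None
four_digits_bad = re.fullmatch(r"\d{4}", "12000") is not None

# [a-z]{2,4} : runs of two to four lowercase letters (greedy, so it takes four when it can).
short_runs = re.findall(r"[a-z]{2,4}", "ab abcdef a")

print(numbers)          # ['66', '1200', '3']
print(four_digits_ok)   # True
print(four_digits_bad)  # False
print(short_runs)       # ['ab', 'abcd', 'ef']
```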
re Module 📚
match(): Checks if a pattern matches at the beginning of a target string.
fullmatch(): Checks if a pattern matches the entire target string.
search(): Searches for the pattern anywhere in the target string.
findall(): Finds all occurrences of the pattern.
finditer(): Returns an iterator with Match objects for each match.
sub(): Replaces matched patterns with a specified string.
subn(): Similar to sub(), but also returns the number of replacements.
split(): Splits a string based on a pattern.
compile(): Compiles a pattern into a pattern object (re.Pattern).
compile():
import re
pattern = re.compile("ab")
matcher = pattern.finditer("abaababa")
count = 0
for match in matcher:
    count += 1
    print(match.start(), "...", match.end(), "...", match.group())
print("The number of occurrences:", count)
finditer():
import re
pattern = re.compile("ab")
matcher = pattern.finditer("abaababa")
count = 0
for match in matcher:
    count += 1
    print(match.start(), "...", match.end(), "...", match.group())
print("The number of occurrences:", count)
match():
import re
s = "abcabdefg"
m = re.match("abc", s)
if m is not None:
    print("Match is available at the beginning of the String")
    print("Start Index:", m.start(), "and End Index:", m.end())
else:
    print("Match is not available at the beginning of the String")
fullmatch():
import re
s = "ababab"
m = re.fullmatch("ababab", s)
if m is not None:
    print("Full String Matched")
else:
    print("Full String not Matched")
search():
import re
s = "abaaaba"
m = re.search("aaa", s)
if m is not None:
    print("Match is available")
    print("First Occurrence of match with start index:", m.start(), "and end index:", m.end())
else:
    print("Match is not available")
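The difference between match(), fullmatch(), and search() is easiest to see when all three run against the same target. A minimal sketch reusing the string from the match() example (the variable names are illustrative):

```python
import re

s = "abcabdefg"

# match() only looks at the beginning, so "abd" is not found.
at_start = re.match("abd", s)

# search() scans the whole string and finds "abd" at index 3.
anywhere = re.search("abd", s)

# fullmatch() needs the pattern to cover the entire string.
whole_partial = re.fullmatch("abc", s)
whole_exact = re.fullmatch("abcabdefg", s)

print(at_start)                          # None
print(anywhere.start(), anywhere.end())  # 3 6
print(whole_partial)                     # None
print(whole_exact.group())               # abcabdefg
```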
findall():
import re
text = "My phone number is 1234567890, and my friend's number is 9876543210."
# Raw string so that \d reaches the regex engine unchanged.
numbers = re.findall(r"[7-9]\d{9}", text)
print(numbers)
finditer():
import re
text = "My phone number is 1234567890, and my friend's number is 9876543210."
matcher = re.finditer(r"[7-9]\d{9}", text)
for match in matcher:
    print(match.start(), "...", match.end(), "...", match.group())
sub():
import re
text = "My phone number is 1234567890."
new_text = re.sub(r"\d{10}", "XXXXXXXXXX", text)
print(new_text)
subn():
import re
text = "My phone number is 1234567890."
new_text, replacements = re.subn(r"\d{10}", "XXXXXXXXXX", text)
print("Result String:", new_text)
print("The number of replacements:", replacements)
These functions are your toolbox for regex operations in Python. 🧰🐍
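split() completes the set of functions listed above. A minimal sketch, assuming a made-up sample string with mixed delimiters, of splitting on a pattern rather than on one fixed separator:

```python
import re

text = "apple, banana;cherry  date"

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", text)

# maxsplit limits the number of splits; the remainder stays in one piece.
first_and_rest = re.split(r"[,;\s]+", text, maxsplit=1)

print(parts)           # ['apple', 'banana', 'cherry', 'date']
print(first_and_rest)  # ['apple', 'banana;cherry  date']
```

Unlike str.split(), the separator here is a pattern, so several different delimiters can be handled in one call.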
import re
# A Yava identifier starts with a letter from a to k, followed by a digit
# divisible by 3 (0, 3, 6, 9), then any number of letters, digits, or '#'.
def is_valid_yava_identifier(identifier):
    pattern = re.compile("[a-k][0369][a-zA-Z0-9#]*")
    return pattern.fullmatch(identifier) is not None
identifier1 = "a6kk9z##"
identifier2 = "k9b876"
identifier3 = "k7b9"
print(f"{identifier1} is {'valid' if is_valid_yava_identifier(identifier1) else 'invalid'} Yava Identifier")
print(f"{identifier2} is {'valid' if is_valid_yava_identifier(identifier2) else 'invalid'} Yava Identifier")
print(f"{identifier3} is {'valid' if is_valid_yava_identifier(identifier3) else 'invalid'} Yava Identifier")
import re
# A mobile number has exactly 10 digits and starts with 7, 8, or 9.
def is_valid_mobile_number(number):
    pattern = re.compile(r"[7-9]\d{9}")
    return pattern.fullmatch(number) is not None
number1 = "9898989898"
number2 = "6786786787"
number3 = "898989"
print(f"{number1} is {'valid' if is_valid_mobile_number(number1) else 'invalid'} Mobile Number")
print(f"{number2} is {'valid' if is_valid_mobile_number(number2) else 'invalid'} Mobile Number")
print(f"{number3} is {'valid' if is_valid_mobile_number(number3) else 'invalid'} Mobile Number")
import re
# Copy every 10-digit mobile number found in input.txt into output.txt.
with open("input.txt", "r") as f1, open("output.txt", "w") as f2:
    for line in f1:
        numbers = re.findall(r"[7-9]\d{9}", line)
        for n in numbers:
            f2.write(n + "\n")
print("Extracted all Mobile Numbers into output.txt")
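import re
import urllib.request
# Fetch each site's home page and print its <title> element.
sites = ["google", "rediff"]
for s in sites:
    print("Searching...", s)
    with urllib.request.urlopen("http://" + s + ".com") as u:
        # Decode the response bytes to text; undecodable bytes are skipped.
        text = u.read().decode("utf-8", errors="ignore")
    # Non-greedy match so that only the first <title>...</title> is captured.
    title = re.findall(r"<title>.*?</title>", text, re.I | re.S)
    if title:
        print(title[0])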
import re
# A Gmail address: a word character, then letters, digits, '_' or '.', then @gmail.com.
def is_valid_gmail_address(email):
    pattern = re.compile(r"\w[a-zA-Z0-9_.]*@gmail[.]com")
    return pattern.fullmatch(email) is not None
email1 = "durgatoc@gmail.com"
email2 = "durgatoc"
print(f"{email1} is {'valid' if is_valid_gmail_address(email1) else 'invalid'} Gmail Address")
print(f"{email2} is {'valid' if is_valid_gmail_address(email2) else 'invalid'} Gmail Address")
import re
# A Telangana registration: "TS", a two-digit code from 00 to 29, two capital letters, four digits.
def is_valid_telangana_vehicle_registration(registration_number):
    pattern = re.compile(r"TS[012][0-9][A-Z]{2}\d{4}")
    return pattern.fullmatch(registration_number) is not None
registration1 = "TS07EA7777"
registration2 = "TS07KF0786"
registration3 = "AP07EA7898"
print(f"{registration1} is {'valid' if is_valid_telangana_vehicle_registration(registration1) else 'invalid'} Telangana Vehicle Registration")
print(f"{registration2} is {'valid' if is_valid_telangana_vehicle_registration(registration2) else 'invalid'} Telangana Vehicle Registration")
print(f"{registration3} is {'valid' if is_valid_telangana_vehicle_registration(registration3) else 'invalid'} Telangana Vehicle Registration")
import re
# A 10-digit mobile number with an optional "0" or "91" prefix.
def is_valid_flexible_mobile_number(number):
    pattern = re.compile("(0|91)?[7-9][0-9]{9}")
    return pattern.fullmatch(number) is not None
number1 = "9898989898"
number2 = "918989898989"
number3 = "6786786787"
print(f"{number1} is {'valid' if is_valid_flexible_mobile_number(number1) else 'invalid'} Mobile Number")
print(f"{number2} is {'valid' if is_valid_flexible_mobile_number(number2) else 'invalid'} Mobile Number")
print(f"{number3} is {'valid' if is_valid_flexible_mobile_number(number3) else 'invalid'} Mobile Number")
These examples cover a wide range of tasks you can accomplish with regular expressions in Python. 🧩✨ These practical examples will boost your regex skills and empower you to tackle real-world tasks. 🛠️ Now, armed with this regex knowledge, you can conquer text manipulation challenges like a pro! 🏆 Happy coding! 🚀🐍