Regular expressions (RegEx) are sequences of characters that define search patterns used for string matching, validation, and text manipulation.
Python’s re module offers functions to find, search, split, and replace patterns in strings.
Key RegEx Functions in Python
The re module provides four primary functions for pattern matching:
- findall: Finds all matches in a string and returns them as a list.
- search: Searches for the first match and returns a match object.
- split: Splits a string at matches and returns a list.
- sub: Replaces matches with a specified string.
1. findall() Function
This function is useful when you need all occurrences of a pattern. It returns a list of matches.
import re
text = "The rain in Spain falls mainly in the plain."
matches = re.findall(r"\bin\b", text)
print(matches) # Output: ['in', 'in']
Explanation:
- The pattern \bin\b matches the word “in” as a whole word (\b denotes a word boundary).
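To see what the word boundary contributes, the short sketch below runs the search with and without \b on the same sentence. The variable names loose and strict are illustrative only.

```python
import re

text = "The rain in Spain falls mainly in the plain."

# Without \b, "in" is also found inside longer words such as "rain" and "Spain".
loose = re.findall(r"in", text)

# With \b on both sides, only the standalone word "in" matches.
strict = re.findall(r"\bin\b", text)

print(len(loose))  # 6
print(strict)      # ['in', 'in']
```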
Challenge Example: Find all words starting with a vowel in a string.
text = "Umbrella, orange, apple, and elephant are here."
matches = re.findall(r"\b[AEIOUaeiou]\w*", text)
print(matches) # Output: ['Umbrella', 'orange', 'apple', 'and', 'elephant', 'are']
2. search() Function
This function returns the first match it finds as a match object.
text = "The rain in Spain falls mainly in the plain."
match = re.search(r"\bmain\w*\b", text)
if match:
    print(match.group()) # Output: 'mainly'
Explanation:
- The pattern \bmain\w*\b matches words that start with “main”.
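Because search() stops at the first match and returns None when nothing matches, it helps to see both outcomes side by side. The sketch below is a minimal illustration; the patterns and the names first and missing are chosen here for demonstration.

```python
import re

text = "The rain in Spain falls mainly in the plain."

# search() returns a match object for the first word ending in "ain".
first = re.search(r"\b\w*ain\b", text)
print(first.group())  # rain

# When nothing matches, search() returns None, so check before calling .group().
missing = re.search(r"\bsnow\b", text)
print(missing)        # None
```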
Challenge Example: Search for the first numeric pattern in a string.
text = "Order numbers are 12345 and 67890."
match = re.search(r"\d{5}", text)
if match:
    print(match.group()) # Output: '12345'
3. split() Function
This function splits a string wherever the pattern matches.
text = "Split this string at every vowel."
result = re.split(r"[AEIOUaeiou]", text)
print(result) # Output: ['Spl', 't th', 's str', 'ng ', 't ', 'v', 'ry v', 'w', 'l.']
Explanation:
- The pattern [AEIOUaeiou] matches any vowel.
Challenge Example: Split a string at sequences of digits.
text = "Item123Price456Code789"
result = re.split(r"\d+", text)
print(result) # Output: ['Item', 'Price', 'Code', '']
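The trailing empty string appears because the last match sits at the very end of the input. The sketch below shows one possible way to drop empty pieces, and uses the maxsplit argument of re.split() to limit the number of splits. The names cleaned and limited are illustrative.

```python
import re

text = "Item123Price456Code789"

# A match at the end of the string leaves a trailing empty string.
parts = re.split(r"\d+", text)

# One way to discard empty pieces is a list comprehension.
cleaned = [part for part in parts if part]

# maxsplit limits how many splits are performed.
limited = re.split(r"\d+", text, maxsplit=1)

print(cleaned)  # ['Item', 'Price', 'Code']
print(limited)  # ['Item', 'Price456Code789']
```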
4. sub() Function
This function replaces occurrences of a pattern with a specified string.
text = "Replace vowels with asterisks."
result = re.sub(r"[AEIOUaeiou]", "*", text)
print(result) # Output: 'R*pl*c* v*w*ls w*th *st*r*sks.'
Challenge Example: Replace all numeric sequences with the word “NUMBER”.
text = "123 Main St, Apt 456."
result = re.sub(r"\d+", "NUMBER", text)
print(result) # Output: 'NUMBER Main St, Apt NUMBER.'
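Two related options are worth a quick look: the count argument of re.sub(), which caps the number of replacements, and re.subn(), which also reports how many replacements were made. This is a small sketch with illustrative variable names.

```python
import re

text = "123 Main St, Apt 456."

# count=1 replaces only the first match.
first_only = re.sub(r"\d+", "NUMBER", text, count=1)

# subn() returns the new string together with the number of replacements.
replaced, how_many = re.subn(r"\d+", "NUMBER", text)

print(first_only)  # NUMBER Main St, Apt 456.
print(replaced)    # NUMBER Main St, Apt NUMBER.
print(how_many)    # 2
```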
Metacharacters in RegEx
Metacharacters are symbols with special meanings in patterns. Here are the most common ones:
| Character | Description | Example |
|-----------|--------------------------------------|-----------|
| [] | Matches any character in brackets | [a-m] |
| \ | Starts a special sequence or escapes a special character | \d |
| . | Matches any character except newline | he..o |
| ^ | Matches start of string | ^hello |
| $ | Matches end of string | planet$ |
| * | Matches zero or more occurrences | he.*o |
| + | Matches one or more occurrences | he.+o |
| ? | Matches zero or one occurrence | he.?o |
| {} | Matches a specified number of occurrences | he.{2}o |
| \| | Matches either the expression before or the one after | a\|b |
| () | Groups expressions | (abc) |
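The quantifier rows in the table are easiest to compare on the same inputs. The sketch below tests four of the table's patterns against three sample words (the words are chosen here for illustration) using re.fullmatch(), which succeeds only when the whole string fits the pattern.

```python
import re

candidates = ["heo", "helo", "hello"]
patterns = [r"he.*o", r"he.+o", r"he.?o", r"he.{2}o"]

# For each quantifier pattern, keep the candidates it matches in full.
results = {
    pattern: [word for word in candidates if re.fullmatch(pattern, word)]
    for pattern in patterns
}

for pattern, matched in results.items():
    print(pattern, matched)
# he.*o ['heo', 'helo', 'hello']
# he.+o ['helo', 'hello']
# he.?o ['heo', 'helo']
# he.{2}o ['hello']
```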
Example with Metacharacters
Extract valid email addresses from a text.
text = "Emails: test.email@domain.com, invalid-email@, user@site.org."
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
emails = re.findall(pattern, text)
print(emails) # Output: ['test.email@domain.com', 'user@site.org']
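A detail worth keeping in mind when writing patterns like this one: inside square brackets, | is an ordinary character rather than the "either or" metacharacter. The sketch below uses a made-up three-character string to show the difference.

```python
import re

sample = "a|B"

# Inside a set, | is literal, so this set also accepts the pipe character.
with_pipe = re.findall(r"[A-Z|a-z]", sample)

# Listing the two ranges side by side matches letters only.
letters_only = re.findall(r"[A-Za-z]", sample)

print(with_pipe)     # ['a', '|', 'B']
print(letters_only)  # ['a', 'B']
```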
Special Sequences
Special sequences simplify complex patterns. Here are some key ones:
| Sequence | Description |
|----------|--------------------------------------------|
| \d | Matches any digit |
| \D | Matches non-digit characters |
| \s | Matches whitespace |
| \S | Matches non-whitespace |
| \w | Matches word characters (alphanumeric + _) |
| \W | Matches non-word characters |
| \b | Matches at word boundary |
| \B | Matches not at word boundary |
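The sequences in the table can be tried on a single short string. The sketch below is illustrative; the sample strings and variable names are assumptions made for the demonstration.

```python
import re

text = "Room 42, Floor 7"

digits = re.findall(r"\d", text)   # individual digits
words = re.findall(r"\w+", text)   # runs of word characters
chunks = re.findall(r"\S+", text)  # runs of non-whitespace characters
gaps = re.findall(r"\s", text)     # whitespace characters

print(digits)     # ['4', '2', '7']
print(words)      # ['Room', '42', 'Floor', '7']
print(chunks)     # ['Room', '42,', 'Floor', '7']
print(len(gaps))  # 3

# \b requires a word boundary before "cat"; \B requires that there is none.
sample = "cat concat"
at_boundary = re.search(r"\bcat", sample).start()
inside_word = re.search(r"\Bcat", sample).start()
print(at_boundary)  # 0
print(inside_word)  # 7
```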
Example with Special Sequences
Find non-word characters (anything other than letters, digits, and underscores, so spaces count too) in a string.
text = "Find symbols! @#$ and numbers like 123."
symbols = re.findall(r"\W", text)
print(symbols) # Output: [' ', '!', ' ', '@', '#', '$', ' ', ' ', ' ', ' ', '.']
Sets in RegEx
Sets are enclosed in square brackets and define character groups.
Example with Sets
Find all uppercase letters and digits in a string.
text = "Data123ScienceABC!"
matches = re.findall(r"[A-Z0-9]", text)
print(matches) # Output: ['D', '1', '2', '3', 'S', 'A', 'B', 'C']
Breakdown of the Example Above:
Input String: text = "Data123ScienceABC!"
- Uppercase letters: D, S, A, B, C
- Lowercase letters: a, t, a, c, i, e, n, c, e
- Digits: 1, 2, 3
- A special character: !
Regular Expression: The pattern [A-Z0-9] is a set that matches:
- A-Z: Any uppercase letter from A to Z.
- 0-9: Any digit from 0 to 9.
Using re.findall(): matches = re.findall(r"[A-Z0-9]", text)
- The re.findall() function scans the input string from left to right and finds all occurrences of characters that match the pattern.
- It adds each match to a list.
| Character | Does it match [A-Z0-9]? | Why |
|-----------|-------------------------|-----------------------------------|
| D | ✅ Yes | Uppercase letter. Matches A-Z. |
| a | ❌ No | Lowercase letter. Does not match. |
| t | ❌ No | Lowercase letter. Does not match. |
| a | ❌ No | Lowercase letter. Does not match. |
| 1 | ✅ Yes | Digit. Matches 0-9. |
| 2 | ✅ Yes | Digit. Matches 0-9. |
| 3 | ✅ Yes | Digit. Matches 0-9. |
| S | ✅ Yes | Uppercase letter. Matches A-Z. |
| c | ❌ No | Lowercase letter. Does not match. |
| i | ❌ No | Lowercase letter. Does not match. |
| e | ❌ No | Lowercase letter. Does not match. |
| n | ❌ No | Lowercase letter. Does not match. |
| c | ❌ No | Lowercase letter. Does not match. |
| e | ❌ No | Lowercase letter. Does not match. |
| A | ✅ Yes | Uppercase letter. Matches A-Z. |
| B | ✅ Yes | Uppercase letter. Matches A-Z. |
| C | ✅ Yes | Uppercase letter. Matches A-Z. |
| ! | ❌ No | Special character. Does not match.|
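The character-by-character table can be reproduced in code by testing each character against the set on its own. The sketch below does that with re.fullmatch(), and also shows a negated set: a leading ^ inside the brackets inverts the match. The negated set is not part of the example above and is included only for contrast.

```python
import re

text = "Data123ScienceABC!"

# Test each character individually against the set, mirroring the table.
verdicts = [(char, bool(re.fullmatch(r"[A-Z0-9]", char))) for char in text]
for char, matched in verdicts:
    print(f"{char!r}: {'match' if matched else 'no match'}")

# A leading ^ inside the brackets negates the set.
others = re.findall(r"[^A-Z0-9]", text)
print(others)  # ['a', 't', 'a', 'c', 'i', 'e', 'n', 'c', 'e', '!']
```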
Match Objects
The search function returns a match object, which contains detailed information about the match.
text = "Contact: 123-456-7890"
match = re.search(r"\d{3}-\d{3}-\d{4}", text)
if match:
    print(match.group()) # Output: '123-456-7890'
    print(match.start()) # Output: 9
    print(match.end()) # Output: 21
Breakdown of the Example Above:
Input String: text = "Contact: 123-456-7890"
- The string contains a phone number formatted as 123-456-7890.
Regex Pattern: r"\d{3}-\d{3}-\d{4}"
- \d: Matches any digit (0-9).
- {3}: Matches exactly 3 occurrences of the preceding element (\d).
- -: Matches a literal hyphen.
- Combined, the pattern matches the format of a U.S. phone number: XXX-XXX-XXXX.
Search for the Pattern: match = re.search(r"\d{3}-\d{3}-\d{4}", text)
- re.search() scans the string for the first match of the pattern.
- If a match is found, it returns a match object.
- If no match is found, it returns None.
Check for a Match: if match:
- Ensures the code inside the block only runs if a match is found.
Accessing Match Information:
- match.group(): Returns the exact substring that matched the pattern (123-456-7890).
- match.start(): Returns the starting index of the match in the string (index 9).
- match.end(): Returns the ending index of the match in the string (index 21, exclusive).
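A match object offers a few more accessors beyond these three. The sketch below adds parentheses to the pattern so each part of the number becomes a capture group. That grouping is an illustrative change, not part of the example above.

```python
import re

text = "Contact: 123-456-7890"

# Parentheses create capture groups; this grouping is an illustrative choice.
match = re.search(r"(\d{3})-(\d{3})-(\d{4})", text)

if match:
    print(match.span())    # (9, 21)
    print(match.groups())  # ('123', '456', '7890')
    print(match.group(1))  # 123
    # Slicing with start() and end() recovers the same text as group().
    print(text[match.start():match.end()])  # 123-456-7890
```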
| Character or Substring | Does it match `\d{3}-\d{3}-\d{4}`? | Why |
|--------------------------|------------------------------------|-------------------------------------------------------------------------|
| Contact: | ❌ No | It does not match the pattern for a phone number. |
| 123 | ❌ No | Matches `\d{3}` on its own but lacks the `-` and the rest of the pattern. |
| 123- | ❌ No | Matches part of the pattern but not the entire phone number format. |
| 123-456 | ❌ No | Matches the first two parts but not the last 4 digits. |
| 123-456-7890 | ✅ Yes | Matches the complete phone number format: `\d{3}-\d{3}-\d{4}`. |
| 7890 | ❌ No | Matches `\d{4}` but is missing the preceding parts of the pattern. |
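The rows of this table can be checked with re.fullmatch(), which succeeds only when the entire candidate string fits the pattern. This is a minimal sketch; the candidates list simply restates the table's first column.

```python
import re

pattern = r"\d{3}-\d{3}-\d{4}"
candidates = ["Contact:", "123", "123-", "123-456", "123-456-7890", "7890"]

# fullmatch() requires the whole candidate to fit the pattern.
results = {c: bool(re.fullmatch(pattern, c)) for c in candidates}

for candidate, ok in results.items():
    print(f"{candidate!r}: {ok}")
# Only '123-456-7890' reports True.
```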
Thank you for reading this article. I hope you found it helpful and informative. If you have any questions, or if you would like to suggest new Python code examples or topics for future tutorials, please feel free to reach out. Your feedback and suggestions are always welcome!
Happy coding!
C. C. Python Programming
You can also find this article at Medium.com