Regular Expressions
Master Python regex with the re module — patterns, groups, lookaheads, substitution, and practical text processing examples.
Regular expressions match patterns in text. Python’s re module provides powerful tools for validation, parsing, and text transformation.
Basic Patterns
import re
text = "Contact: alice@example.com or bob@example.org"  # placeholder addresses
# Search for first match
match = re.search(r'\w+@\w+\.\w+', text)
print(match.group()) # alice@example.com
# Find all matches
emails = re.findall(r'\w+@\w+\.\w+', text)
# ['alice@example.com', 'bob@example.org']
# Replace matches
clean = re.sub(r'\d{3}-\d{4}', 'XXXX', "Call 555-1234")
# 'Call XXXX'
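Substitution can also reuse captured text or compute each replacement. The following is a minimal sketch; the sample strings and the helper name `mask_digits` are made up for illustration.

```python
import re

# Reorder a date using backreferences (\1, \2, \3) to the captured groups.
iso = "2024-06-15"
us_style = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', iso)
print(us_style)  # 06/15/2024

# A function can compute each replacement from the match object.
def mask_digits(match):
    # Hypothetical helper: keep the last two digits, mask the rest.
    digits = match.group()
    return "*" * (len(digits) - 2) + digits[-2:]

masked = re.sub(r'\d{4,}', mask_digits, "Card 12345678, PIN 4321")
print(masked)  # Card ******78, PIN **21
```

Passing a function is useful when the replacement depends on what was matched, which a fixed replacement string cannot express.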
Common Metacharacters
| Pattern | Matches |
|---|---|
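| `.` | Any character except newline (unless `re.DOTALL` is set) |
| `\d` | Digit (`[0-9]`; other Unicode digits too unless `re.ASCII` is set) |
| `\D` | Non-digit |
| `\w` | Word character (`[a-zA-Z0-9_]`; Unicode letters and digits too unless `re.ASCII` is set) |
| `\s` | Whitespace |
| `[abc]` | Any of a, b, c |
| `[^abc]` | Not a, b, or c |
| `*` | 0 or more |
| `+` | 1 or more |
| `?` | 0 or 1 |
| `{3}` | Exactly 3 |
| `{2,5}` | 2 to 5 |
| `^` | Start of string (or of each line with `re.MULTILINE`) |
| `$` | End of string, or just before a trailing newline |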
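To see several of these metacharacters in action, here is a small sketch run against one made-up sample string (the string and variable names are illustrative only).

```python
import re

sample = "Order 66 shipped to bay 7B on 2024-06-15"

four_digits = re.findall(r'\d{4}', sample)          # exactly four digits in a row
codes = re.findall(r'[0-9]+[A-Z]?', sample)         # digits, then an optional capital letter
any_middle = re.findall(r'b.y', sample)             # 'b', any one character, 'y'
starts_with_order = bool(re.search(r'^Order', sample))
last_two_digits = re.search(r'\d{2}$', sample).group()
punctuation = re.findall(r'[^\w\s]', sample)        # neither word character nor whitespace

print(four_digits)        # ['2024']
print(codes)              # ['66', '7B', '2024', '06', '15']
print(any_middle)         # ['bay']
print(starts_with_order)  # True
print(last_two_digits)    # 15
print(punctuation)        # ['-', '-']
```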
Compiling Patterns
For repeated use, compile once:
pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
result = pattern.match("2024-06-15")
if result:
    year, month, day = result.groups()
    print(f"{year}/{month}/{day}")  # 2024/06/15
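A compiled pattern is most useful when the same expression is applied to many strings. The sketch below assumes a made-up list of input lines and uses `finditer()` so that dates anywhere in a line are found. The `re` module also caches recently used patterns internally, so the gain from compiling is often more about readability than raw speed.

```python
import re

DATE = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

# Hypothetical input lines for illustration.
lines = [
    "released 2024-06-15",
    "no date here",
    "2023-12-01 and 2024-01-31",
]

found = []
for line in lines:
    # finditer() scans the whole line; match() would only try the start.
    for m in DATE.finditer(line):
        found.append(m.groups())

print(found)
# [('2024', '06', '15'), ('2023', '12', '01'), ('2024', '01', '31')]
```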
Groups and Capturing
log = "2024-06-15 10:30:00 ERROR Database connection failed"
pattern = r'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.+)'
match = re.match(pattern, log)
date, time, level, message = match.groups()
print(level) # ERROR
print(message) # Database connection failed
Named Groups
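# Each field of the log line gets its own named group, including the time
pattern = r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<level>\w+) (?P<msg>.+)'
match = re.match(pattern, log)
print(match.group("level")) # ERROR
print(match.groupdict())
# {'date': '2024-06-15', 'time': '10:30:00', 'level': 'ERROR', 'msg': 'Database connection failed'}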
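Named groups pair well with `groupdict()` when turning several lines into records. This is a sketch under the assumption that every log line follows a `date time level message` layout; the sample log text is invented for illustration.

```python
import re

LOG_LINE = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<level>\w+) (?P<msg>.+)'
)

# Hypothetical log text; the third line deliberately does not fit the layout.
raw_logs = """2024-06-15 10:30:00 ERROR Database connection failed
2024-06-15 10:30:05 INFO Retrying connection
not a log line
2024-06-15 10:30:09 ERROR Retry limit reached"""

records = []
for line in raw_logs.splitlines():
    m = LOG_LINE.match(line)
    if m is None:
        continue  # skip lines that do not fit the assumed layout
    records.append(m.groupdict())

errors = [r["msg"] for r in records if r["level"] == "ERROR"]
print(errors)  # ['Database connection failed', 'Retry limit reached']
```

Checking for `None` before calling `group()` or `groupdict()` avoids an `AttributeError` on lines that do not match.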
Validation Examples
def is_valid_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

def is_valid_phone(phone):
    pattern = r'^\+?1?\d{9,15}$'
    return bool(re.match(pattern, phone))

def extract_urls(text):
    pattern = r'https?://[^\s<>"{}|\\^`\[\]]+'
    return re.findall(pattern, text)
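One subtlety with `^...$` validation: `$` also matches just before a trailing newline, so a string ending in `\n` can pass. A possible stricter variant uses `re.fullmatch()`; the function name `is_valid_email_strict` is hypothetical and the pattern is the same simplified one as above, not a full email grammar.

```python
import re

EMAIL = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

def is_valid_email_strict(email):
    # fullmatch() requires the entire string to match, trailing newline included.
    return re.fullmatch(EMAIL, email) is not None

with_newline = "user@example.com\n"
loose = bool(re.match('^' + EMAIL + '$', with_newline))
strict = is_valid_email_strict(with_newline)

print(loose, strict)  # True False
print(is_valid_email_strict("user@example.com"))  # True
```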
Splitting with Regex
re.split(r'[,;\s]+', "apple, banana; cherry date")
# ['apple', 'banana', 'cherry', 'date']
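Two further options of `re.split()` are often handy: `maxsplit` limits how many splits happen, and a capturing group in the pattern keeps the delimiters in the result. A short sketch with made-up inputs:

```python
import re

# maxsplit=1 splits only at the first delimiter; the remainder stays intact.
head, rest = re.split(r'\s*:\s*', "level: ERROR: disk full", maxsplit=1)
print(head, "|", rest)  # level | ERROR: disk full

# A capturing group in the pattern keeps the delimiters in the output.
tokens = re.split(r'([+*-])', "3+4*5-6")
print(tokens)  # ['3', '+', '4', '*', '5', '-', '6']
```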
Flags
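re.search(r'hello', "Hello World", re.IGNORECASE) # case-insensitive match
re.search(r'^start', "intro\nstart here", re.MULTILINE) # ^ and $ also match at line boundaries
re.search(r'<tag>.*</tag>', "<tag>line one\nline two</tag>", re.DOTALL) # . also matches newline
re.compile(r'\d+', re.VERBOSE) # allow whitespace and comments in pattern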
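`re.VERBOSE` pays off on longer patterns, and flags can be combined with `|`. The sketch below uses an invented phone-number layout and sample strings purely for illustration.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and # starts a comment.
PHONE = re.compile(r'''
    (\d{3})   # area code
    [-.\s]?   # optional separator
    (\d{3})   # exchange
    [-.\s]?   # optional separator
    (\d{4})   # line number
''', re.VERBOSE)

m = PHONE.search("Call 555-867-5309 today")
print(m.groups())  # ('555', '867', '5309')

# Flags combine with the | operator.
messages = re.findall(r'^error: (.+)$', "Error: disk\nerror: net",
                      re.IGNORECASE | re.MULTILINE)
print(messages)  # ['disk', 'net']
```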
Greedy vs Non-Greedy
html = "<div>Hello</div><div>World</div>"
re.findall(r'<div>.*</div>', html) # greedy — one big match
re.findall(r'<div>.*?</div>', html) # non-greedy — two matches
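Adding a capturing group makes the difference easier to see, and a negated character class is a common alternative to the non-greedy quantifier. A minimal sketch on the same kind of input:

```python
import re

html = "<div>Hello</div><div>World</div>"

greedy = re.findall(r'<div>(.*)</div>', html)       # .* runs to the last </div>
lazy = re.findall(r'<div>(.*?)</div>', html)        # .*? stops at the first </div>
negated = re.findall(r'<div>([^<]*)</div>', html)   # never crosses a '<'

print(greedy)   # ['Hello</div><div>World']
print(lazy)     # ['Hello', 'World']
print(negated)  # ['Hello', 'World']
```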
Raw Strings
Always use raw strings (r"...") for regex patterns to avoid escaping backslashes:
re.search(r'\d+', "123") # correct
re.search('\\d+', "123") # also works but harder to read
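The difference matters most for escapes that Python itself understands. In a normal string, `'\b'` is the backspace character, while the regex engine expects `\b` to mean a word boundary. A short sketch with a made-up sample string:

```python
import re

text = "cat concat cats"

# Normal string: '\b' becomes the backspace character (ASCII 8),
# so this pattern looks for backspace + "cat" + backspace.
without_raw = re.findall('\bcat\b', text)

# Raw string: the backslash reaches the regex engine, where \b
# means "word boundary".
with_raw = re.findall(r'\bcat\b', text)

print(without_raw)  # []
print(with_raw)     # ['cat']
```

Recent Python versions also emit a warning for unrecognized escapes such as `'\d'` in normal strings, which is another reason to prefer the raw form.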
When NOT to Use Regex
- Parsing HTML/XML: use `BeautifulSoup` or `lxml`
- Parsing JSON: use the `json` module
- Complex nested structures: use a proper parser
Regex excels at pattern matching and simple extraction. For structured data, use the right tool.
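As a small illustration of the JSON case, the sketch below compares a regex guess with the standard `json` module on an invented payload that contains the key `name` at two nesting levels.

```python
import json
import re

# Hypothetical payload with "name" at two different nesting levels.
payload = '{"name": "project-x", "owner": {"name": "Ada"}}'

# The regex finds both values but cannot tell which level each belongs to.
regex_guess = re.findall(r'"name": "([^"]+)"', payload)
print(regex_guess)  # ['project-x', 'Ada']

# The parser keeps the structure, so the lookup is unambiguous.
data = json.loads(payload)
owner_name = data["owner"]["name"]
print(owner_name)  # Ada
```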