Regular Expressions

Master Python regex with the re module — patterns, groups, lookaheads, substitution, and practical text processing examples.

Regular expressions match patterns in text. Python’s re module provides powerful tools for validation, parsing, and text transformation.

Basic Patterns

import re

# Placeholder addresses for illustration
text = "Contact: alice@example.com or bob@example.org"

# Search for first match
match = re.search(r'\w+@\w+\.\w+', text)
print(match.group())  # alice@example.com

# Find all matches
emails = re.findall(r'\w+@\w+\.\w+', text)
# ['alice@example.com', 'bob@example.org']

# Replace matches
clean = re.sub(r'\d{3}-\d{4}', 'XXXX', "Call 555-1234")
# 'Call XXXX'

Common Metacharacters

Pattern	Matches
`.`	Any character (except newline)
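`\d`	Digit (`[0-9]` with `re.ASCII`; any Unicode digit by default)
`\D`	Non-digit
`\w`	Word character (`[a-zA-Z0-9_]` with `re.ASCII`; Unicode letters, digits, and underscore by default)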
`\s`	Whitespace
`[abc]`	Any of a, b, c
`[^abc]`	Not a, b, or c
`*`	0 or more
`+`	1 or more
`?`	0 or 1
`{3}`	Exactly 3
`{2,5}`	2 to 5
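`^`	Start of string (and of each line with `re.MULTILINE`)
`$`	End of string or just before a trailing newline (and end of each line with `re.MULTILINE`)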
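
As a quick check of the table, the sketch below applies a few of these metacharacters to made-up sample strings. It also illustrates one detail that is easy to miss: for `str` patterns in Python 3, `\d` and `\w` are Unicode-aware by default, and `re.ASCII` restricts them to the ASCII ranges. The sample strings and variable names are invented for illustration.

```python
import re

sample = "Order 66: 3 apples, 12 pears"

digits = re.findall(r'\d+', sample)         # runs of one or more digits
two_digit = re.findall(r'\d{2}', sample)    # exactly two digits in a row
words = re.findall(r'[A-Za-z]+', sample)    # runs of ASCII letters
starts_with_order = bool(re.search(r'^Order', sample))

# "caf\u00e9" is "café" written with an escape for the accented letter.
phrase = "caf\u00e9 au lait"
unicode_words = re.findall(r'\w+', phrase)            # \w matches the accented letter
ascii_words = re.findall(r'\w+', phrase, re.ASCII)    # \w limited to [a-zA-Z0-9_]

print(digits)        # ['66', '3', '12']
print(two_digit)     # ['66', '12']
print(words)         # ['Order', 'apples', 'pears']
print(ascii_words)   # ['caf', 'au', 'lait']
```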

Compiling Patterns

For repeated use, compile once:

pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

result = pattern.match("2024-06-15")
if result:
    year, month, day = result.groups()
    print(f"{year}/{month}/{day}")
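
A compiled pattern exposes the same operations as the module-level functions (`match`, `search`, `findall`, `finditer`, `sub`). The sketch below reuses one compiled pattern across several invented input lines; `DATE_RE`, `lines`, and `found` are illustrative names. Note that `match` only tries the start of the string, so `finditer` is used here to scan each whole line.

```python
import re

# Compile once, then reuse the pattern object for every input.
DATE_RE = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

lines = [
    "released 2024-06-15",
    "no date here",
    "patched 2024-07-01 and 2024-07-09",
]

found = []
for line in lines:
    # finditer yields one match object per non-overlapping match in the line
    for m in DATE_RE.finditer(line):
        year, month, day = m.groups()
        found.append(f"{day}/{month}/{year}")

print(found)  # ['15/06/2024', '01/07/2024', '09/07/2024']
```

The `re` module also caches recently used patterns internally, so the gain from compiling is mostly readability plus a small saving in tight loops.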

Groups and Capturing

log = "2024-06-15 10:30:00 ERROR Database connection failed"
pattern = r'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.+)'

match = re.match(pattern, log)
date, time, level, message = match.groups()
print(level)    # ERROR
print(message)  # Database connection failed
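
Captured groups are also available inside a substitution, and `re.match` returns `None` when nothing matches, so unpacking `match.groups()` directly fails on bad input. The following is a minimal sketch of both points; `parse_level` is a hypothetical helper, not part of the `re` module.

```python
import re

log = "2024-06-15 10:30:00 ERROR Database connection failed"

# \1, \2, \3 in the replacement refer to the captured groups, in order.
reordered = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', log)
print(reordered)  # 15/06/2024 10:30:00 ERROR Database connection failed

def parse_level(line):
    """Return the log level, or None if the line is not in the expected format."""
    m = re.match(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (\w+) ', line)
    return m.group(1) if m else None

print(parse_level(log))          # ERROR
print(parse_level("not a log"))  # None
```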

Named Groups

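pattern = r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<level>\w+) (?P<msg>.+)'
match = re.match(pattern, log)
print(match.group("level"))  # ERROR
print(match.groupdict())
# {'date': '2024-06-15', 'time': '10:30:00', 'level': 'ERROR', 'msg': 'Database connection failed'}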
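
Because `groupdict()` returns a plain dictionary, named groups are a convenient way to turn log lines into records. The sketch below assumes the same date, time, level, message layout as the sample log line; the extra lines in `raw` and the names `LOG_RE`, `records`, and `errors` are invented for illustration.

```python
import re

LOG_RE = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<level>\w+) (?P<msg>.+)'
)

raw = (
    "2024-06-15 10:30:00 ERROR Database connection failed\n"
    "2024-06-15 10:30:05 INFO Retrying\n"
    "not a log line\n"
)

records = []
for line in raw.splitlines():
    m = LOG_RE.match(line)
    if m:  # skip lines that do not fit the format
        records.append(m.groupdict())

errors = [r["msg"] for r in records if r["level"] == "ERROR"]
print(len(records))  # 2
print(errors)        # ['Database connection failed']

# Named groups can be referenced in a replacement with \g<name>.
short = LOG_RE.sub(r'[\g<level>] \g<msg>', "2024-06-15 10:30:05 INFO Retrying")
print(short)  # [INFO] Retrying
```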

Validation Examples

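def is_valid_email(email):
    # Pragmatic shape check, not full RFC 5322 validation
    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    # fullmatch requires the entire string to match, including any trailing newline
    return bool(re.fullmatch(pattern, email))

def is_valid_phone(phone):
    # Optional "+", optional leading "1", then 9 to 15 digits
    pattern = r'\+?1?\d{9,15}'
    return bool(re.fullmatch(pattern, phone))

def extract_urls(text):
    # http(s) URLs, up to the first whitespace or delimiter character
    pattern = r'https?://[^\s<>"{}|\\^`\[\]]+'
    return re.findall(pattern, text)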
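
One subtlety with validation patterns: `$` also matches just before a trailing newline, so `re.match` with `^...$` accepts input such as `"user@example.com\n"`, whereas `re.fullmatch` does not. The sketch below compares the two on a few invented inputs; `matches_with_anchors` and `matches_fully` are hypothetical helper names.

```python
import re

EMAIL = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

def matches_with_anchors(s):
    # ^...$ with re.match: $ also matches just before a trailing newline
    return bool(re.match('^' + EMAIL + '$', s))

def matches_fully(s):
    # re.fullmatch requires the whole string to match
    return bool(re.fullmatch(EMAIL, s))

cases = ["user@example.com", "user@example.com\n", "user@example", "@example.com"]
for s in cases:
    print(repr(s), matches_with_anchors(s), matches_fully(s))
# 'user@example.com' True True
# 'user@example.com\n' True False
# 'user@example' False False
# '@example.com' False False
```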

Splitting with Regex

re.split(r'[,;\s]+', "apple, banana; cherry  date")
# ['apple', 'banana', 'cherry', 'date']
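
`re.split` has a few behaviors worth knowing beyond the basic case: `maxsplit` limits the number of splits, a capturing group in the pattern keeps the delimiters in the result, and a delimiter at the start or end of the input produces empty strings. A small sketch with invented inputs:

```python
import re

row = "apple, banana; cherry  date"

# maxsplit=1 splits only at the first delimiter run
first_rest = re.split(r'[,;\s]+', row, maxsplit=1)
print(first_rest)   # ['apple', 'banana; cherry  date']

# A capturing group keeps the matched delimiters in the output
with_delims = re.split(r'\s*([,;])\s*', "a, b; c")
print(with_delims)  # ['a', ',', 'b', ';', 'c']

# Leading or trailing delimiters yield empty strings; filter them if unwanted
edges = re.split(r'[,;\s]+', ", apple, banana,")
print(edges)                     # ['', 'apple', 'banana', '']
print([p for p in edges if p])   # ['apple', 'banana']
```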

Flags

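re.search(r'hello', "Hello World", re.IGNORECASE)               # matches 'Hello'
re.findall(r'^\w+', "start here\nnext line", re.MULTILINE)      # ['start', 'next']: ^ matches at each line start
re.search(r'<tag>.*</tag>', "<tag>one\ntwo</tag>", re.DOTALL)   # . also matches the newline
re.compile(r'\d+  # one or more digits', re.VERBOSE)            # whitespace and comments allowed in pattern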
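
`re.VERBOSE` pays off mostly with longer patterns, where each part can sit on its own commented line, and flags can be combined with `|`. The following sketch uses an invented date pattern and sample strings to show both.

```python
import re

# In VERBOSE mode, unescaped whitespace and "# ..." comments in the pattern are ignored.
DATE_RE = re.compile(r"""
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
""", re.VERBOSE)

m = DATE_RE.search("Due on 2024-06-15.")
print(m.groupdict())  # {'year': '2024', 'month': '06', 'day': '15'}

# Combine flags with the bitwise OR operator.
lines = "alpha\nBETA\nbeta"
hits = re.findall(r'^beta$', lines, re.IGNORECASE | re.MULTILINE)
print(hits)  # ['BETA', 'beta']
```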

Greedy vs Non-Greedy

html = "<div>Hello</div><div>World</div>"

re.findall(r'<div>.*</div>', html)    # greedy — one big match
re.findall(r'<div>.*?</div>', html)  # non-greedy — two matches
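
Adding a capturing group returns just the inner text, and a negated character class is a common alternative to a lazy quantifier when the content cannot contain the delimiter. A short sketch on the same sample string (this only works for flat, non-nested markup like this one):

```python
import re

html = "<div>Hello</div><div>World</div>"

greedy = re.findall(r'<div>(.*)</div>', html)
lazy = re.findall(r'<div>(.*?)</div>', html)
negated = re.findall(r'<div>([^<]*)</div>', html)  # "anything except <"

print(greedy)   # ['Hello</div><div>World']
print(lazy)     # ['Hello', 'World']
print(negated)  # ['Hello', 'World']
```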

Raw Strings

Use raw strings (r"...") for regex patterns so backslashes reach the regex engine unchanged and do not need to be doubled:

re.search(r'\d+', "123")   # correct
re.search('\\d+', "123")   # also works but harder to read
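
The difference becomes visible with escapes that Python itself interprets. In a normal string literal, `'\b'` is the backspace character, so the regex engine never sees a word-boundary marker. A minimal demonstration with an invented sample string:

```python
import re

text = "cat concat cat"

# '\b' is one character (backspace); r'\b' is two characters (backslash, b).
print(len('\b'), len(r'\b'))  # 1 2

without_raw = re.findall('\bcat\b', text)   # looks for backspace + "cat" + backspace
with_raw = re.findall(r'\bcat\b', text)     # looks for "cat" as a whole word

print(without_raw)  # []
print(with_raw)     # ['cat', 'cat']
```

Escapes that Python does not recognize, such as `'\d'`, currently pass through unchanged, but recent Python versions emit a warning for them, which is another reason to prefer raw strings.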

When NOT to Use Regex

Parsing HTML/XML — use BeautifulSoup or lxml
Parsing JSON — use json module
Complex nested structures — use a proper parser

Regex excels at pattern matching and simple extraction. For structured data, use the right tool.
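
To illustrate the point, the sketch below reads nested JSON with the `json` module and collects text from nested HTML with the standard library's `html.parser`. The latter is used here only as a dependency-free stand-in for BeautifulSoup or lxml; the payload, the markup, and the `TextCollector` class are invented for illustration.

```python
import json
from html.parser import HTMLParser

# JSON: a real parser handles nesting and escaping for free.
payload = '{"user": {"name": "Ada", "tags": ["admin", "dev"]}}'
data = json.loads(payload)
print(data["user"]["tags"])  # ['admin', 'dev']

# HTML: collect text nodes, regardless of how tags are nested.
class TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

parser = TextCollector()
parser.feed("<div>Hello <b>nested</b></div><div>World</div>")
parser.close()
print(parser.chunks)  # ['Hello ', 'nested', 'World']
```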
