Regular expressions describe patterns in text. Python’s re module uses them for validation, parsing, and text transformation.

Basic Patterns

import re

# Placeholder addresses for illustration
text = "Contact: alice@example.com or bob@example.org"

# Search for first match
match = re.search(r'\w+@\w+\.\w+', text)
print(match.group())  # alice@example.com

# Find all matches
emails = re.findall(r'\w+@\w+\.\w+', text)
# ['alice@example.com', 'bob@example.org']

# Replace matches
clean = re.sub(r'\d{3}-\d{4}', 'XXXX', "Call 555-1234")
# 'Call XXXX'
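
When the position of each match matters, or the replacement depends on what was matched, re.finditer and a function passed to re.sub are the usual tools. The following is a minimal sketch; the sample text and its addresses are made up for illustration.

```python
import re

# Hypothetical sample text; the addresses are placeholders.
text = "Contact: alice@example.com or bob@example.org"

# finditer yields match objects, so start/end offsets are available
found = [(m.group(), m.start(), m.end())
         for m in re.finditer(r'\w+@\w+\.\w+', text)]
for address, start, end in found:
    print(f"{address} at [{start}:{end}]")

# A function used as the replacement computes each substitution
# from its match object: here the local part is masked with stars.
masked = re.sub(r'\w+@', lambda m: '*' * (len(m.group()) - 1) + '@', text)
print(masked)  # Contact: *****@example.com or ***@example.org
```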
  

Common Metacharacters

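Pattern   Matches
.         Any character except newline (unless re.DOTALL is set)
\d        Digit ([0-9]; also other Unicode digits in str patterns unless re.ASCII is set)
\D        Non-digit
\w        Word character ([a-zA-Z0-9_]; also other Unicode letters and digits unless re.ASCII is set)
\s        Whitespace
[abc]     Any of a, b, c
[^abc]    Any character except a, b, or c
*         0 or more of the preceding item
+         1 or more of the preceding item
?         0 or 1 of the preceding item
{3}       Exactly 3 of the preceding item
{2,5}     2 to 5 of the preceding item
^         Start of string (or of each line with re.MULTILINE)
$         End of string (or of each line with re.MULTILINE)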
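
To see a few of these metacharacters in action, here is a small sketch run against a made-up sample string. The variable names are only illustrative.

```python
import re

# Hypothetical two-line sample
sample = "Order 66: 3 apples, 12 pears\nShip to: Dock_7"

digits = re.findall(r'\d+', sample)          # + : runs of one or more digits
pairs = re.findall(r'\d{2}', sample)         # {2} : exactly two digits
words = re.findall(r'[A-Z]\w*', sample)      # [A-Z] then * : capitalised words
first_line = re.search(r'^.*', sample).group()  # . does not cross the newline
last_word = re.search(r'\w+$', sample).group()  # $ : anchored at the end

print(digits)      # ['66', '3', '12', '7']
print(pairs)       # ['66', '12']
print(words)       # ['Order', 'Ship', 'Dock_7']
print(first_line)  # Order 66: 3 apples, 12 pears
print(last_word)   # Dock_7
```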

Compiling Patterns

For repeated use, compile once:

pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

# match() only tries the pattern at the start of the string
result = pattern.match("2024-06-15")
if result:
    year, month, day = result.groups()
    print(f"{year}/{month}/{day}")  # 2024/06/15
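
A compiled pattern exposes the same methods as the module-level functions, so one object can be reused across many inputs. The sketch below, using invented sample lines, also shows how match, search, and fullmatch differ in where they anchor.

```python
import re

DATE = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

# Hypothetical inputs
lines = [
    "2024-06-15",
    "2024-06-15 backup finished",
    "released on 2024-07-01",
    "no date here",
]

results = []
for line in lines:
    results.append((
        bool(DATE.match(line)),      # pattern must start at position 0
        bool(DATE.search(line)),     # pattern may start anywhere
        bool(DATE.fullmatch(line)),  # pattern must cover the whole string
    ))

for line, flags in zip(lines, results):
    print(f"{line!r}: match/search/fullmatch = {flags}")
```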
  

Groups and Capturing

log = "2024-06-15 10:30:00 ERROR Database connection failed"
pattern = r'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.+)'

match = re.match(pattern, log)
if match:  # re.match returns None when the line does not fit the pattern
    date, time, level, message = match.groups()
    print(level)    # ERROR
    print(message)  # Database connection failed
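
Captured groups can also be referred to by number, both on the match object and inside a replacement string. A small sketch with a made-up sentence:

```python
import re

# Hypothetical input
iso = "Released 2024-06-15, patched 2024-07-01"
date_pattern = r'(\d{4})-(\d{2})-(\d{2})'

m = re.search(date_pattern, iso)
whole = m.group(0)   # the entire match
year = m.group(1)    # the first capturing group

# \1, \2, \3 in the replacement refer to groups 1, 2 and 3
dmy = re.sub(date_pattern, r'\3/\2/\1', iso)
print(dmy)  # Released 15/06/2024, patched 01/07/2024
```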
  

Named Groups

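pattern = r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<level>\w+) (?P<msg>.+)'
match = re.match(pattern, log)
print(match.group("level"))  # ERROR
print(match.groupdict())
# {'date': '2024-06-15', 'time': '10:30:00', 'level': 'ERROR', 'msg': 'Database connection failed'}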
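
Named groups pay off when many lines are parsed into records. The following sketch assumes a log format with date, time, level, and message fields; the sample lines and the LOG_LINE name are invented for illustration.

```python
import re

LOG_LINE = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<level>\w+) '
    r'(?P<msg>.+)'
)

# Hypothetical log lines, including one that does not fit the format
lines = [
    "2024-06-15 10:30:00 ERROR Database connection failed",
    "2024-06-15 10:30:05 INFO Retrying in 5s",
    "malformed line",
]

# Lines that do not match yield None and are skipped
records = [m.groupdict() for line in lines if (m := LOG_LINE.match(line))]
errors = [r["msg"] for r in records if r["level"] == "ERROR"]

print(len(records))  # 2
print(errors)        # ['Database connection failed']
```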
  

Validation Examples

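def is_valid_email(email):
    # fullmatch requires the whole string to fit, so no ^ or $ is needed
    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    return bool(re.fullmatch(pattern, email))

def is_valid_phone(phone):
    # Digits only, with an optional leading + and country code 1
    pattern = r'\+?1?\d{9,15}'
    return bool(re.fullmatch(pattern, phone))

def extract_urls(text):
    # A URL runs until whitespace, a quote, or a bracket-like delimiter
    pattern = r'https?://[^\s<>"{}|\\^`\[\]]+'
    return re.findall(pattern, text)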
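
One subtlety with anchored validation patterns: $ also matches just before a trailing newline, so re.match with ^...$ can accept input that ends in "\n", while re.fullmatch does not. The sketch below illustrates the difference; both function names are hypothetical.

```python
import re

PHONE = r'\+?1?\d{9,15}'

def loose_is_valid_phone(phone):
    # Hypothetical helper: anchors with ^ and $
    return bool(re.match('^' + PHONE + '$', phone))

def strict_is_valid_phone(phone):
    # Hypothetical helper: the pattern must consume the entire string
    return bool(re.fullmatch(PHONE, phone))

print(loose_is_valid_phone("15551234567\n"))   # True
print(strict_is_valid_phone("15551234567\n"))  # False
print(strict_is_valid_phone("15551234567"))    # True
```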
  

Splitting with Regex

re.split(r'[,;\s]+', "apple, banana; cherry  date")
# ['apple', 'banana', 'cherry', 'date']
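
Two further features of re.split are worth knowing: a capturing group in the pattern keeps the delimiters in the result, and maxsplit limits the number of splits. A brief sketch with made-up inputs:

```python
import re

# A capturing group around the delimiter keeps it in the output
tokens = re.split(r'\s*([+\-*/])\s*', "3 + 4*5 - 6")
print(tokens)  # ['3', '+', '4', '*', '5', '-', '6']

# maxsplit=1 splits only at the first delimiter
head, rest = re.split(r'[,;\s]+', "apple, banana; cherry  date", maxsplit=1)
print(head)  # apple
print(rest)  # banana; cherry  date
```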
  

Flags

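# IGNORECASE: match letters regardless of case
re.search(r'hello', "Hello World", re.IGNORECASE)
# MULTILINE: ^ and $ also match at the start and end of each line
re.search(r'^start', "first line\nstart here", re.MULTILINE)
# DOTALL: . also matches newline
re.search(r'<tag>.*</tag>', "<tag>line 1\nline 2</tag>", re.DOTALL)
# VERBOSE: whitespace and # comments inside the pattern are ignored
re.compile(r'\d+  # one or more digits', re.VERBOSE)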
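
re.VERBOSE is most useful for longer patterns, where each part can sit on its own line with a comment. Flags can also be combined with the | operator. The following is a sketch with invented inputs:

```python
import re

# With re.VERBOSE, whitespace and # comments in the pattern are ignored
DATE = re.compile(r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
""", re.VERBOSE)

parts = DATE.search("Due 2024-06-15").groupdict()
print(parts)  # {'year': '2024', 'month': '06', 'day': '15'}

# Combine flags with |: case-insensitive, and ^ matches at each line start
starts = re.findall(r'^error', "Error: disk\nerror: net\nok",
                    re.IGNORECASE | re.MULTILINE)
print(starts)  # ['Error', 'error']
```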
  

Greedy vs Non-Greedy

html = "<div>Hello</div><div>World</div>"

re.findall(r'<div>.*</div>', html)   # greedy: one big match
# ['<div>Hello</div><div>World</div>']
re.findall(r'<div>.*?</div>', html)  # non-greedy: two matches
# ['<div>Hello</div>', '<div>World</div>']
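
The same contrast shows up in what a capturing group collects. A negated character class is a common alternative to a non-greedy quantifier when the content cannot contain the delimiter. This sketch is for illustration only and is not a way to parse real HTML:

```python
import re

html = "<div>Hello</div><div>World</div>"

greedy = re.findall(r'<div>(.*)</div>', html)
lazy = re.findall(r'<div>(.*?)</div>', html)
negated = re.findall(r'<div>([^<]*)</div>', html)  # content may not contain <

print(greedy)   # ['Hello</div><div>World']
print(lazy)     # ['Hello', 'World']
print(negated)  # ['Hello', 'World']
```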
  

Raw Strings

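Always use raw strings (r"...") for regex patterns. In a regular string literal, Python interprets backslash escapes before the regex engine sees the pattern, so each regex backslash would have to be doubled:

re.search(r'\d+', "123")   # raw string: the pattern is written as-is
re.search('\\d+', "123")   # same pattern, but harder to read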
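
The difference becomes a real bug with escapes that Python itself understands. In a regular string literal '\b' is the backspace character, while in a raw string it reaches the regex engine as the word-boundary assertion. A minimal sketch:

```python
import re

# '\b' is one character (backspace); r'\b' is a backslash followed by 'b'
lengths = (len('\b'), len(r'\b'))
print(lengths)  # (1, 2)

text = "cat concat cat"
plain = re.findall('\bcat\b', text)  # searches for backspace + "cat" + backspace
raw = re.findall(r'\bcat\b', text)   # searches for "cat" as a whole word

print(plain)  # []
print(raw)    # ['cat', 'cat']
```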
  

When NOT to Use Regex

  • Parsing HTML/XML: use BeautifulSoup or lxml
  • Parsing JSON: use the json module
  • Complex nested structures: use a proper parser

Regex excels at pattern matching and simple extraction. For structured data, use a parser built for that format.
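
As a small illustration of the JSON case, the sketch below compares a regex with the standard json module on a made-up nested payload. The regex finds both "name" values but has no notion of which object each one belongs to.

```python
import json
import re

# Hypothetical payload with the same key at two nesting levels
payload = '{"name": "outer", "child": {"name": "inner"}}'

# Regex: a flat list of values, with the nesting lost
names_by_regex = re.findall(r'"name":\s*"([^"]*)"', payload)
print(names_by_regex)  # ['outer', 'inner']

# json: the structure is preserved and can be navigated directly
data = json.loads(payload)
print(data["name"])           # outer
print(data["child"]["name"])  # inner
```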