Python's re module is the standard library's tool for working with regular expressions. Whether you're cleaning data, analyzing text, or validating forms, knowing the re module well will save you time. This guide covers the core functions, pattern syntax, practical examples, best practices, and common pitfalls.

Why Choose Python's re Module?

Python's re module offers complete regular expression support with several advantages:

Built-in module, no installation required
Clean, intuitive syntax
Good performance for most text-processing workloads
Rich function library for various needs
Seamless integration with other Python modules

re Module Basics

Importing the Module

Import the module before use:

import re

Core Functions Overview

The re module provides multiple functions, each with specific purposes:

re.match() - Match from the beginning of the string
re.search() - Search for the first match anywhere in the string
re.findall() - Find all matching occurrences
re.finditer() - Return an iterator of all matches
re.sub() - Replace matching text
re.split() - Split string by pattern
re.compile() - Compile a regex pattern into a reusable pattern object

Detailed Function Usage

1. re.match() - Match from Beginning

match() only checks if the pattern matches the beginning of the string:

import re

text = "Hello, World!"
pattern = r"Hello"

result = re.match(pattern, text)
if result:
    print("Match found:", result.group())  # Output: Hello
else:
    print("Match failed")

# No match case
result = re.match(r"World", text)  # Returns None

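Use case: Validate user input against specific formats, such as email or phone numbers. Because match() only anchors at the start of the string, end the pattern with $ or use re.fullmatch() when the whole input must match.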
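
When the entire input has to conform to a format, re.fullmatch() is often a closer fit than match(). A minimal sketch, assuming a hypothetical six-digit code format chosen only for illustration (the names SIX_DIGITS and is_six_digit_code are not from the guide):

```python
import re

# Hypothetical format for illustration: exactly six digits
SIX_DIGITS = re.compile(r"\d{6}")

def is_six_digit_code(value):
    # fullmatch() succeeds only if the entire string matches the pattern
    return SIX_DIGITS.fullmatch(value) is not None

prefix_only = SIX_DIGITS.match("100080abc") is not None
print(prefix_only)                     # True: match() only checks the prefix
print(is_six_digit_code("100080abc"))  # False: trailing characters are rejected
print(is_six_digit_code("100080"))     # True
```

With match(), the same check would need an explicit end anchor in the pattern.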

2. re.search() - Search First Match

search() looks for the first match anywhere in the string:

text = "Python is awesome! Python is powerful!"
pattern = r"Python"

result = re.search(pattern, text)
if result:
    print("Found:", result.group())  # Output: Python
    print("Position:", result.start())  # Output: 0

Use case: Extract key information from log files, find specific errors or warnings.

3. re.findall() - Find All Matches

findall() returns a list of all matching occurrences:

text = "My phone: 138-1234-5678, yours: 139-8765-4321"
pattern = r"\d{3}-\d{4}-\d{4}"

phone_numbers = re.findall(pattern, text)
print(phone_numbers)  # Output: ['138-1234-5678', '139-8765-4321']

Use case: Batch extract data, such as all emails, links, or image URLs from a webpage.
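
One detail that often surprises people: the shape of findall()'s result depends on how many capture groups the pattern has. A short sketch with a made-up input string:

```python
import re

text = "Contact: alice=30, bob=25"  # made-up sample input

# No groups: each result is the whole match
whole = re.findall(r"\w+=\d+", text)
print(whole)   # ['alice=30', 'bob=25']

# One group: each result is that group's text
ages = re.findall(r"\w+=(\d+)", text)
print(ages)    # ['30', '25']

# Several groups: each result is a tuple of group texts
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)   # [('alice', '30'), ('bob', '25')]
```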

4. re.finditer() - Get Detailed Match Info

finditer() returns an iterator of match objects with more information:

text = "Email: alice@example.com, bob@example.org, carol@example.net"  # placeholder addresses
pattern = r"[\w.+-]+@[\w-]+\.[\w.-]+"

for match in re.finditer(pattern, text):
    print(f"Email: {match.group()}, Start: {match.start()}, End: {match.end()}")

Use case: When you need to know the exact position of each match.
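
Each match object also offers span(), which returns the start and end positions as one tuple. A small sketch with a made-up input string, showing how spans can be used to slice the original text:

```python
import re

text = "error at line 12, error at line 48"  # made-up sample input

# span() returns (start, end) for each match
spans = [m.span() for m in re.finditer(r"\d+", text)]
print(spans)    # [(14, 16), (32, 34)]

# Slicing with a span recovers the matched text
numbers = [text[start:end] for start, end in spans]
print(numbers)  # ['12', '48']
```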

5. re.sub() - Powerful Replacement

sub() can replace all matching occurrences:

# Simple replacement
text = "Hello, Hello, Hello"
result = re.sub(r"Hello", "Hi", text)
print(result)  # Output: Hi, Hi, Hi

# Using callback function
text = "Price: 100, 200, 300"
def discount(match):
    price = int(match.group())
    return f"{price * 0.9}元"

result = re.sub(r"\d+", discount, text)
print(result)  # Output: Price: 90.0元, 180.0元, 270.0元

# Limit replacement count
text = "a-a-a-a"
result = re.sub(r"a", "b", text, count=2)
print(result)  # Output: b-b-a-a

Use case: Batch modify text format, such as unifying date formats or cleaning special characters.
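
For the date-format case, the replacement string can refer back to captured groups. A minimal sketch, assuming dates appear as YYYY/MM/DD or YYYY.MM.DD in a made-up input string:

```python
import re

text = "Released 2024/01/25, patched 2024.02.03"  # made-up sample input

# \1, \2, \3 in the replacement refer to the numbered groups
iso = re.sub(r"(\d{4})[/.](\d{2})[/.](\d{2})", r"\1-\2-\3", text)
print(iso)      # Released 2024-01-25, patched 2024-02-03

# \g<name> refers to a named group
swapped = re.sub(
    r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})",
    r"\g<d>/\g<m>/\g<y>",
    iso,
)
print(swapped)  # Released 25/01/2024, patched 03/02/2024
```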

6. re.split() - Flexible String Splitting

split() splits a string wherever a regex pattern matches:

# Split by multiple delimiters
text = "apple,banana;orange|grape"
result = re.split(r"[,;|]", text)
print(result)  # Output: ['apple', 'banana', 'orange', 'grape']

# Keep delimiters
text = "apple  banana  orange"
result = re.split(r"(\s+)", text)
print(result)  # Output: ['apple', '  ', 'banana', '  ', 'orange']

# Limit split count
text = "one,two,three,four"
result = re.split(r",", text, maxsplit=2)
print(result)  # Output: ['one', 'two', 'three,four']

Use case: Parse complex text formats, like log files or configuration files.
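
As a sketch of the configuration-file case, the snippet below parses a single line of key=value pairs. The line format (semicolon-separated pairs with uneven spacing) is an assumption made for the example:

```python
import re

line = "host = localhost ; port=5432;  user =admin"  # assumed format

# Split on semicolons, swallowing any whitespace around them
pairs = re.split(r"\s*;\s*", line)

# Split each pair on the first "=" only, again ignoring surrounding whitespace
config = dict(re.split(r"\s*=\s*", pair, maxsplit=1) for pair in pairs)
print(config)  # {'host': 'localhost', 'port': '5432', 'user': 'admin'}
```

Using maxsplit=1 keeps any later "=" characters inside the value.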

7. re.compile() - Improve Performance

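If you use the same pattern many times, compile it once and reuse the pattern object. The module-level functions already keep an internal cache of recently compiled patterns, so the speedup is usually modest, but a compiled pattern skips the cache lookup on every call:

# Not compiled (re looks the pattern up in its internal cache on every call)
pattern = r"\b\w+\b"
text = "This is a test"

for _ in range(1000):
    words = re.findall(pattern, text)

# Compiled (compiled once, then reused directly)
compiled_pattern = re.compile(r"\b\w+\b")
for _ in range(1000):
    words = compiled_pattern.findall(text)  # No cache lookup needed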

Use case: When using the same regex pattern in loops or frequent calls.
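
re.compile() also accepts flags, which stay attached to the pattern object. The sketch below is illustrative only: the HEX_COLOR name and the hex-color format are assumptions made for this example.

```python
import re

# re.VERBOSE allows whitespace and comments inside the pattern;
# re.IGNORECASE makes the letters case-insensitive.
HEX_COLOR = re.compile(
    r"""
    \#                # literal '#', escaped because '#' starts a comment here
    (?:[0-9a-f]{6}    # six hex digits
      |[0-9a-f]{3})   # or three hex digits
    \b                # stop at a word boundary
    """,
    re.VERBOSE | re.IGNORECASE,
)

colors = HEX_COLOR.findall("color: #FFF; background: #1a2B3c; bad: #12")
print(colors)  # ['#FFF', '#1a2B3c']
```

Compiling once at module level also gives a long pattern a readable name.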

Regex Pattern Details

Basic Patterns

# Character classes
pattern = r"[a-z]"          # Match any lowercase letter
pattern = r"[A-Z0-9]"       # Match uppercase letter or digit
pattern = r"[^0-9]"         # Match non-digit

# Predefined character classes
pattern = r"\d"             # Digit: any Unicode decimal digit; [0-9] with re.ASCII
pattern = r"\D"             # Non-digit
pattern = r"\w"             # Word character: Unicode letters, digits, and _; [a-zA-Z0-9_] with re.ASCII
pattern = r"\W"             # Non-word character
pattern = r"\s"             # Whitespace character
pattern = r"\S"             # Non-whitespace character

# Quantifiers
pattern = r"a*"             # 0 or more times
pattern = r"a+"             # 1 or more times
pattern = r"a?"             # 0 or 1 time
pattern = r"a{3}"           # Exactly 3 times
pattern = r"a{2,5}"         # 2 to 5 times
pattern = r"a{2,}"          # At least 2 times
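
For str patterns in Python 3, \d and \w are Unicode-aware unless the re.ASCII flag is set. A short sketch with made-up inputs showing the difference:

```python
import re

digits = "42 ４２"   # ASCII digits, then full-width digits
name = "张三 Li"

unicode_digits = re.findall(r"\d+", digits)
ascii_digits = re.findall(r"\d+", digits, re.ASCII)
print(unicode_digits)  # ['42', '４２']
print(ascii_digits)    # ['42']

unicode_words = re.findall(r"\w+", name)
ascii_words = re.findall(r"\w+", name, re.ASCII)
print(unicode_words)   # ['张三', 'Li']
print(ascii_words)     # ['Li']
```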

Boundary Matching

text = "hello world hello"

# ^ Match string start
re.search(r"^hello", text)   # Matches first hello

# $ Match string end (also just before a trailing newline)
re.search(r"hello$", text)   # Matches last hello

# \b Match word boundary
re.findall(r"\bhello\b", text)  # Only matches standalone hello
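
By default ^ and $ anchor to the whole string. With re.MULTILINE they also anchor at each line break, which matters for multi-line input. A minimal sketch with a made-up log string:

```python
import re

log = "INFO start\nERROR disk full\nINFO done\nERROR timeout"  # made-up sample

# Without re.MULTILINE, ^ anchors only at the start of the whole string
default_hits = re.findall(r"^ERROR (.+)$", log)
print(default_hits)    # []

# With re.MULTILINE, ^ and $ also anchor at the start and end of each line
multiline_hits = re.findall(r"^ERROR (.+)$", log, re.MULTILINE)
print(multiline_hits)  # ['disk full', 'timeout']
```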

Grouping and Capturing

# Capture groups
text = "My birthday: 1990-05-15"
pattern = r"(\d{4})-(\d{2})-(\d{2})"
match = re.search(pattern, text)
if match:
    year = match.group(1)   # 1990
    month = match.group(2)  # 05
    day = match.group(3)    # 15

# Named groups
text = "Name: 张三, Age: 25"
pattern = r"Name: (?P<name>\w+), Age: (?P<age>\d+)"
match = re.search(pattern, text)
if match:
    print(match.group('name'))  # 张三
    print(match.group('age'))   # 25

# Non-capturing groups
pattern = r"(?:apple|banana|orange)"  # Group but don't capture
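
Two related details are worth seeing side by side: match objects can return all groups at once, and capturing versus non-capturing groups change what findall() returns. A short sketch with made-up inputs:

```python
import re

match = re.search(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    "Due: 2024-01-25",
)
all_groups = match.groups()
by_name = match.groupdict()
print(all_groups)  # ('2024', '01', '25')
print(by_name)     # {'year': '2024', 'month': '01', 'day': '25'}

text = "apple pie, banana split"
# Capturing group: findall() returns only the group
captured = re.findall(r"(apple|banana) \w+", text)
print(captured)    # ['apple', 'banana']
# Non-capturing group: findall() returns the whole match
whole = re.findall(r"(?:apple|banana) \w+", text)
print(whole)       # ['apple pie', 'banana split']
```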

Practical Examples

Example 1: Validate Email Address

import re

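def validate_email(email):
    # A simplified pattern; it does not cover every valid address
    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    # fullmatch() requires the whole string to match, so a trailing newline is rejected
    return bool(re.fullmatch(pattern, email))

# Test
print(validate_email("user@example.com"))       # True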
print(validate_email("invalid.email"))          # False
print(validate_email("test@domain"))            # False

Example 2: Extract Web Links

import re

html = """
<a href="https://example.com">Link 1</a>
<a href="http://site.org/page">Link 2</a>
<a href="/relative/path">Link 3</a>
"""

pattern = r'href=["\']([^"\']+)["\']'
links = re.findall(pattern, html)
print(links)
# Output: ['https://example.com', 'http://site.org/page', '/relative/path']

Example 3: Clean Text

import re

text = "This    is   a  very   messy   text！！！！！"
# Collapse repeated whitespace and repeated exclamation marks
cleaned = re.sub(r'\s+', ' ', text)
cleaned = re.sub(r'！+', '！', cleaned)
print(cleaned)  # Output: This is a very messy text！

Example 4: Log Analysis

import re

log = """
2024-01-25 10:30:45 [INFO] User login successful
2024-01-25 10:31:20 [ERROR] Database connection failed
2024-01-25 10:32:10 [INFO] Data saved
2024-01-25 10:33:05 [WARNING] Memory usage at 85%
"""

# Extract error logs
error_pattern = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[ERROR\] (.+)'
errors = re.findall(error_pattern, log)
print(errors)  # Output: ['Database connection failed']

Example 5: Data Extraction

import re

text = """
Order #1001: Apple x 2 = ¥10.00
Order #1002: Banana x 3 = ¥15.00
Order #1003: Orange x 1 = ¥8.00
"""

pattern = r'Order #(\d+): (\w+) x (\d+) = ¥(\d+\.\d+)'
orders = re.findall(pattern, text)

for order in orders:
    order_id, product, quantity, price = order
    print(f"Order ID: {order_id}, Product: {product}, Quantity: {quantity}, Price: {price}")

Best Practices

1. Use Raw Strings

# Good practice
pattern = r"\d{3}-\d{4}"

# Harder to read: without r"", every backslash must be doubled
pattern = "\\d{3}-\\d{4}"

2. Compile Common Patterns

# Compile if used multiple times
EMAIL_PATTERN = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

def is_valid_email(email):
    return bool(EMAIL_PATTERN.match(email))

3. Use Named Groups

# Good practice
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# Bad practice
pattern = r"(\d{4})-(\d{2})-(\d{2})"  # Need to remember indices

4. Handle Match Failures

match = re.search(pattern, text)
if match:
    # Process match result
    result = match.group(1)
else:
    # Handle no match case
    print("No match found")

5. Use Appropriate Functions

Only need to check if it matches: re.search() or re.match()
Need all matches: re.findall()
Need position info: re.finditer()
Need to replace: re.sub()
Need to split: re.split()

Common Pitfalls

1. Greedy vs Non-greedy

text = "<div>content1</div><div>content2</div>"

# Greedy match (default)
greedy = re.search(r'<div>.*</div>', text)
print(greedy.group())  # Matches entire string

# Non-greedy match
lazy = re.search(r'<div>.*?</div>', text)
print(lazy.group())  # Only matches first <div>
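
Combined with findall(), the non-greedy form extracts each block separately. A negated character class is a common alternative, sketched here under the assumption that the content contains no nested tags:

```python
import re

text = "<div>content1</div><div>content2</div>"

# Non-greedy: stop at the first closing tag each time
lazy_all = re.findall(r"<div>(.*?)</div>", text)
print(lazy_all)     # ['content1', 'content2']

# Negated class: match anything except '<' (assumes no nested tags)
negated_all = re.findall(r"<div>([^<]*)</div>", text)
print(negated_all)  # ['content1', 'content2']
```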

2. Escape Special Characters

# Characters to escape: . ^ $ * + ? { } [ ] \ | ( )
pattern = r"\.com"   # Match literal .com
pattern = r"\$"      # Match literal $
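
When the text to match comes from a variable, re.escape() escapes the special characters for you. A minimal sketch with made-up strings:

```python
import re

needle = "1+1=2 (approx.)"                # made-up literal text to find
text = "We know 1+1=2 (approx.) holds"

# Used as a raw pattern, + and ( ) are read as regex syntax, so nothing matches
unescaped = re.search(needle, text)
print(unescaped)        # None

# re.escape() makes every special character literal
escaped = re.search(re.escape(needle), text)
print(escaped.group())  # 1+1=2 (approx.)
```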

3. Chinese Character Handling

# Match Chinese characters
text = "你好世界123"
pattern = r"[\u4e00-\u9fa5]+"  # Common CJK ideographs (the full basic block runs to \u9fff)
chinese = re.findall(pattern, text)
print(chinese)  # Output: ['你好世界']

# Note: In Python 3, strings support Unicode by default

Performance Tips

Compile frequently used patterns: Call re.compile() once and reuse the pattern object
Avoid overusing .*: Use more specific patterns, such as [^"]* between quotes
Choose greedy or non-greedy by intent: .*? changes what is matched and is not automatically faster than .*
Avoid unnecessary capturing: Use (?:...) when you only need grouping
Use character classes: [abc] is usually faster than a|b|c

Summary

Python's re module is powerful and easy to use. After mastering these techniques, you can:

Efficiently process text data
Validate user input
Extract key information
Batch modify text
Analyze log files

Remember: Practice is the best teacher. Write more code, try different patterns, and you'll soon become a regex expert!

