January 25, 2024 | Regex Master Team | 12 min read

Python Regex Complete Guide: re Module Usage

Master Python's re module from basics to advanced with practical examples and best practices.

Python's re module is the standard library's tool for working with regular expressions. Whether you're doing data cleaning, text analysis, or form validation, knowing the re module well saves time. This guide goes from the basics to more advanced usage: the core functions, pattern syntax, practical examples, and common pitfalls.

Why Choose Python's re Module?

Python's re module offers complete regular expression support with several advantages:

  • Built-in module, no installation required
  • Clean, intuitive syntax
  • Good performance for most text-processing workloads (in CPython the matching engine is implemented in C)
  • Rich function library for various needs
  • Seamless integration with other Python modules

re Module Basics

Importing the Module

Import the module before use:

import re

Core Functions Overview

The re module provides multiple functions, each with specific purposes:

  • re.match() - Match from the beginning of the string
  • re.search() - Search for the first match anywhere in the string
  • re.findall() - Find all matching occurrences
  • re.finditer() - Return an iterator of all matches
  • re.sub() - Replace matching text
  • re.split() - Split string by pattern
  • re.compile() - Compile a pattern into a reusable pattern object

Detailed Function Usage

1. re.match() - Match from Beginning

match() only checks if the pattern matches the beginning of the string:

import re

text = "Hello, World!"
pattern = r"Hello"

result = re.match(pattern, text)
if result:
    print("Match found:", result.group())  # Output: Hello
else:
    print("Match failed")

# No match case
result = re.match(r"World", text)  # Returns None

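Use case: Check that input starts with an expected format. Note that match() ignores anything after the matched prefix, so to validate a whole string, such as an email or phone number, use re.fullmatch() or end the pattern with $.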
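The difference between matching a prefix and matching the whole string matters for validation. The following is a minimal sketch, assuming a hypothetical helper named is_phone and the same phone format used in the findall() example of this guide:

```python
import re

# Hypothetical format for illustration: 3 digits, 4 digits, 4 digits.
PHONE = r"\d{3}-\d{4}-\d{4}"

def is_phone(value):
    """Return True only if the whole string is a phone number."""
    return re.fullmatch(PHONE, value) is not None

# re.match() accepts any string that merely starts with the pattern.
print(bool(re.match(PHONE, "138-1234-5678 and more")))  # True
print(is_phone("138-1234-5678 and more"))               # False
print(is_phone("138-1234-5678"))                        # True
```

re.fullmatch() has been available since Python 3.4 and avoids having to remember both anchors.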

2. re.search() - Search First Match

search() looks for the first match anywhere in the string:

text = "Python is awesome! Python is powerful!"
pattern = r"Python"

result = re.search(pattern, text)
if result:
    print("Found:", result.group())  # Output: Python
    print("Position:", result.start())  # Output: 0

Use case: Extract key information from log files, find specific errors or warnings.

3. re.findall() - Find All Matches

findall() returns a list of all matching occurrences:

text = "My phone: 138-1234-5678, yours: 139-8765-4321"
pattern = r"\d{3}-\d{4}-\d{4}"

phone_numbers = re.findall(pattern, text)
print(phone_numbers)  # Output: ['138-1234-5678', '139-8765-4321']

Use case: Batch extract data, such as all emails, links, or image URLs from a webpage.
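One detail that often surprises people: when the pattern contains capture groups, findall() returns the groups instead of the full match. A short sketch using the same sample text:

```python
import re

text = "My phone: 138-1234-5678, yours: 139-8765-4321"

# No groups: each item is the full match.
whole = re.findall(r"\d{3}-\d{4}-\d{4}", text)

# Several groups: each item is a tuple of the captured parts.
parts = re.findall(r"(\d{3})-(\d{4})-(\d{4})", text)

# Non-capturing groups keep the grouping but return full matches again.
grouped = re.findall(r"(?:\d{3})-(?:\d{4})-(?:\d{4})", text)

print(whole)    # ['138-1234-5678', '139-8765-4321']
print(parts)    # [('138', '1234', '5678'), ('139', '8765', '4321')]
print(grouped)  # ['138-1234-5678', '139-8765-4321']
```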

4. re.finditer() - Get Detailed Match Info

finditer() returns an iterator of match objects with more information:

text = "Email: alice@example.com, bob@example.org, carol@example.net"
pattern = r"[\w.+-]+@[\w-]+\.[\w.-]+"

for match in re.finditer(pattern, text):
    print(f"Email: {match.group()}, Start: {match.start()}, End: {match.end()}")

Use case: When you need to know the exact position of each match.

5. re.sub() - Powerful Replacement

sub() can replace all matching occurrences:

# Simple replacement
text = "Hello, Hello, Hello"
result = re.sub(r"Hello", "Hi", text)
print(result)  # Output: Hi, Hi, Hi

# Using callback function
text = "Price: 100, 200, 300"
def discount(match):
    price = int(match.group())
    return f"{price * 0.9}元"

result = re.sub(r"\d+", discount, text)
print(result)  # Output: Price: 90.0元, 180.0元, 270.0元

# Limit replacement count
text = "a-a-a-a"
result = re.sub(r"a", "b", text, count=2)
print(result)  # Output: b-b-a-a

Use case: Batch modify text format, such as unifying date formats or cleaning special characters.
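For the date-format use case, the replacement string can refer back to captured groups. This is a small sketch with made-up sample text; the target format (day/month/year) is only an assumption for illustration:

```python
import re

text = "Released 2024-01-25, updated 2024-02-03"

# \g<name> refers to a named group inside the replacement string.
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
result = re.sub(pattern, r"\g<day>/\g<month>/\g<year>", text)
print(result)  # Released 25/01/2024, updated 03/02/2024

# subn() does the same replacement and also reports how many were made.
result, count = re.subn(pattern, r"\g<day>/\g<month>/\g<year>", text)
print(count)  # 2
```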

6. re.split() - Flexible String Splitting

split() splits a string wherever a regex pattern matches:

# Split by multiple delimiters
text = "apple,banana;orange|grape"
result = re.split(r"[,;|]", text)
print(result)  # Output: ['apple', 'banana', 'orange', 'grape']

# Keep delimiters
text = "apple  banana  orange"
result = re.split(r"(\s+)", text)
print(result)  # Output: ['apple', '  ', 'banana', '  ', 'orange']

# Limit split count
text = "one,two,three,four"
result = re.split(r",", text, maxsplit=2)
print(result)  # Output: ['one', 'two', 'three,four']

Use case: Parse complex text formats, like log files or configuration files.
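As a sketch of the configuration-file use case, two split() calls can turn a loosely formatted line into a dictionary. The input line and its key names are invented for illustration:

```python
import re

# Hypothetical config line with inconsistent spacing around ';' and '='.
line = "host = localhost ;  port=5432;user =  admin"

# Split entries on ';' and pairs on '=', absorbing surrounding whitespace.
entries = re.split(r"\s*;\s*", line.strip())
config = dict(re.split(r"\s*=\s*", entry, maxsplit=1) for entry in entries)

print(config)  # {'host': 'localhost', 'port': '5432', 'user': 'admin'}
```

Using maxsplit=1 on each pair keeps any later '=' characters inside the value.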

7. re.compile() - Improve Performance

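If you use the same pattern many times, compile it once and reuse the pattern object. The module-level functions already cache recently compiled patterns, so the gain comes from skipping the cache lookup on each call rather than from avoiding a full re-parse:

# Not compiled (looks the pattern up in re's internal cache on every call)
pattern = r"\b\w+\b"
text = "This is a test"

for _ in range(1000):
    words = re.findall(pattern, text)

# Compiled once, reused directly
compiled_pattern = re.compile(r"\b\w+\b")
for _ in range(1000):
    words = compiled_pattern.findall(text)  # Slightly faster, and the pattern has a name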

Use case: When using the same regex pattern in loops or frequent calls.
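How much compiling helps depends on the pattern and the workload, so it is worth measuring rather than assuming. The following is a rough timing sketch using the standard timeit module; the numbers will differ from machine to machine, so no output is shown:

```python
import re
import timeit

text = "This is a test"
WORD = re.compile(r"\b\w+\b")

# Both forms return the same result; only the call overhead differs.
assert re.findall(r"\b\w+\b", text) == WORD.findall(text)

module_time = timeit.timeit(lambda: re.findall(r"\b\w+\b", text), number=10_000)
compiled_time = timeit.timeit(lambda: WORD.findall(text), number=10_000)

print(f"module-level: {module_time:.4f}s, compiled: {compiled_time:.4f}s")
```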

Regex Pattern Details

Basic Patterns

# Character classes
pattern = r"[a-z]"          # Match any lowercase letter
pattern = r"[A-Z0-9]"       # Match uppercase letter or digit
pattern = r"[^0-9]"         # Match non-digit

# Predefined character classes (Unicode-aware for str patterns; pass re.ASCII for the ASCII-only sets)
pattern = r"\d"             # Digit: [0-9] plus other Unicode decimal digits
pattern = r"\D"             # Non-digit
pattern = r"\w"             # Word character: letters, digits, underscore ([a-zA-Z0-9_] with re.ASCII)
pattern = r"\W"             # Non-word character
pattern = r"\s"             # Whitespace character
pattern = r"\S"             # Non-whitespace character

# Quantifiers
pattern = r"a*"             # 0 or more times
pattern = r"a+"             # 1 or more times
pattern = r"a?"             # 0 or 1 time
pattern = r"a{3}"           # Exactly 3 times
pattern = r"a{2,5}"         # 2 to 5 times
pattern = r"a{2,}"          # At least 2 times
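One caveat about the predefined classes: in Python 3, \d and \w are Unicode-aware for str patterns, so they match more than the ASCII ranges. A short sketch of the difference, using sample strings chosen for illustration:

```python
import re

text = "Room 42 and room ４２"  # the second number uses full-width digits

print(re.findall(r"\d+", text))            # ['42', '４２']
print(re.findall(r"\d+", text, re.ASCII))  # ['42']
print(re.findall(r"[0-9]+", text))         # ['42']

# \w behaves the same way: accented letters count as word characters.
print(re.findall(r"\w+", "naïve café"))            # ['naïve', 'café']
print(re.findall(r"\w+", "naïve café", re.ASCII))  # ['na', 've', 'caf']
```

If input must be restricted to ASCII digits, for example before calling a strict parser, an explicit [0-9] or the re.ASCII flag is the safer choice.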

Boundary Matching

text = "hello world hello"

# ^ Match string start
re.search(r"^hello", text)   # Matches first hello

# $ Match string end (or just before a trailing newline)
re.search(r"hello$", text)   # Matches last hello

# \b Match word boundary
re.findall(r"\bhello\b", text)  # Only matches standalone hello
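By default ^ and $ refer to the whole string, not to individual lines. The sketch below, with a made-up multi-line string, shows how the re.MULTILINE flag changes that, and how \b rejects matches inside longer words:

```python
import re

text = "hello world\nhello again\nsay hello"

# Without a flag, ^ only matches at the very start of the string.
print(re.findall(r"^hello", text))                # ['hello']

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
print(re.findall(r"^hello", text, re.MULTILINE))  # ['hello', 'hello']
print(re.findall(r"hello$", text, re.MULTILINE))  # ['hello']

# \b keeps "hello" from matching inside a longer word.
print(re.findall(r"\bhello\b", "hello helloworld othello"))  # ['hello']
```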

Grouping and Capturing

# Capture groups
text = "My birthday: 1990-05-15"
pattern = r"(\d{4})-(\d{2})-(\d{2})"
match = re.search(pattern, text)
if match:
    year = match.group(1)   # 1990
    month = match.group(2)  # 05
    day = match.group(3)    # 15

# Named groups
text = "Name: 张三, Age: 25"
pattern = r"Name: (?P<name>\w+), Age: (?P<age>\d+)"
match = re.search(pattern, text)
if match:
    print(match.group('name'))  # 张三
    print(match.group('age'))   # 25

# Non-capturing groups
pattern = r"(?:apple|banana|orange)"  # Group but don't capture
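Match objects offer a few more accessors that pair well with groups. A brief sketch reusing the birthday example with named groups:

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = pattern.search("My birthday: 1990-05-15")

if match:
    print(match.group(0))      # 1990-05-15 (the whole match)
    print(match.groups())      # ('1990', '05', '15')
    print(match.groupdict())   # {'year': '1990', 'month': '05', 'day': '15'}
    print(match.span("year"))  # (13, 17)
```

groupdict() is convenient when the result feeds straight into a function that takes keyword arguments.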

Practical Examples

Example 1: Validate Email Address

import re

def validate_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

# Test
print(validate_email("user@example.com"))       # True
print(validate_email("invalid.email"))          # False
print(validate_email("test@domain"))            # False

Example 2: Extract Web Links

import re

html = """
<a href="https://example.com">Link 1</a>
<a href="http://site.org/page">Link 2</a>
<a href="/relative/path">Link 3</a>
"""

pattern = r'href=["\']([^"\']+)["\']'
links = re.findall(pattern, html)
print(links)
# Output: ['https://example.com', 'http://site.org/page', '/relative/path']

Example 3: Clean Text

import re

text = "This    is   a  very   messy   text!!!!!"
# Collapse runs of whitespace and repeated exclamation marks
cleaned = re.sub(r'\s+', ' ', text)
cleaned = re.sub(r'!+', '!', cleaned)
print(cleaned)  # Output: This is a very messy text!

Example 4: Log Analysis

import re

log = """
2024-01-25 10:30:45 [INFO] User login successful
2024-01-25 10:31:20 [ERROR] Database connection failed
2024-01-25 10:32:10 [INFO] Data saved
2024-01-25 10:33:05 [WARNING] Memory usage at 85%
"""

# Extract error logs
error_pattern = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[ERROR\] (.+)'
errors = re.findall(error_pattern, log)
print(errors)  # Output: ['Database connection failed']
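When more than one field is needed from each line, named groups with finditer() give structured records. This is a sketch under the assumption that every line follows the timestamp, level, message layout shown above; the group names are chosen here for illustration:

```python
import re
from collections import Counter

log = """
2024-01-25 10:30:45 [INFO] User login successful
2024-01-25 10:31:20 [ERROR] Database connection failed
2024-01-25 10:32:10 [INFO] Data saved
2024-01-25 10:33:05 [WARNING] Memory usage at 85%
"""

# re.MULTILINE lets ^ and $ anchor each log line separately.
LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>[A-Z]+)\] (?P<message>.+)$",
    re.MULTILINE,
)

records = [m.groupdict() for m in LINE.finditer(log)]
levels = Counter(record["level"] for record in records)

print(levels["INFO"])         # 2
print(records[1]["message"])  # Database connection failed
```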

Example 5: Data Extraction

import re

text = """
Order #1001: Apple x 2 = ¥10.00
Order #1002: Banana x 3 = ¥15.00
Order #1003: Orange x 1 = ¥8.00
"""

pattern = r'Order #(\d+): (\w+) x (\d+) = ¥(\d+\.\d+)'
orders = re.findall(pattern, text)

for order in orders:
    order_id, product, quantity, price = order
    print(f"Order ID: {order_id}, Product: {product}, Quantity: {quantity}, Price: {price}")

Best Practices

1. Use Raw Strings

# Good practice
pattern = r"\d{3}-\d{4}"

# Bad practice: without the r prefix, every backslash must be doubled
pattern = "\\d{3}-\\d{4}"

2. Compile Common Patterns

# Compile if used multiple times
EMAIL_PATTERN = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

def is_valid_email(email):
    return bool(EMAIL_PATTERN.match(email))

3. Use Named Groups

# Good practice
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# Bad practice
pattern = r"(\d{4})-(\d{2})-(\d{2})"  # Need to remember indices

4. Handle Match Failures

match = re.search(pattern, text)
if match:
    # Process match result
    result = match.group(1)
else:
    # Handle no match case
    print("No match found")
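Calling .group() on a failed match raises AttributeError, because search() and match() return None when nothing matches. One compact way to guard against this is the assignment expression available from Python 3.8; extract_year below is a hypothetical helper written for this sketch:

```python
import re

def extract_year(text):
    """Return the first four-digit number in text as an int, or None."""
    # The walrus operator binds and tests the match in one step.
    if (match := re.search(r"\b(\d{4})\b", text)) is not None:
        return int(match.group(1))
    return None

print(extract_year("Founded in 1998"))  # 1998
print(extract_year("No year here"))     # None
```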

5. Use Appropriate Functions

  • Only need to check if it matches: re.search() or re.match()
  • Need all matches: re.findall()
  • Need position info: re.finditer()
  • Need to replace: re.sub()
  • Need to split: re.split()

Common Pitfalls

1. Greedy vs Non-greedy

text = "<div>content1</div><div>content2</div>"

# Greedy match (default)
greedy = re.search(r'<div>.*</div>', text)
print(greedy.group())  # Matches entire string

# Non-greedy match
lazy = re.search(r'<div>.*?</div>', text)
print(lazy.group())  # Matches only the first element: <div>content1</div>
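The same contrast shows up clearly with findall(). The sketch below also includes a negated character class, which is a common alternative to the non-greedy quantifier when the content cannot contain the delimiter:

```python
import re

text = "<div>content1</div><div>content2</div>"

# Greedy: .* runs to the last </div> in the string.
print(re.findall(r"<div>(.*)</div>", text))
# ['content1</div><div>content2']

# Non-greedy: .*? stops at the first </div> it can.
print(re.findall(r"<div>(.*?)</div>", text))
# ['content1', 'content2']

# Negated class: states the intent directly, no '<' inside the content.
print(re.findall(r"<div>([^<]*)</div>", text))
# ['content1', 'content2']
```

For real HTML, a dedicated parser is usually more robust than any of these patterns.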

2. Escape Special Characters

# Characters to escape: . ^ $ * + ? { } [ ] \ | ( )
pattern = r"\.com"   # Match literal .com
pattern = r"\$"      # Match literal $
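When the text to match comes from a user or a file, escaping by hand is error-prone. The standard re.escape() function handles it; the sample strings below are made up for illustration:

```python
import re

user_input = "price (USD) $5.00"
haystack = "Total price (USD) $5.00 today"

# re.escape() backslash-escapes every character with a special meaning.
escaped = re.escape(user_input)
print(re.search(escaped, haystack) is not None)     # True

# Unescaped, the parentheses form a group and '$' anchors the end,
# so the literal text is not found.
print(re.search(user_input, haystack) is not None)  # False
```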

3. Chinese Character Handling

# Match Chinese characters
text = "你好世界123"
pattern = r"[\u4e00-\u9fff]+"  # CJK Unified Ideographs block (U+4E00 to U+9FFF)
chinese = re.findall(pattern, text)
print(chinese)  # Output: ['你好世界']

# Note: In Python 3, str patterns are Unicode by default, so \w also matches Chinese characters

Performance Tips

  1. Compile frequently used patterns: Call re.compile() once and reuse the pattern object
  2. Avoid overusing .*: More specific patterns, such as [^<]*, backtrack less
  3. Choose greedy or non-greedy by intent: .*? changes what is matched and is not automatically faster than .*
  4. Avoid unnecessary capturing: Use (?:...) when you only need grouping
  5. Use character classes: [abc] is faster than a|b|c for single characters

Summary

Python's re module is powerful and easy to use. After mastering these techniques, you can:

  • Efficiently process text data
  • Validate user input
  • Extract key information
  • Batch modify text
  • Analyze log files

Remember: Practice is the best teacher. Write more code, try different patterns, and you'll soon become a regex expert!
