Java Regular Expressions: Pattern and Matcher Advanced Usage
Master Java's Pattern and Matcher classes, and learn advanced regex techniques and best practices.
Java provides powerful regular expression support through the Pattern and Matcher classes. This guide moves from the basics to advanced techniques, covering the parts of the API you are most likely to need.
Java Regex Basics
Core Classes Introduction
Java's regex functionality is provided by the java.util.regex package, containing two main classes:
- Pattern: Represents a compiled regular expression pattern
- Matcher: Uses the pattern to perform matching operations on input strings
Basic Usage Flow
import java.util.regex.Pattern;
import java.util.regex.Matcher;
// 1. Compile regular expression
Pattern pattern = Pattern.compile("\\d+");
// 2. Create Matcher object
Matcher matcher = pattern.matcher("Hello 123 World");
// 3. Perform matching operations
if (matcher.find()) {
System.out.println("Match found: " + matcher.group());
}
Pattern Class Details
1. Creating Pattern Objects
Basic Compilation
// Simple pattern
Pattern pattern = Pattern.compile("abc");
// Using flags
Pattern caseInsensitive = Pattern.compile("abc", Pattern.CASE_INSENSITIVE);
// Multi-line mode
Pattern multiLine = Pattern.compile("^test$", Pattern.MULTILINE);
// Combine multiple flags
Pattern combined = Pattern.compile(
"test",
Pattern.CASE_INSENSITIVE | Pattern.MULTILINE
);
Pattern Flags Explained
// CASE_INSENSITIVE - Ignore case
Pattern p1 = Pattern.compile("hello", Pattern.CASE_INSENSITIVE);
// Matches: hello, HELLO, Hello, etc.
// MULTILINE - Multi-line mode
Pattern p2 = Pattern.compile("^test$", Pattern.MULTILINE);
// ^ and $ match at the beginning and end of each line, not just of the whole input
// DOTALL - Dot matches newline
Pattern p3 = Pattern.compile(".*", Pattern.DOTALL);
// . also matches line terminators
// UNICODE_CASE - Unicode-aware case folding (CASE_INSENSITIVE alone covers only US-ASCII)
Pattern p4 = Pattern.compile("äbc", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
// Matches: äbc, ÄBC, Äbc, etc.
// CANON_EQ - Canonical equivalence
Pattern p5 = Pattern.compile("a\u030A", Pattern.CANON_EQ);
// Matches "å" whether it is precomposed (U+00E5) or written as "a" plus a combining ring (U+030A)
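The effect of MULTILINE and DOTALL is easiest to see by running the same regex with and without the flag. The following is a minimal sketch; the class name FlagDemo and the helper countMatches are illustrative names, not part of the Java API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String lines = "test\ntest\ntest";

        // Without MULTILINE, ^ and $ anchor only to the input as a whole.
        System.out.println(countMatches(Pattern.compile("^test$"), lines)); // 0
        // With MULTILINE, they anchor at every line.
        System.out.println(countMatches(Pattern.compile("^test$", Pattern.MULTILINE), lines)); // 3

        // Without DOTALL, the dot does not match a line terminator.
        System.out.println(Pattern.compile("a.b").matcher("a\nb").matches()); // false
        System.out.println(Pattern.compile("a.b", Pattern.DOTALL).matcher("a\nb").matches()); // true
    }
}
```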
Quick Checks with Pattern.matches()
// Pattern.matches() compiles the regex and tests the entire input in one call
// Check for integer
boolean isInteger = Pattern.matches("-?\\d+", "123");
// Check for email
boolean isEmail = Pattern.matches(
"[\\w.-]+@[\\w.-]+\\.[a-z]{2,}",
"user@example.com"
);
// Check for phone number (simple version)
boolean isPhone = Pattern.matches("\\d{11}", "13812345678");
2. Pattern Common Methods
split() - Split String
String text = "apple,banana;orange|grape";
Pattern pattern = Pattern.compile("[,;|]");
String[] fruits = pattern.split(text);
// Output: ["apple", "banana", "orange", "grape"]
// Limit split count
String[] parts = pattern.split(text, 2);
// Output: ["apple", "banana;orange|grape"]
// Negative limit: keep trailing empty strings
String[] all = pattern.split(text, -1);
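The limit argument matters most when the input ends with delimiters. As a small sketch (SplitLimitDemo and its split helper are hypothetical names chosen for this example), compare a limit of 0, a negative limit, and a positive limit on the same string:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    // Thin wrapper so the three limits can be compared side by side.
    static String[] split(String input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        String csv = "a,b,,";
        // limit 0 (the default): trailing empty strings are dropped
        System.out.println(Arrays.toString(split(csv, 0)));  // [a, b]
        // negative limit: trailing empty strings are kept
        System.out.println(Arrays.toString(split(csv, -1))); // [a, b, , ]
        // positive limit: at most that many pieces, the last one holds the rest
        System.out.println(Arrays.toString(split(csv, 2)));  // [a, b,,]
    }
}
```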
quote() - Escape Special Characters
String special = "a.b*c+d?e";
String escaped = Pattern.quote(special);
System.out.println(escaped);
// Output: \Qa.b*c+d?e\E
// Create literal matching pattern
Pattern literal = Pattern.compile(Pattern.quote("1.2"));
// Matches only the literal text "1.2"; the dot is no longer a wildcard, so "1x2" does not match
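A common use of quote() is searching for text that comes from a user or a file and may contain regex metacharacters. The sketch below assumes a hypothetical helper class, LiteralSearch, and contrasts quoted and unquoted searches on the same input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralSearch {
    // Counts matches of an already compiled pattern.
    private static int count(Pattern pattern, String haystack) {
        Matcher matcher = pattern.matcher(haystack);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Treats needle as plain text, even if it contains characters like . or *.
    static int countLiteral(String haystack, String needle) {
        return count(Pattern.compile(Pattern.quote(needle)), haystack);
    }

    // Treats needle as a regular expression.
    static int countRegex(String haystack, String needle) {
        return count(Pattern.compile(needle), haystack);
    }

    public static void main(String[] args) {
        String versions = "1.2, 1x2, 1.2.3";
        System.out.println(countLiteral(versions, "1.2")); // 2
        // Unquoted, the dot matches any character, so "1x2" is counted too.
        System.out.println(countRegex(versions, "1.2"));   // 3
    }
}
```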
Matcher Class Details
1. Creating Matcher Objects
Pattern pattern = Pattern.compile("\\d+");
String text = "Order #123, #456, #789";
Matcher matcher = pattern.matcher(text);
2. Matching Methods
matches() - Full Match
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("123");
if (matcher.matches()) {
System.out.println("Entire string is digits");
}
// No match case
matcher = pattern.matcher("abc123");
System.out.println(matcher.matches()); // false
lookingAt() - Match from Beginning
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("123abc");
if (matcher.lookingAt()) {
System.out.println("Matched from beginning");
}
// Difference with matches()
System.out.println(pattern.matcher("123abc").matches()); // false
System.out.println(pattern.matcher("123abc").lookingAt()); // true
find() - Search for Match
Pattern pattern = Pattern.compile("\\d+");
String text = "a1b2c3d4";
Matcher matcher = pattern.matcher(text);
// Find all matches
while (matcher.find()) {
System.out.printf("Found: %s, Position: %d%n",
matcher.group(), matcher.start());
}
// Output:
// Found: 1, Position: 1
// Found: 2, Position: 3
// Found: 3, Position: 5
// Found: 4, Position: 7
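The three matching methods differ only in how much of the input the pattern has to cover: matches() needs the whole input, lookingAt() needs a prefix, and find() accepts a match anywhere. A minimal side-by-side sketch (MatchModes and describe are illustrative names):

```java
import java.util.regex.Pattern;

class MatchModes {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Runs all three matching methods against the same input.
    static String describe(String input) {
        boolean whole = DIGITS.matcher(input).matches();     // entire input
        boolean prefix = DIGITS.matcher(input).lookingAt();  // start of input
        boolean anywhere = DIGITS.matcher(input).find();     // any position
        return String.format("matches=%b lookingAt=%b find=%b", whole, prefix, anywhere);
    }

    public static void main(String[] args) {
        System.out.println(describe("123"));    // matches=true lookingAt=true find=true
        System.out.println(describe("123abc")); // matches=false lookingAt=true find=true
        System.out.println(describe("abc123")); // matches=false lookingAt=false find=true
        System.out.println(describe("abc"));    // matches=false lookingAt=false find=false
    }
}
```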
3. Getting Match Information
group() - Get Match Content
Pattern pattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
Matcher matcher = pattern.matcher("Date: 2024-01-25");
if (matcher.find()) {
// Full match
System.out.println(matcher.group()); // 2024-01-25
// Capture groups
System.out.println(matcher.group(1)); // 2024
System.out.println(matcher.group(2)); // 01
System.out.println(matcher.group(3)); // 25
// Group count
System.out.println(matcher.groupCount()); // 3
}
start() and end() - Get Position
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("abc123def");
if (matcher.find()) {
System.out.println("Start: " + matcher.start()); // 3
System.out.println("End: " + matcher.end()); // 6
System.out.println("Length: " + matcher.group().length()); // 3
}
start(int) and end(int) - Get Group Position
Pattern pattern = Pattern.compile("(\\w+)@(\\w+\\.\\w+)");
Matcher matcher = pattern.matcher("user@example.com");
if (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
System.out.println("Username position: " + matcher.start(1) + "-" + matcher.end(1));
System.out.println("Domain position: " + matcher.start(2) + "-" + matcher.end(2));
}
4. Replacement Methods
replaceAll() - Replace All Matches
Pattern pattern = Pattern.compile("\\d+");
String text = "Price: 100, 200, 300";
String result = pattern.matcher(text).replaceAll("[number]");
System.out.println(result);
// Output: Price: [number], [number], [number]
// Using callback function (Java 9+)
String result2 = pattern.matcher(text).replaceAll(match -> {
int num = Integer.parseInt(match.group());
return String.valueOf(num * 0.9);
});
System.out.println(result2);
// Output: Price: 90.0, 180.0, 270.0
replaceFirst() - Replace First Match
Pattern pattern = Pattern.compile("\\d+");
String text = "Price: 100, 200, 300";
String result = pattern.matcher(text).replaceFirst("[number]");
System.out.println(result);
// Output: Price: [number], 200, 300
appendReplacement() and appendTail() - Accumulate Replacement
Pattern pattern = Pattern.compile("\\d+");
String text = "Count: 100, Price: 200";
Matcher matcher = pattern.matcher(text);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
int num = Integer.parseInt(matcher.group());
matcher.appendReplacement(sb, String.valueOf(num * 0.9));
}
matcher.appendTail(sb);
System.out.println(sb.toString());
// Output: Count: 90.0, Price: 180.0
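One detail applies to all of the replacement methods: in the replacement string, $ introduces a group reference and \ is an escape character. When the replacement text comes from data rather than from your own code, Matcher.quoteReplacement() makes it literal. The sketch below uses a hypothetical template placeholder, PRICE, to show the difference:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementEscaping {
    private static final Pattern PRICE = Pattern.compile("PRICE");

    // Inserts value literally, even if it contains $ or \.
    static String fill(String template, String value) {
        return PRICE.matcher(template).replaceAll(Matcher.quoteReplacement(value));
    }

    // Shows what happens without quoting: "$5" is read as a reference to group 5.
    static boolean unquotedFails(String template, String value) {
        try {
            PRICE.matcher(template).replaceAll(value);
            return false;
        } catch (IndexOutOfBoundsException | IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fill("Total: PRICE", "$5"));          // Total: $5
        System.out.println(unquotedFails("Total: PRICE", "$5")); // true
    }
}
```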
Advanced Features
1. Named Capture Groups (Java 7+)
Pattern pattern = Pattern.compile(
"(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
);
Matcher matcher = pattern.matcher("2024-01-25");
if (matcher.find()) {
System.out.println(matcher.group("year")); // 2024
System.out.println(matcher.group("month")); // 01
System.out.println(matcher.group("day")); // 25
}
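Named groups can also be referenced from a replacement string with the ${name} syntax, which keeps reordering code readable. A short sketch, where DateReformatter and toDayFirst are illustrative names:

```java
import java.util.regex.Pattern;

class DateReformatter {
    private static final Pattern ISO_DATE = Pattern.compile(
        "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
    );

    // Rewrites every yyyy-MM-dd date in the text as dd/MM/yyyy.
    static String toDayFirst(String text) {
        return ISO_DATE.matcher(text).replaceAll("${day}/${month}/${year}");
    }

    public static void main(String[] args) {
        System.out.println(toDayFirst("Released 2024-01-25")); // Released 25/01/2024
    }
}
```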
2. Lookahead and Lookbehind
Positive Lookahead
// Match numbers followed by "元"
Pattern pattern = Pattern.compile("\\d+(?=元)");
Matcher matcher = pattern.matcher("Price: 100元, 200元");
while (matcher.find()) {
System.out.println(matcher.group());
}
// Output: 100, 200
Negative Lookahead
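// Match numbers not followed by "元"
// The lookahead also rejects digits; with plain (?!元) the engine would
// backtrack and match the "20" in "200元"
Pattern pattern = Pattern.compile("\\d+(?![\\d元])");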
Matcher matcher = pattern.matcher("Count: 100, Price: 200元");
while (matcher.find()) {
System.out.println(matcher.group());
}
// Output: 100
Positive Lookbehind
// Match numbers preceded by "Price: " (the lookbehind must include the space)
Pattern pattern = Pattern.compile("(?<=Price: )\\d+");
Matcher matcher = pattern.matcher("Price: 100, Count: 200");
while (matcher.find()) {
System.out.println(matcher.group());
}
// Output: 100
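Negative lookbehind, written (?<!...), is the fourth member of this family. The sketch below matches numbers that are not preceded by "#". Note that the assertion also has to exclude digits, otherwise the engine can start the match one digit later. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookbehindDemo {
    // Collects every match of the pattern in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Order #123 costs 456";

        // Naive version: "1" is rejected, but the match can begin at "2".
        Pattern naive = Pattern.compile("(?<!#)\\d+");
        System.out.println(findAll(naive, text));  // [23, 456]

        // Stricter version: the number must not start after "#" or after another digit.
        Pattern strict = Pattern.compile("(?<![#\\d])\\d+");
        System.out.println(findAll(strict, text)); // [456]
    }
}
```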
3. Boundary Matching
String text = "hello world hello";
// \b - Word boundary
Pattern wordBoundary = Pattern.compile("\\bhello\\b");
Matcher matcher1 = wordBoundary.matcher(text);
while (matcher1.find()) {
System.out.println(matcher1.group());
}
// Output: hello, hello (two standalone words)
// \B - Non-word boundary
Pattern nonWordBoundary = Pattern.compile("\\Bhello\\B");
Matcher matcher2 = nonWordBoundary.matcher(text);
System.out.println(matcher2.find()); // false
4. Quantifiers and Greediness
String text = "<div>content1</div><div>content2</div>";
// Greedy match (default)
Pattern greedy = Pattern.compile("<div>.*</div>");
Matcher matcher1 = greedy.matcher(text);
if (matcher1.find()) {
System.out.println("Greedy: " + matcher1.group());
// Output: <div>content1</div><div>content2</div>
}
// Non-greedy match
Pattern lazy = Pattern.compile("<div>.*?</div>");
Matcher matcher2 = lazy.matcher(text);
if (matcher2.find()) {
System.out.println("Non-greedy: " + matcher2.group());
// Output: <div>content1</div>
}
Practical Examples
Example 1: Validate Email Address
import java.util.regex.Pattern;
public class EmailValidator {
private static final Pattern EMAIL_PATTERN = Pattern.compile(
"^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$"
);
public static boolean isValid(String email) {
if (email == null) {
return false;
}
return EMAIL_PATTERN.matcher(email).matches();
}
public static void main(String[] args) {
System.out.println(isValid("user@example.com")); // true
System.out.println(isValid("invalid.email")); // false
System.out.println(isValid("user@domain")); // false
}
}
Example 2: Extract Web Links
import java.util.regex.*;
import java.util.ArrayList;
import java.util.List;
public class LinkExtractor {
private static final Pattern LINK_PATTERN = Pattern.compile(
"href=[\"']([^\"']+)[\"']"
);
public static List<String> extractLinks(String html) {
List<String> links = new ArrayList<>();
Matcher matcher = LINK_PATTERN.matcher(html);
while (matcher.find()) {
links.add(matcher.group(1));
}
return links;
}
public static void main(String[] args) {
// Text blocks (""") require Java 15 or later
String html = """
<a href="https://example.com">Link 1</a>
<a href="http://site.org/page">Link 2</a>
<a href="/relative/path">Link 3</a>
""";
List<String> links = extractLinks(html);
links.forEach(System.out::println);
}
}
Example 3: Log Analysis
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class LogAnalyzer {
private static final Pattern LOG_PATTERN = Pattern.compile(
"(\\d{4}-\\d{2}-\\d{2}) (\\d{2}:\\d{2}:\\d{2}) \\[(\\w+)\\] (.+)"
);
public static void analyzeLog(String log) {
Matcher matcher = LOG_PATTERN.matcher(log);
if (matcher.matches()) {
String date = matcher.group(1);
String time = matcher.group(2);
String level = matcher.group(3);
String message = matcher.group(4);
System.out.printf("Time: %s %s%n", date, time);
System.out.printf("Level: %s%n", level);
System.out.printf("Message: %s%n", message);
}
}
public static void main(String[] args) {
String log = "2024-01-25 10:30:45 [INFO] User login successful";
analyzeLog(log);
}
}
Example 4: Batch Replacement
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class BatchReplacer {
public static String replaceWithPosition(String text, String search) {
Pattern pattern = Pattern.compile(search);
Matcher matcher = pattern.matcher(text);
StringBuffer result = new StringBuffer();
while (matcher.find()) {
String replacement = String.format("[%s@%d]",
matcher.group(), matcher.start());
matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
}
matcher.appendTail(result);
return result.toString();
}
public static void main(String[] args) {
String text = "apple banana apple";
System.out.println(replaceWithPosition(text, "apple"));
// Output: [apple@0] banana [apple@13]
}
}
Example 5: HTML Cleanup
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HtmlCleaner {
private static final Pattern TAG_PATTERN = Pattern.compile("<[^>]+>");
public static String stripHtml(String html) {
return TAG_PATTERN.matcher(html).replaceAll("");
}
public static void main(String[] args) {
String html = "<p>Hello <b>World</b>!</p>";
System.out.println(stripHtml(html));
// Output: Hello World!
}
}
Best Practices
1. Pre-compile Patterns
// Good practice: Pre-compile
public class EmailValidator {
private static final Pattern EMAIL_PATTERN = Pattern.compile(
"^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$"
);
public static boolean isValid(String email) {
return EMAIL_PATTERN.matcher(email).matches();
}
}
// Bad practice: Compile every time
public static boolean isValid(String email) {
Pattern pattern = Pattern.compile("^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$");
return pattern.matcher(email).matches();
}
2. Use Static Constants
public class RegexPatterns {
public static final Pattern EMAIL = Pattern.compile(
"^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$"
);
public static final Pattern PHONE = Pattern.compile("^\\d{11}$");
public static final Pattern DATE = Pattern.compile(
"^(\\d{4})-(\\d{2})-(\\d{2})$"
);
}
// Use
if (RegexPatterns.EMAIL.matcher(email).matches()) {
// ...
}
3. Handle Null and Empty Values
public static boolean isValid(String input) {
if (input == null || input.isEmpty()) {
return false;
}
return PATTERN.matcher(input).matches();
}
4. Use StringBuilder Instead of StringBuffer (Java 9+)
// Java 9+: appendReplacement() and appendTail() accept a StringBuilder
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
matcher.appendReplacement(sb, replacement);
}
matcher.appendTail(sb);
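A complete version of this loop might look like the sketch below. It assumes Java 9 or later, where the StringBuilder overloads of appendReplacement() and appendTail() are available. DiscountApplier is a hypothetical class name, and integer arithmetic is used to keep the output simple.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DiscountApplier {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Replaces every number in the text with 90% of its value (integer division).
    static String applyDiscount(String text) {
        Matcher matcher = NUMBER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            int discounted = Integer.parseInt(matcher.group()) * 9 / 10;
            // Digits contain no $ or \, so no quoteReplacement() is needed here.
            matcher.appendReplacement(out, Integer.toString(discounted));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(applyDiscount("Count: 100, Price: 200"));
        // Count: 90, Price: 180
    }
}
```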
5. Exception Handling
try {
Pattern pattern = Pattern.compile("[" + input + "]");
// Use pattern
} catch (PatternSyntaxException e) {
System.err.println("Invalid regex: " + e.getMessage());
}
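When the regex itself comes from user input, it can help to wrap compilation in a small helper. The following sketch is one possible shape (SafeCompiler and tryCompile are illustrative names); it uses the getPattern(), getDescription(), and getIndex() accessors of PatternSyntaxException to report what went wrong and where.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SafeCompiler {
    // Returns the compiled pattern, or an empty Optional if the syntax is invalid.
    static Optional<Pattern> tryCompile(String regex) {
        try {
            return Optional.of(Pattern.compile(regex));
        } catch (PatternSyntaxException e) {
            System.err.printf("Invalid regex '%s': %s (index %d)%n",
                e.getPattern(), e.getDescription(), e.getIndex());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCompile("\\d+").isPresent()); // true
        System.out.println(tryCompile("[a-z").isPresent()); // false (unclosed character class)
        System.out.println(tryCompile("(abc").isPresent()); // false (unclosed group)
    }
}
```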
Performance Optimization
1. Avoid Repeated Compilation
// Good: Compile only once
Pattern pattern = Pattern.compile("\\d+");
for (String text : texts) {
pattern.matcher(text).find();
}
// Bad: Compile every time
for (String text : texts) {
Pattern.compile("\\d+").matcher(text).find();
}
2. Use More Specific Patterns
// Good: Specific pattern
Pattern good = Pattern.compile("\\d{3}-\\d{4}-\\d{4}");
// Bad: Broad pattern
Pattern bad = Pattern.compile(".+");
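3. Use Non-Greedy Quantifiers
// The two patterns match different text (see Quantifiers and Greediness), so choose by intent first
// Often better: non-greedy stops at the first closing tag
Pattern good = Pattern.compile("<div>.*?</div>");
// Often worse: greedy runs to the end of the line, then backtracks to the last closing tag
Pattern bad = Pattern.compile("<div>.*</div>");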
Common Pitfalls
1. Forgetting to Escape Backslashes
// Error: \d is not a valid escape sequence in a Java string literal
Pattern p = Pattern.compile("\d+"); // Does not compile (illegal escape character)
// Correct: Escape the backslash
Pattern p = Pattern.compile("\\d+");
2. Calling group() Before a Successful Match
Matcher matcher = pattern.matcher(text);
// Error: No match has been attempted yet
System.out.println(matcher.group()); // IllegalStateException
// Correct: Call find(), matches(), or lookingAt() first and check the result
if (matcher.find()) {
System.out.println(matcher.group());
}
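One way to make this check hard to forget is to hide the matcher behind a method that returns an Optional. This is only a sketch of the idea; FirstMatch and firstNumber are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits, or an empty Optional when there is none.
    static Optional<String> firstNumber(String text) {
        Matcher matcher = DIGITS.matcher(text);
        // group() is only called after find() has returned true.
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("abc 42 def 7")); // Optional[42]
        System.out.println(firstNumber("no digits"));    // Optional.empty
    }
}
```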
3. Forgetting to Reuse Pattern
// Error: Create new Pattern every time
for (int i = 0; i < 1000; i++) {
Pattern p = Pattern.compile("\\d+");
p.matcher(text).find();
}
// Correct: Reuse Pattern
Pattern p = Pattern.compile("\\d+");
for (int i = 0; i < 1000; i++) {
p.matcher(text).find();
}
Summary
Java's Pattern and Matcher classes provide powerful and flexible regex support:
- Pattern: Compiles regular expressions, provides static methods
- Matcher: Performs matching operations, provides rich query and replacement features
- Advanced features: Named capture groups, lookahead/lookbehind, boundary matching, etc.
- Best practices: Pre-compile, use constants, handle exceptions properly
Mastering these techniques allows you to efficiently handle various text processing tasks, from simple validation to complex parsing.
Use our online Regex Tester to practice Java regex patterns!
About the Author
The Regex Master Team consists of experienced developers and technical writers dedicated to simplifying regular expressions for everyone. We ensure all patterns are rigorously tested and verified to provide accurate, production-ready solutions.