March 20, 2025 | Regex Master Team | 10 min read

How to Extract All URLs from Text Using Regex

Tags: Extraction, URL extraction, regex patterns, text parsing, web scraping

Master URL extraction from text using regular expressions with comprehensive patterns for HTTP, HTTPS, FTP, and more.

Extracting URLs from text is a common task for web scraping, log analysis, and content processing. Regular expressions provide an efficient way to identify and extract URLs from unstructured text. In this comprehensive guide, we'll explore various regex patterns for URL extraction, from simple to advanced.

Understanding URL Structure

A URL (Uniform Resource Locator) consists of several components:

https://example.com/path/to/page?query=value#section
│       │          │            │           │
│       │          │            │           └─ Fragment
│       │          │            └───────────── Query string
│       │          └────────────────────────── Path
│       └───────────────────────────────────── Domain
└───────────────────────────────────────────── Protocol

Basic URL Extraction Patterns

Simple HTTP/HTTPS Pattern

https?://[^\s]+

Extracts: https://example.com from "Visit https://example.com for more info"

Breakdown:

  • https?:// - Matches http:// or https://
  • [^\s]+ - One or more non-whitespace characters

Pros: Simple and fast
Cons: Includes trailing punctuation, doesn't validate URL format

Including FTP and Other Protocols

(https?|ftp)://[^\s]+

Extracts: http://site.com, https://secure.com, ftp://files.com

Comprehensive Protocol Support

(https?|ftp|file|mailto|tel):[^\s]+

Extracts: Various URL schemes including mailto: and tel:
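
One detail matters when these multi-protocol patterns are used from Python: re.findall returns the contents of a capturing group rather than the whole match. The sketch below (the sample text is made up for illustration) shows the difference and uses a non-capturing group, (?:...), to get full URLs back.

```python
import re

text = "Mirrors: http://site.com https://secure.com ftp://files.com"

# With a capturing group, re.findall returns only what the group captured.
captured = re.findall(r'(https?|ftp)://[^\s]+', text)

# A non-capturing group (?:...) makes findall return each whole match.
full = re.findall(r'(?:https?|ftp)://[^\s]+', text)

print(captured)  # ['http', 'https', 'ftp']
print(full)      # ['http://site.com', 'https://secure.com', 'ftp://files.com']
```

JavaScript's String.prototype.match with the g flag does not have this behavior, so the parenthesized patterns above work there unchanged.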

Improved URL Extraction

Match Complete URLs

https?:\/\/(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:\/[^\s]*)?

Breakdown:

  • https?:\/\/ - http:// or https://
  • (?:www\.)? - Optional www. prefix
  • [a-zA-Z0-9-]+ - Domain name (letters, digits, hyphens)
  • \. - Literal dot
  • [a-zA-Z]{2,} - Top-level domain (2+ letters)
  • (?:\/[^\s]*)? - Optional path and query string

Valid URLs:

  • https://example.com
  • http://www.example.com
  • https://example.com/path/to/page
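  • http://example.com?query=value (only http://example.com is matched; the query is dropped because the optional tail must start with /)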
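
To see exactly what this pattern captures, it can help to run it against each sample. The following is a small sketch; the sub.example.com sample is an extra case added here for illustration and is not part of the list above.

```python
import re

PATTERN = re.compile(r'https?://(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:/[^\s]*)?')

samples = [
    "https://example.com",
    "http://www.example.com",
    "https://example.com/path/to/page",
    "http://example.com?query=value",   # query without a leading slash
    "https://sub.example.com/path",     # subdomain other than www (added case)
]

# Map each sample to the text the pattern actually matched.
results = {s: PATTERN.search(s).group(0) for s in samples}

for sample, matched in results.items():
    status = "full" if matched == sample else "partial"
    print(f"{status:8} {sample} -> {matched}")
```

The last two samples come back truncated (http://example.com and https://sub.example), which is why the later patterns add explicit query handling and subdomain support.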

Advanced URL Pattern with Query Parameters

https?:\/\/(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:\/[^\s?#]*)?(?:\?[^\s#]*)?(?:#[^\s]*)?

This pattern handles:

  • Path components
  • Query strings (?key=value)
  • Fragments (#section)

Extracts: https://example.com/path?query=value#section from full text
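
If the individual components are needed, the same pattern can be written with named groups. This is a sketch only: the group names (host, path, query, fragment) and the split_urls helper are chosen here for illustration.

```python
import re

# Same structure as the pattern above, with each component in a named group.
URL_PARTS = re.compile(
    r'https?://(?P<host>(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,})'
    r'(?P<path>/[^\s?#]*)?'
    r'(?:\?(?P<query>[^\s#]*))?'
    r'(?:#(?P<fragment>[^\s]*))?'
)

def split_urls(text):
    """Return one dict of components per URL found in text."""
    return [m.groupdict() for m in URL_PARTS.finditer(text)]

parts = split_urls("See https://example.com/path?query=value#section for details")
print(parts)
# [{'host': 'example.com', 'path': '/path', 'query': 'query=value', 'fragment': 'section'}]
```

Components that are absent from a URL come back as None, so a URL like http://example.com?x=1 yields a path of None and a query of 'x=1'.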

URL Patterns with Character Validation

Strict URL Validation

https?:\/\/(?:www\.)?[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\.[a-zA-Z]{2,}(?:\/[^\s]*)?

This pattern:

  • Validates domain name rules (max 63 characters per label)
  • Ensures domains don't start or end with hyphens
  • Supports subdomains (e.g., sub.example.com)
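
The label rules are easier to check when the pattern is assembled from a named piece. The sketch below restructures the strict pattern as "one or more labels, each followed by a dot, then a TLD", which is intended to accept the same hostnames. The names LABEL, STRICT_URL, and is_strict_url are illustrative, and fullmatch is used so that a candidate string is checked as a whole rather than searched for inside larger text.

```python
import re

# One DNS label: 1-63 characters, alphanumeric at both ends, hyphens allowed inside.
LABEL = r'[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?'

STRICT_URL = re.compile(
    rf'https?://(?:{LABEL}\.)+[a-zA-Z]{{2,}}(?:/[^\s]*)?'
)

def is_strict_url(candidate):
    """Return True if the whole candidate string satisfies the strict pattern."""
    return STRICT_URL.fullmatch(candidate) is not None

print(is_strict_url("https://sub.example.com/path"))   # True
print(is_strict_url("https://-bad.example.com"))       # False (label starts with a hyphen)
print(is_strict_url("https://bad-.example.com"))       # False (label ends with a hyphen)
print(is_strict_url("https://" + "a" * 64 + ".com"))   # False (label longer than 63)
```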

URL with Port Number

https?:\/\/(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?::\d{1,5})?(?:\/[^\s]*)?

Extracts: https://example.com:8080/path

Breakdown:

  • (?::\d{1,5})? - Optional port number (1-5 digits; the 0-65535 range is not checked)
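
Because \d{1,5} accepts values such as 99999, a range check after matching is one way to tighten this. A minimal sketch, assuming ports 1 through 65535 are acceptable; the port group name and the helper name are illustrative.

```python
import re

URL_WITH_PORT = re.compile(
    r'https?://(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}'
    r'(?::(?P<port>\d{1,5}))?(?:/[^\s]*)?'
)

def urls_with_valid_ports(text):
    """Keep URLs that have no port or a port in the range 1-65535."""
    valid = []
    for m in URL_WITH_PORT.finditer(text):
        port = m.group('port')
        if port is None or 1 <= int(port) <= 65535:
            valid.append(m.group(0))
    return valid

found = urls_with_valid_ports(
    "ok https://example.com:8080/path bad http://example.com:99999/x plain http://example.com"
)
print(found)  # ['https://example.com:8080/path', 'http://example.com']
```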

URL with IP Address

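https?:\/\/(?:\d{1,3}\.){3}\d{1,3}(?::\d{1,5})?(?:\/[^\s]*)?

Extracts: https://192.168.1.1:8080/path. The www. prefix is omitted because it does not apply to IP addresses. Note that \d{1,3} also accepts out-of-range octets such as 999, so validate the address after extraction.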
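
One way to reject out-of-range addresses is to pass the matched host to Python's standard ipaddress module. This is a sketch under that assumption; the ip group name and extract_ip_urls helper are illustrative.

```python
import ipaddress
import re

IP_URL = re.compile(
    r'https?://(?P<ip>(?:\d{1,3}\.){3}\d{1,3})(?::\d{1,5})?(?:/[^\s]*)?'
)

def extract_ip_urls(text):
    """Return IP-based URLs whose host parses as a valid IPv4 address."""
    urls = []
    for m in IP_URL.finditer(text):
        try:
            ipaddress.IPv4Address(m.group('ip'))
        except ValueError:
            continue  # e.g. an octet above 255
        urls.append(m.group(0))
    return urls

found = extract_ip_urls("Router at https://192.168.1.1:8080/path, typo at http://999.10.10.10/")
print(found)  # ['https://192.168.1.1:8080/path,']
```

The trailing comma in the output is a reminder that [^\s]* keeps punctuation attached, so this check is best combined with trailing-punctuation cleanup.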

Special URL Patterns

URLs with Authentication

https?:\/\/[^:\s]+:[^@\s]+@[^\s]+

Extracts: https://user:pass@example.com

Relative URLs

\/[^\s?#]+(?:\?[^\s#]*)?(?:#[^\s]*)?

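Extracts: /path/to/page and /path?query=value (without domain). Because the pattern is not anchored, it also matches inside absolute URLs (for example, the //example.com/path part of https://example.com/path), so apply it only to text where relative URLs are expected.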

Data URLs

data:[^,\s]+,[^,\s]+

Extracts: data:text/plain;base64,SGVsbG8=
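
Once a data URL has been extracted, its payload usually needs decoding. The sketch below assumes text payloads encoded as UTF-8 and handles only the two simple cases (base64 and percent-encoding); the meta and payload group names and the decode_data_urls helper are illustrative.

```python
import base64
import re
from urllib.parse import unquote

DATA_URL = re.compile(r'data:(?P<meta>[^,\s]+),(?P<payload>[^,\s]+)')

def decode_data_urls(text):
    """Decode the payload of each data URL in text (assumes UTF-8 text payloads)."""
    decoded = []
    for m in DATA_URL.finditer(text):
        meta, payload = m.group('meta'), m.group('payload')
        if meta.endswith(';base64'):
            decoded.append(base64.b64decode(payload).decode('utf-8'))
        else:
            decoded.append(unquote(payload))  # percent-encoded payload
    return decoded

print(decode_data_urls("inline data:text/plain;base64,SGVsbG8= here"))  # ['Hello']
```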

Extracting URLs from Complex Text

Extract All URLs (Multiple Types)

(?:https?|ftp):\/\/(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:\/[^\s]*)?|(?:mailto|tel):[^\s]+

This pattern extracts:

  • HTTP/HTTPS URLs
  • FTP URLs
  • mailto: links
  • tel: links
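
When several link types are extracted in one pass, grouping the results by scheme is often the next step. A minimal sketch; the sample text, phone number, and email address are made up, and group_by_scheme is an illustrative helper.

```python
import re
from collections import defaultdict

# The combined pattern from above, split across two lines for readability.
ANY_URL = re.compile(
    r'(?:https?|ftp)://(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:/[^\s]*)?'
    r'|(?:mailto|tel):[^\s]+'
)

def group_by_scheme(text):
    """Return a dict mapping each scheme to the list of URLs found for it."""
    groups = defaultdict(list)
    for url in ANY_URL.findall(text):
        scheme = url.split(':', 1)[0].lower()
        groups[scheme].append(url)
    return dict(groups)

grouped = group_by_scheme(
    "Site https://example.com/docs files ftp://files.com "
    "call tel:+15550100 write mailto:team@example.com"
)
print(grouped)
```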

URL Extraction with Punctuation Handling

https?:\/\/(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:\/[^\s\)\]\}>"]*)?

Stops extraction at common closing punctuation: ), ], }, >, "

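Example: "(see https://example.com/page) for details" → Extracts https://example.com/page

The trade-off is that URLs which legitimately contain these characters, such as paths with parentheses, are cut short.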
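
An alternative to excluding closing brackets in the pattern is to match greedily and then trim in code, dropping a closing bracket only when it has no opening partner inside the URL. The sketch below is one possible heuristic, not a complete solution; the example.com/wiki URL is hypothetical.

```python
import re

URL = re.compile(r'https?://[^\s<>"]+')
OPENERS = {')': '(', ']': '[', '}': '{'}
TRAILING = '.,;:!?'

def extract_clean_urls(text):
    """Extract URLs, trimming trailing punctuation and unbalanced closing brackets."""
    urls = []
    for url in URL.findall(text):
        url = url.rstrip(TRAILING)
        while url and url[-1] in OPENERS:
            closer = url[-1]
            # Keep the bracket if the URL contains a matching opener for it.
            if url.count(OPENERS[closer]) >= url.count(closer):
                break
            url = url[:-1].rstrip(TRAILING)
        urls.append(url)
    return urls

found = extract_clean_urls(
    "(see https://example.com/page). Also https://example.com/wiki/Regex_(syntax)."
)
print(found)
# ['https://example.com/page', 'https://example.com/wiki/Regex_(syntax)']
```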

Code Examples

JavaScript URL Extraction

function extractUrls(text) {
  const urlRegex = /https?:\/\/(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:\/[^\s]*)?/g;
  const urls = text.match(urlRegex);
  return urls || [];
}

const text = "Visit https://example.com and http://www.test.com for more info";
const urls = extractUrls(text);
console.log(urls);
// Output: ["https://example.com", "http://www.test.com"]

Python URL Extraction

import re

def extract_urls(text):
    url_pattern = r'https?://(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:/[^\s]*)?'
    return re.findall(url_pattern, text)

text = "Visit https://example.com and http://www.test.com"
urls = extract_urls(text)
print(urls)
# Output: ['https://example.com', 'http://www.test.com']

URL Extraction with Validation

function extractAndValidateUrls(text) {
  const urlRegex = /https?:\/\/(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:\/[^\s]*)?/g;
  const urls = text.match(urlRegex) || [];
  
  // Validate each URL
  return urls.filter(url => {
    try {
      new URL(url);
      return true;
    } catch {
      return false;
    }
  });
}

Advanced Use Cases

Extract URLs from HTML

href=["'](https?:[^"']+)["']

Extracts: https://example.com from <a href="https://example.com">Link</a>
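
A regex on href attributes works for quick jobs, but it can miss unquoted attributes or match text inside comments and scripts. When the input is real HTML, an HTML parser is usually more reliable. The following sketch uses Python's standard html.parser module; LinkCollector and extract_links are illustrative names.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value and value.startswith(('http://', 'https://')):
                self.links.append(value)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

links = extract_links('<a href="https://example.com">Link</a> <a href="/relative">Other</a>')
print(links)  # ['https://example.com']
```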

Extract URLs from Markdown

\[([^\]]+)\]\((https?:[^)]+)\)

Captures:

  • Group 1: Link text
  • Group 2: URL

Extracts from: [Link](https://example.com) → URL: https://example.com, Text: Link
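
In Python, re.findall returns one tuple per match when a pattern has two capturing groups, which maps directly onto the link text and URL. A short sketch; extract_markdown_links is an illustrative helper and the sample Markdown is made up.

```python
import re

MD_LINK = re.compile(r'\[([^\]]+)\]\((https?:[^)]+)\)')

def extract_markdown_links(markdown):
    """Return a list of {'text', 'url'} dicts, one per Markdown link."""
    return [{'text': text, 'url': url} for text, url in MD_LINK.findall(markdown)]

md_links = extract_markdown_links(
    "Read the [docs](https://example.com/docs) and the [FAQ](http://example.com/faq)."
)
print(md_links)
# [{'text': 'docs', 'url': 'https://example.com/docs'},
#  {'text': 'FAQ', 'url': 'http://example.com/faq'}]
```

The pattern also matches image syntax such as ![alt](https://...), since an image is a link preceded by an exclamation mark.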

Extract URLs from Social Media Posts

https?:\/\/(?:www\.)?(?:twitter|facebook|instagram|linkedin)\.com\/[^\s]+

Extracts: https://twitter.com/user/status/123456789

Best Practices

1. Use the Global Flag

// WRONG: Without the g flag, match() returns only the first URL
const firstOnly = text.match(/https?:\/\/[^\s]+/);

// RIGHT: Finds all URLs
const allUrls = text.match(/https?:\/\/[^\s]+/g);

2. Validate After Extraction

// Extract with regex (match() returns null when nothing is found)
const urls = text.match(urlRegex) || [];

// Validate with URL constructor
urls.forEach(url => {
  try {
    const parsed = new URL(url);
    console.log('Valid:', parsed.hostname);
  } catch (e) {
    console.log('Invalid:', url);
  }
});
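
The same extract-then-validate approach can be sketched in Python with the standard urllib.parse module. urlsplit is lenient, so this example adds explicit checks for the scheme, the hostname, and the port; extract_valid_urls is an illustrative helper, not a complete validator.

```python
import re
from urllib.parse import urlsplit

URL = re.compile(r'https?://[^\s]+')

def extract_valid_urls(text):
    """Extract candidates with a loose regex, then keep those that parse cleanly."""
    valid = []
    for candidate in URL.findall(text):
        try:
            parts = urlsplit(candidate)
            parts.port  # accessing .port raises ValueError for an invalid port
        except ValueError:
            continue
        if parts.scheme in ('http', 'https') and parts.hostname:
            valid.append(candidate)
    return valid

checked = extract_valid_urls(
    "good https://example.com/a bad http://example.com:99999/ empty https:///nohost"
)
print(checked)  # ['https://example.com/a']
```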

3. Handle Trailing Punctuation

// Clean up trailing punctuation
function cleanUrl(url) {
  return url.replace(/[.,;!?]+$/, '');
}

// match() returns null when there are no matches, so fall back to an empty array
const urls = (text.match(urlRegex) || []).map(cleanUrl);

4. Consider Performance

// Simple patterns are faster for large texts
const simpleRegex = /https?:\/\/[^\s]+/g;

// Complex patterns are more accurate but slower
const complexRegex = /https?:\/\/(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:\/[^\s]*)?/g;

Common Pitfalls

Pitfall 1: Including Trailing Punctuation

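// BAD: [^\s]+ also captures the trailing period
const text = "Visit https://example.com.";
const raw = text.match(/https?:\/\/[^\s]+/g) || [];
// Extracts: "https://example.com."

// GOOD: Strip trailing punctuation after matching
const cleaned = raw.map(url => url.replace(/[.,;!?]+$/, ''));
// Extracts: "https://example.com"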

Pitfall 2: Not Handling Subdomains

// BAD: Only matches example.com
https?:\/\/[a-zA-Z0-9-]+\.[a-zA-Z]{2,}

// GOOD: Matches sub.example.com
https?:\/\/(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}

Pitfall 3: Missing Protocol

// BAD: Requires protocol
https?:\/\/[^\s]+

// GOOD: Matches URLs with or without protocol
// (it also matches file names such as index.html, so expect false positives)
(?:https?:\/\/)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:\/[^\s]*)?

Testing Your URL Extraction

Use our interactive Match Finder with these test cases:

Text with URLs:

Visit https://example.com and http://www.test.com/path
for more info. Also check ftp://files.com

Expected Extracted URLs:

  • https://example.com
  • http://www.test.com/path
  • ftp://files.com
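
The same test case can be checked locally with a few lines of Python. This sketch uses a simple multi-protocol pattern; swap in whichever pattern is being evaluated.

```python
import re

URL = re.compile(r'(?:https?|ftp)://[^\s]+')

sample = (
    "Visit https://example.com and http://www.test.com/path\n"
    "for more info. Also check ftp://files.com"
)

expected = [
    "https://example.com",
    "http://www.test.com/path",
    "ftp://files.com",
]

found = URL.findall(sample)
assert found == expected, f"unexpected result: {found}"
print("all expected URLs extracted")
```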

Edge Cases:

  • URL with port: https://example.com:8080
  • URL with query: https://example.com?query=value
  • URL with fragment: https://example.com#section
  • URL with auth: https://user:pass@example.com

Conclusion

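URL extraction with regex is about finding the right balance between simplicity and accuracy. The pattern https?:\/\/(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:\/[^\s]*)? provides a good balance for simple hostnames such as example.com or www.example.com. For hosts with other subdomains, replace the domain part with (?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}, as shown under Pitfall 2.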

Remember to:

  • Use the global flag to find all URLs
  • Validate URLs after extraction
  • Handle trailing punctuation
  • Consider your specific use case (web scraping, log analysis, etc.)

For complex URL validation, consider using a dedicated URL parsing library in combination with regex for extraction.

Experiment with different patterns using our Regex Tester to find the perfect fit for your URL extraction needs!


About the Author

The Regex Master Team consists of experienced developers and technical writers dedicated to simplifying regular expressions for everyone. We ensure all patterns are rigorously tested and verified to provide accurate, production-ready solutions.
