Safe Patterns¶
This guide provides best practices for writing regex patterns that are safe from ReDoS attacks.
Golden Rules¶
Safe Regex Guidelines
- Avoid nested quantifiers: Never nest `+`, `*`, or `{n,m}` inside each other
- Avoid overlapping alternatives: Each alternative should match distinct input
- Use specific character classes: Prefer `[a-z]` over `.`
- Bound your quantifiers: Use `{1,100}` instead of `+`
- Test with ReDoctor: Always validate before deployment
Safe Pattern Examples¶
Simple Character Classes¶
Linear time complexity - each character matched once:
# ✅ Safe patterns
r"^[a-zA-Z]+$" # Letters only
r"^[0-9]+$" # Digits only
r"^[a-zA-Z0-9_]+$" # Alphanumeric with underscore
r"^[\w]+$" # Word characters
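One caveat for the `\w` pattern: in Python 3, `\w` in a `str` pattern also matches non-ASCII letters and digits unless `re.ASCII` is set. The sketch below shows the difference; `is_ascii_word` is a hypothetical helper name used only for illustration.

```python
import re

# \w is Unicode-aware by default for str patterns; re.ASCII limits it to [a-zA-Z0-9_].
UNICODE_WORD = re.compile(r"^[\w]+$")
ASCII_WORD = re.compile(r"^[\w]+$", re.ASCII)


def is_ascii_word(text: str) -> bool:
    """Return True if text holds only ASCII letters, digits, or underscores."""
    return ASCII_WORD.fullmatch(text) is not None


accented = "caf\u00e9"  # "café" with a precomposed e-acute
print(UNICODE_WORD.fullmatch(accented) is not None)  # Unicode-aware \w accepts it
print(is_ascii_word(accented))                       # ASCII-only \w rejects it
print(is_ascii_word("cafe_01"))
```

Both variants stay linear; the flag only changes which characters count as word characters.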
Bounded Quantifiers¶
Limit repetition to prevent excessive backtracking:
# ✅ Safe patterns
r"^\d{1,10}$" # 1-10 digits
r"^[a-z]{2,50}$" # 2-50 lowercase letters
r"^.{1,1000}$" # Limited length
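Bounded quantifiers pair well with an explicit length check on the input before the regex runs at all. A minimal sketch, assuming a hypothetical `MAX_INPUT_LENGTH` limit and a hypothetical helper named `matches_bounded`:

```python
import re

MAX_INPUT_LENGTH = 1000  # hypothetical limit; pick one that fits your data
LOWERCASE_WORD = re.compile(r"^[a-z]{2,50}$")


def matches_bounded(pattern: re.Pattern, text: str, max_length: int = MAX_INPUT_LENGTH) -> bool:
    """Reject oversized input cheaply, then apply the bounded pattern."""
    if len(text) > max_length:
        return False  # oversized input never reaches the regex engine
    return pattern.fullmatch(text) is not None


print(matches_bounded(LOWERCASE_WORD, "hello"))
print(matches_bounded(LOWERCASE_WORD, "a" * 5000))
```

The length check costs constant time, so it caps the work the engine can be asked to do regardless of the pattern.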
Specific Patterns¶
Well-defined patterns without ambiguity:
# ✅ Safe patterns
r"^\d{4}-\d{2}-\d{2}$" # Date: YYYY-MM-DD
r"^\d{3}-\d{3}-\d{4}$" # Phone: XXX-XXX-XXXX
r"^[A-Z]{2}\d{6}$" # ID: XX000000
Non-overlapping Alternatives¶
Each alternative matches distinct input:
# ✅ Safe patterns
r"^(cat|dog|bird)$" # Distinct words
r"^(yes|no|maybe)$" # No overlap
r"^(\d+|[a-z]+)$" # Numbers OR letters (not mixed)
Anchored Patterns¶
Anchoring a pattern with `^` and `$` pins where a match may begin and end, which limits the positions the engine has to try:
# ✅ Safe patterns
r"^exact$" # Exact match
r"^prefix.*" # Anchored start
r".*suffix$" # Anchored end (careful with .*)
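One detail worth knowing when relying on `$` in Python: it also matches just before a single trailing newline, so `^exact$` accepts `"exact\n"`. The sketch below contrasts that with `\Z` and `re.fullmatch`, which require the match to end at the very end of the string.

```python
import re

text_with_newline = "exact\n"

# "$" matches at the end of the string or just before a trailing newline.
dollar_match = re.match(r"^exact$", text_with_newline) is not None

# "\Z" matches only at the very end of the string.
strict_match = re.match(r"^exact\Z", text_with_newline) is not None

# fullmatch requires the pattern to consume the whole string.
full_match = re.fullmatch(r"exact", text_with_newline) is not None

print(dollar_match, strict_match, full_match)  # True False False
```

For validation code, `re.fullmatch` or `\Z` is usually the closer fit to "the entire input must match".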
Pattern Transformations¶
From Dangerous to Safe¶
| Vulnerable | Safe | Change |
|---|---|---|
| `(a+)+` | `a+` | Remove nesting |
| `(a\|a)*` | `a*` | Remove overlap |
| `.*a.*` | `[^a]*a.*` | Use negated class |
| `(a+)+b` | `a+b` | Flatten |
| `(\d+\.)+` | `(\d+\.)*\d+` | Be explicit |
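A rewrite is only a fix if it still accepts the strings you intend. The sketch below compares two patterns by brute force over every short string from a small alphabet; `find_disagreements` is a hypothetical helper, and agreement on short strings is evidence of equivalence, not proof. Inputs are kept short so that even the vulnerable patterns finish quickly.

```python
import re
from itertools import product


def find_disagreements(pattern_a: str, pattern_b: str, alphabet: str, max_len: int = 6) -> list:
    """Return every string up to max_len that only one of the two patterns fully matches."""
    regex_a, regex_b = re.compile(pattern_a), re.compile(pattern_b)
    disagreements = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            matched_a = regex_a.fullmatch(candidate) is not None
            matched_b = regex_b.fullmatch(candidate) is not None
            if matched_a != matched_b:
                disagreements.append(candidate)
    return disagreements


print(find_disagreements(r"(a+)+", r"a+", "ab"))  # [] : the patterns agree
print(find_disagreements(r"(a|a)*", r"a*", "ab"))
print(find_disagreements(r".*a.*", r"[^a]*a.*", "ab"))
print(find_disagreements(r"(a+)+b", r"a+b", "ab"))
print(find_disagreements(r"(\d+\.)+", r"(\d+\.)*\d+", "1.", max_len=3))
```

The first four rows of the table agree on every tested string. The last comparison returns a non-empty list: `(\d+\.)+` requires a trailing dot while `(\d+\.)*\d+` forbids one, so that rewrite changes which inputs are accepted and should be checked against the format you actually want.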
Email Validation¶
# ❌ Dangerous
r"^([a-zA-Z0-9]+)*@"
# ✅ Safe
r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
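# ✅ Better: Use a library
# Note: parseaddr only splits "Name <addr>" into its parts; it does not validate the address
import email.utils
display_name, address = email.utils.parseaddr(email_string)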
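If a regex shape check is still wanted, it can be combined with a length cap so the engine only ever sees short input. A minimal sketch using the safe pattern from this section; `looks_like_email` and `MAX_EMAIL_LENGTH` are hypothetical names, and the 254-character cap is an assumption based on the commonly cited maximum address length.

```python
import re

# Pattern copied from the safe example; it checks shape only, not deliverability.
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")
MAX_EMAIL_LENGTH = 254  # assumed cap; adjust to your own requirements


def looks_like_email(candidate: str) -> bool:
    """Cheap shape check: length cap first, then a linear-time pattern."""
    if len(candidate) > MAX_EMAIL_LENGTH:
        return False
    return EMAIL_PATTERN.fullmatch(candidate) is not None


print(looks_like_email("user@example.com"))
print(looks_like_email("user@@example.com"))
```

The pattern stays linear because the `@` separator cannot be matched by the character classes on either side of it.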
URL Validation¶
# ❌ Dangerous - multiple .*
r"^https?://.*\..*\..*$"
# ✅ Safe - specific structure
r"^https?://[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+(/.*)?$"
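# ✅ Better: Use urllib
# urlparse does not validate on its own, so check the parts you rely on
from urllib.parse import urlparse
result = urlparse(url)
is_valid = result.scheme in ("http", "https") and bool(result.netloc)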
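Because `urlparse` accepts almost any string, the parsed parts still need to be checked. A minimal sketch, assuming only `http` and `https` URLs with a host name should pass; `is_http_url` and `ALLOWED_SCHEMES` are hypothetical names.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}  # assumption: only web URLs are acceptable


def is_http_url(url: str) -> bool:
    """Accept only http(s) URLs that carry a host name."""
    try:
        parts = urlparse(url)
    except ValueError:
        return False  # raised for some malformed inputs, e.g. unbalanced IPv6 brackets
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)


print(is_http_url("https://example.com/path"))
print(is_http_url("ftp://example.com"))
print(is_http_url("example.com"))
```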
Number Validation¶
# ❌ Ambiguous: digits can match either \d*, and "", "-", and "." are all accepted
r"^-?\d*\.?\d*$"
# ✅ Safe and clear
r"^-?\d+(\.\d+)?$"
# ✅ Better: Use Python
try:
    float(value)
except ValueError:
    pass
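Note that `float()` is more permissive than the regex: it also accepts values such as `"nan"`, `"inf"`, exponents, and surrounding whitespace. When only plain decimals should pass, the two checks can be combined. A minimal sketch; `parse_decimal` is a hypothetical helper, and `[0-9]` is used instead of `\d` so that only ASCII digits match.

```python
import math
import re

DECIMAL_PATTERN = re.compile(r"^-?[0-9]+(\.[0-9]+)?$")


def parse_decimal(value: str):
    """Return a float for plain decimal text such as "-12.5", or None otherwise."""
    if DECIMAL_PATTERN.fullmatch(value) is None:
        return None
    return float(value)


# float() alone accepts more than plain decimals:
print(math.isnan(float("nan")))  # special values parse without error
print(float(" 1e3 "))            # whitespace and exponents parse too

# The combined check rejects them:
print(parse_decimal("nan"))
print(parse_decimal("-12.5"))
```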
Validation Checklist¶
Before using any regex in production:
from redoctor import check, is_safe
def validate_regex(pattern: str) -> bool:
    """Validate a regex pattern is safe to use."""
    result = check(pattern)
    if result.is_vulnerable:
        print(f"❌ Vulnerable: {pattern}")
        print(f"   Complexity: {result.complexity}")
        return False
    if result.status.value == "unknown":
        print(f"⚠️ Unknown: {pattern}")
        print("   Consider manual review")
        return True  # Proceed with caution
    print(f"✅ Safe: {pattern}")
    return True
# Use it
patterns = [
    r"^[a-z]+$",
    r"^(a+)+$",
    r"^\d{1,10}$",
]
for p in patterns:
    validate_regex(p)
Safe Pattern Templates¶
Identifier (username, variable name)¶
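A minimal sketch of an identifier template, assuming a hypothetical rule: the name starts with an ASCII letter or underscore and is at most 64 characters long. `IDENTIFIER` and `is_identifier` are illustrative names.

```python
import re

# One required leading character, then a bounded run of word characters.
IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")


def is_identifier(name: str) -> bool:
    """Return True if name fits the assumed identifier rule."""
    return IDENTIFIER.fullmatch(name) is not None


print(is_identifier("user_name1"))
print(is_identifier("1abc"))
```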
UUID¶
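A minimal sketch of a UUID template in the canonical 8-4-4-4-12 hexadecimal form. Every quantifier is a fixed count, so there is nothing to backtrack over. `UUID_PATTERN` and `is_uuid` are illustrative names.

```python
import re
import uuid

UUID_PATTERN = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def is_uuid(text: str) -> bool:
    """Return True if text has the canonical hyphenated UUID shape."""
    return UUID_PATTERN.fullmatch(text) is not None


print(is_uuid(str(uuid.uuid4())))
print(is_uuid("not-a-uuid"))
```

The pattern checks the shape only; it does not inspect the version or variant bits.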
ISO Date¶
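A minimal sketch of an ISO date template (`YYYY-MM-DD`). The regex only checks the shape, so `date.fromisoformat` is used afterwards to reject impossible dates such as February 30. `ISO_DATE` and `parse_iso_date` are illustrative names.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$")


def parse_iso_date(text: str):
    """Return a date for well-formed, real calendar dates, or None otherwise."""
    if ISO_DATE.fullmatch(text) is None:
        return None  # wrong shape
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None  # right shape, but not a real date


print(parse_iso_date("2024-02-29"))
print(parse_iso_date("2024-02-30"))
```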
IP Address (v4)¶
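A minimal sketch of an IPv4 template, assuming dotted-decimal notation with four octets in the range 0-255 and no leading zeros. Each octet alternative has a fixed maximum length, so backtracking is bounded. `OCTET`, `IPV4`, and `is_ipv4` are illustrative names.

```python
import re

# 250-255 | 200-249 | 100-199 | 0-99 (no leading zeros)
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
IPV4 = re.compile(rf"^{OCTET}(?:\.{OCTET}){{3}}$")


def is_ipv4(text: str) -> bool:
    """Return True if text is a dotted-decimal IPv4 address."""
    return IPV4.fullmatch(text) is not None


print(is_ipv4("192.168.1.1"))
print(is_ipv4("256.1.1.1"))
```

The standard library's `ipaddress.IPv4Address` is an alternative when a regex is not required.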
Hex Color¶
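A minimal sketch of a hex color template, assuming the 3-digit and 6-digit `#RGB` / `#RRGGBB` forms are the ones to accept. `HEX_COLOR` and `is_hex_color` are illustrative names.

```python
import re

# Two fixed-length alternatives: exactly 3 or exactly 6 hex digits after "#".
HEX_COLOR = re.compile(r"^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$")


def is_hex_color(text: str) -> bool:
    """Return True for #RGB or #RRGGBB color codes."""
    return HEX_COLOR.fullmatch(text) is not None


print(is_hex_color("#fff"))
print(is_hex_color("#ffff"))
```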
Semantic Version¶
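A minimal sketch of a semantic version template. It is a simplified form, not the full grammar from the SemVer specification: it enforces no leading zeros in the three core numbers but does not apply that rule to numeric pre-release identifiers. `NUM`, `IDENT`, `SEMVER`, and `is_semver` are illustrative names.

```python
import re

NUM = r"(?:0|[1-9][0-9]*)"   # non-negative integer without leading zeros
IDENT = r"[0-9A-Za-z-]+"     # one dot-separated identifier

# MAJOR.MINOR.PATCH, optional -prerelease, optional +build.
# The "." and "+" separators cannot be matched by IDENT, so each part is unambiguous.
SEMVER = re.compile(
    rf"^{NUM}\.{NUM}\.{NUM}(?:-{IDENT}(?:\.{IDENT})*)?(?:\+{IDENT}(?:\.{IDENT})*)?$"
)


def is_semver(text: str) -> bool:
    """Return True if text fits the simplified semantic version shape."""
    return SEMVER.fullmatch(text) is not None


print(is_semver("1.2.3-alpha.1+build.5"))
print(is_semver("01.2.3"))
```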
When to Avoid Regex¶
Sometimes regex isn't the right tool:
Parsing HTML/XML¶
# ❌ Don't do this
html_pattern = r"<(\w+)[^>]*>.*?</\1>"
# ✅ Use a parser
from html.parser import HTMLParser  # standard library
from bs4 import BeautifulSoup  # third-party: beautifulsoup4
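As a rough illustration of the parser route, the sketch below uses the standard library's `HTMLParser` to collect tag names and text. `TagCollector` is a hypothetical class name, and the markup is a made-up sample.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect start-tag names and text content while parsing."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag, with the tag name lowercased.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for text between tags.
        self.text_parts.append(data)


collector = TagCollector()
collector.feed("<p class='intro'>Hello <b>world</b></p>")
collector.close()

print(collector.tags)                 # ['p', 'b']
print("".join(collector.text_parts))  # Hello world
```

The parser handles nesting and attributes without any backreferences or lazy quantifiers.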
Complex Validation¶
# ❌ Complex regex for email
# ✅ Use email-validator library
from email_validator import validate_email
JSON/YAML/TOML¶
# ❌ Parsing structured data with regex
# ✅ Use proper parsers
import json
import yaml  # third-party: PyYAML
import tomllib  # standard library since Python 3.11
Mathematical Expressions¶
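Arithmetic expressions are nested, which regex cannot describe well. One alternative is to let Python's `ast` module parse the text and then evaluate only an allow-list of node types. A minimal sketch, assuming only `+`, `-`, `*`, `/`, and parentheses are needed; `evaluate` and `_eval_node` are hypothetical names, and exponentiation is left out on purpose so that huge powers cannot be requested.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node):
    """Evaluate one AST node, rejecting anything outside the allow-list."""
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def evaluate(expression: str):
    """Parse and evaluate a basic arithmetic expression."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"invalid expression: {expression!r}") from exc
    return _eval_node(tree.body)


print(evaluate("2 + 3 * (4 - 1)"))  # 11
```

Names, calls, and attribute access all raise `ValueError`, so input such as `__import__('os')` is rejected rather than executed.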
Testing Your Patterns¶
Always test with edge cases:
import re
from redoctor import check
def test_pattern(pattern: str, test_cases: list):
    """Test a pattern for safety and correctness."""
    # Check safety
    result = check(pattern)
    print(f"Pattern: {pattern}")
    print(f"Safety: {'✅ Safe' if result.is_safe else '❌ Vulnerable'}")
    # Test functionality
    regex = re.compile(pattern)
    for test_input, should_match in test_cases:
        matches = bool(regex.match(test_input))
        status = "✓" if matches == should_match else "✗"
        print(f"  {status} {test_input!r}: {matches}")
# Example
test_pattern(r"^[a-z]{3,10}$", [
    ("hello", True),
    ("Hi", False),  # Capital letter
    ("ab", False),  # Too short
    ("", False),  # Empty
    ("verylongword", False),  # Too long
])
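Static analysis can be complemented by a quick empirical check: time the pattern on near-miss inputs of growing size and see how the cost scales. A minimal sketch using only the standard library; `time_pattern` is a hypothetical helper, and the input generator is an assumption about what a near-miss looks like for this pattern.

```python
import re
import time


def time_pattern(pattern: str, make_input, sizes):
    """Return (size, seconds) pairs for matching the pattern against generated input."""
    regex = re.compile(pattern)
    timings = []
    for size in sizes:
        text = make_input(size)
        start = time.perf_counter()
        regex.match(text)
        timings.append((size, time.perf_counter() - start))
    return timings


# Near-miss input: a long valid prefix followed by one character that forces a failure.
results = time_pattern(r"^[a-z]+$", lambda n: "a" * n + "!", [1_000, 10_000, 100_000])
for size, seconds in results:
    print(f"{size:>7} chars: {seconds:.6f}s")
```

For a linear pattern the time should grow roughly in step with the input size. When probing a pattern that might be vulnerable, start with very small sizes (for example 10, 15, 20), since exponential patterns can take minutes on only a few dozen characters.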