Understanding ReDoS¶
Regular Expression Denial of Service (ReDoS) is a type of algorithmic complexity attack that exploits the worst-case behavior of regex engines. This guide explains how ReDoS works and why it's a serious security concern.
What is ReDoS?¶
Most regex engines use backtracking to match patterns. While this approach is flexible, it can lead to exponential or polynomial time complexity for certain patterns and inputs.
The Impact
A single malicious input to a vulnerable regex can:
- Hang your application for minutes or hours
- Consume 100% CPU
- Block other requests
- Cause complete service denial
How Backtracking Works¶
Consider the regex ^(a+)+$ matching against "aaaaX":
- Engine tries to match
a+as many times as possible - Outer
+tries to repeat the group - When
Xis reached, no match - Engine backtracks to try different combinations
For n characters, the engine may try 2^n combinations!
Input: "aaaaX" (4 a's + X)
Attempt 1: (aaaa) - fails at X
Attempt 2: (aaa)(a) - fails at X
Attempt 3: (aa)(aa) - fails at X
Attempt 4: (aa)(a)(a) - fails at X
Attempt 5: (a)(aaa) - fails at X
... and so on for 2^4 = 16 combinations
With 30 characters, that's over 1 billion combinations!
Time Complexity Classes¶
Linear O(n) - Safe ✅¶
Each character is processed once:
# Safe patterns
r"^[a-z]+$" # Simple character class
r"^\d{1,10}$" # Bounded quantifier
r"^[A-Z][a-z]*$" # No nested quantifiers
Polynomial O(n²) or O(n³) - Dangerous ⚠️¶
Processing time grows polynomially with input length:
Exponential O(2^n) - Critical 🚨¶
Processing time doubles with each additional character:
# Exponential patterns
r"^(a+)+$" # Nested quantifiers
r"(a|a)*$" # Overlapping alternatives
r"([a-z]+)+$" # Nested + quantifiers
Real-World Examples¶
CVE-2016-1000232 - Swagger UI¶
Pattern: ^[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
While this specific pattern isn't vulnerable, similar email/domain validators often are:
Node.js Regular Expression Bug¶
The ms package had a vulnerable pattern that could hang with crafted input:
Stack Overflow Outage (2016)¶
A regex in the post formatting caused a 34-minute outage:
Attack Anatomy¶
A ReDoS attack string typically has three parts:
| Component | Purpose |
|---|---|
| Prefix | Sets up the vulnerable state |
| Pump | Repeated to increase backtracking |
| Suffix | Forces the backtrack (usually fails to match) |
Example¶
Pattern: ^(a+)+$
| Component | Value | Purpose |
|---|---|---|
| Prefix | "" or "a" | Initial match |
| Pump | "a" | Repeated many times |
| Suffix | "!" | Causes match failure |
Attack string: "a" × 30 + "!" = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!"
Detection Methods¶
Static Analysis (Automaton-based)¶
Builds a finite automaton from the regex and detects ambiguity:
Pros: Fast, deterministic Cons: Can't handle backreferences
Dynamic Analysis (Fuzzing)¶
Executes patterns with crafted inputs and measures step count:
Pros: Handles all regex features Cons: Slower, may miss edge cases
ReDoctor's Hybrid Approach¶
ReDoctor combines both methods:
- Automaton checker for patterns without backreferences
- Fuzz checker for complex patterns
- Recall validation to confirm findings
Mitigation Strategies¶
1. Use Safe Patterns¶
Avoid nested quantifiers and overlapping alternatives:
2. Use Atomic Groups¶
If your regex engine supports them:
3. Use Possessive Quantifiers¶
4. Set Timeouts¶
Python 3.11+ supports regex timeouts:
5. Use ReDoctor!¶
Check patterns before deployment:
from redoctor import check
result = check(pattern)
if result.is_vulnerable:
raise ValueError(f"Vulnerable pattern: {result.complexity}")
Prevention Checklist¶
Before Using a Regex
- Run through ReDoctor
- Avoid nested quantifiers (
(a+)+) - Avoid overlapping alternatives (
(a|a)) - Prefer specific character classes over
. - Use bounded quantifiers when possible (
{1,10}vs+) - Test with edge cases
- Set timeouts for user-supplied patterns