How ReDoctor Works¶
ReDoctor uses a hybrid analysis approach combining static automata-based analysis with intelligent fuzzing for comprehensive ReDoS detection.
Architecture Overview¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β ReDoctor Engine β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Input: Regex Pattern β
β β β
β βΌ β
β βββββββββββββββββββ β
β β Parser β β AST (Abstract Syntax Tree) β
β ββββββββββ¬βββββββββ β
β β β
β βΌ β
β βββββββββββββββββββ β
β β Auto Selector β β Choose checker based on pattern features β
β ββββββββββ¬βββββββββ β
β β β
β βββββββ΄ββββββ β
β βΌ βΌ β
β ββββββββββββ ββββββββββββ β
β β Automatonβ β Fuzz β β
β β Checker β β Checker β β
β ββββββ¬ββββββ ββββββ¬ββββββ β
β β β β
β ββββββββ¬βββββββ β
β βΌ β
β βββββββββββββββββββ β
β β Recall β β Validate with real execution β
β β Validator β β
β ββββββββββ¬βββββββββ β
β β β
β βΌ β
β βββββββββββββββββββ β
β β Diagnostics β β Status, Complexity, Attack String β
β βββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Components¶
1. Parser¶
The parser converts regex patterns into an Abstract Syntax Tree (AST):
from redoctor.parser.parser import parse
# Parse a pattern
pattern = parse(r"^(a+)+$")
# AST structure
# Pattern(
# source="^(a+)+$",
# node=Concat([
# Anchor(START),
# Repeat(Group(Repeat(Char('a'), 1, INF)), 1, INF),
# Anchor(END),
# ])
# )
Supported features:
- Character classes
[a-z] - Quantifiers
+,*,?,{n,m} - Groups
(...),(?:...) - Anchors
^,$,\b - Alternation
a|b - Escape sequences
\d,\w,\s - Unicode categories
\p{L} - Backreferences
\1 - Lookahead
(?=...),(?!...) - Lookbehind
(?<=...),(?<!...)
2. Automaton Checker¶
Uses automata theory to detect vulnerabilities through static analysis.
Ξ΅-NFA Construction¶
Builds an Ξ΅-NFA (Non-deterministic Finite Automaton with epsilon transitions):
Pattern: (a|ab)+
Ξ΅-NFA:
ββββΞ΅ββββ
β βΌ
β (0) βaββ (1) βΞ΅ββ (4) βΞ΅ββ (5)
β β²
ββaββ (2) βbββ (3) βΞ΅ββ
β
βββββΞ΅βββββ
NFAwLA (NFA with Look-Ahead)¶
A key innovation from the recheck project is the NFAwLA - an NFA augmented with look-ahead information to eliminate false positives.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β NFAwLA Construction β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β 1. Build OrderedNFA from Ξ΅-NFA β
β β
β 2. Reverse the NFA and convert to DFA β
β (This DFA tells us what can follow each state) β
β β
β 3. Product: NFA Γ Reversed-DFA β
β States become pairs: (nfa_state, lookahead_state) β
β β
β 4. Prune unreachable transitions β
β (Transitions that can never lead to acceptance) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
The look-ahead DFA eliminates "dead-end" ambiguity that would cause false positives.
SCC-Based Ambiguity Detection¶
Uses Strongly Connected Components to detect vulnerability patterns:
# Ambiguity types detected:
#
# 1. EDA (Exponential Degree of Ambiguity)
# - Multiple transitions on same character within an SCC
# - Results in O(2^n) complexity
#
# 2. IDA (Polynomial Degree of Ambiguity)
# - SCC chains with shared characters
# - Results in O(n^k) complexity
The SCC checker uses Tarjan's algorithm to find cycles, then analyzes transitions within each SCC for ambiguity.
Exploitability Check¶
Not all ambiguous patterns are exploitable. The _is_multi_trans_exploitable check considers:
- Pattern anchoring: Does the pattern have
$or\Z? - Continuation requirement: Must something match after the ambiguous part?
- Match mode: How will the regex be used (full vs partial match)?
# Pattern: (a*)*
# - Without $ anchor: can match early β SAFE
# - With $ anchor: must match to end β EXPONENTIAL
# Pattern: ([^@]+)+@
# - The @ forces continuation β EXPONENTIAL
# - Even without $ anchor, backtracking is required
Witness Generation¶
When ambiguity is found, generates a witness (proof of vulnerability):
# Witness for ^(a+)+$
# prefix = "a"
# pump = "a"
# suffix = "!"
# Attack string: prefix + pump * n + suffix
# As n grows, matching time grows exponentially
3. Fuzz Checker¶
Dynamic analysis using a step-counting VM:
VM Architecture¶
βββββββββββββββββββββββββββββββββββββββ
β Regex VM β
βββββββββββββββββββββββββββββββββββββββ€
β Program: Compiled instructions β
β ββββββββββββββββββββββββββββββββββββ
β β 0: CHAR 'a' ββ
β β 1: SPLIT 0, 2 ββ
β β 2: MATCH ββ
β ββββββββββββββββββββββββββββββββββββ
β β
β Interpreter: β
β - Track step count β
β - Detect exponential growth β
β - Generate attack candidates β
βββββββββββββββββββββββββββββββββββββββ
Fuzzing Process¶
# 1. Generate seeds from pattern structure
seeds = seeder.generate(pattern)
# ["a", "aa", "aaa", "!", "a!"]
# 2. Mutate seeds
for seed in seeds:
mutations = mutator.mutate(seed)
# Repeat, insert, delete, swap characters
# 3. Count steps for each mutation
for mutation in mutations:
steps = vm.count_steps(pattern, mutation)
# 4. Check for vulnerability
if steps > threshold:
# Potential vulnerability found!
pass
Step Counting¶
# Linear pattern: steps β n
# Polynomial O(nΒ²): steps β nΒ²
# Exponential O(2^n): steps β 2^n
# Detection thresholds:
LINEAR_THRESHOLD = 100 # Steps per character
POLYNOMIAL_THRESHOLD = 10000
EXPONENTIAL_THRESHOLD = 100000
4. Match Mode (False Positive Control)¶
The match_mode setting controls how patterns are analyzed:
from redoctor.config import MatchMode
# AUTO (default): Infer from pattern anchors
# FULL: Assume full-string matching (conservative)
# PARTIAL: Assume partial matching (fewer false positives)
| Mode | Use Case | Behavior |
|---|---|---|
| AUTO | General use | Checks for $ anchor and continuation |
| FULL | Security audits | All multi-transitions are exploitable |
| PARTIAL | Search-style | Only flagged if anchor/continuation exists |
See Configuration Guide for details.
5. Complexity Analyzer¶
Classifies vulnerability severity:
from redoctor.diagnostics.complexity import Complexity, ComplexityType
# Classification based on step growth:
# - Linear: Safe
# - Polynomial (nΒ², nΒ³): Moderate/High risk
# - Exponential (2^n): Critical risk
# Measurement approach:
# Run with inputs of length 5, 10, 15, 20
# Compare step counts
# Fit to complexity curve
6. Recall Validator¶
Confirms vulnerabilities with real execution:
# Generate attack string
attack = prefix + pump * n + suffix
# Measure actual execution time
# with increasing n values
for n in [10, 20, 30]:
attack = build_attack(n)
time = measure_match_time(pattern, attack)
# If time grows as expected β CONFIRMED
# If time is constant β FALSE POSITIVE
Checker Selection¶
ReDoctor automatically selects the best checker:
def select_checker(pattern: Pattern) -> Checker:
# 1. Check for backreferences
if has_backreferences(pattern):
return FuzzChecker() # Automaton can't handle backrefs
# 2. Check NFA size
nfa = build_nfa(pattern)
if nfa.size > MAX_NFA_SIZE:
return FuzzChecker() # NFA too large
# 3. Default to automaton
return AutomatonChecker()
Performance Characteristics¶
| Aspect | Automaton Checker | Fuzz Checker |
|---|---|---|
| Speed | Very fast (ms) | Slower (100ms-s) |
| Accuracy | High | Good |
| Backreferences | β Not supported | β Supported |
| Lookaround | Partial | β Full |
| Deterministic | β Yes | β No (random seed) |
Diagnostics Output¶
The final output includes:
Diagnostics(
status=Status.VULNERABLE,
source="^(a+)+$",
complexity=Complexity(EXPONENTIAL), # O(2^n)
attack_pattern=AttackPattern(
prefix="a",
pump="aaaaaaaaaaaa",
suffix="!",
),
hotspot=Hotspot(
start=0,
end=8,
pattern="^(a+)+$",
),
checker="automaton",
message="ReDoS vulnerability detected with O(2^n) complexity.",
)
Extending ReDoctor¶
Custom Checkers¶
from redoctor.diagnostics.diagnostics import Diagnostics
class CustomChecker:
def check(self, pattern: str) -> Diagnostics:
# Your custom detection logic
pass
Custom Seeders¶
from redoctor.fuzz.seeder import Seeder
class CustomSeeder(Seeder):
def generate(self, pattern) -> List[FString]:
# Your custom seed generation
pass