Skip to content

How ReDoctor Works

ReDoctor uses a hybrid analysis approach combining static automata-based analysis with intelligent fuzzing for comprehensive ReDoS detection.

Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         ReDoctor Engine                              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                      β”‚
β”‚  Input: Regex Pattern                                                β”‚
β”‚           β”‚                                                          β”‚
β”‚           β–Ό                                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                                β”‚
β”‚  β”‚     Parser      β”‚  β†’ AST (Abstract Syntax Tree)                  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                                β”‚
β”‚           β”‚                                                          β”‚
β”‚           β–Ό                                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                                β”‚
β”‚  β”‚  Auto Selector  β”‚  β†’ Choose checker based on pattern features    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                                β”‚
β”‚           β”‚                                                          β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”                                                   β”‚
β”‚     β–Ό           β–Ό                                                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                        β”‚
β”‚  β”‚ Automatonβ”‚  β”‚   Fuzz   β”‚                                        β”‚
β”‚  β”‚ Checker  β”‚  β”‚  Checker β”‚                                        β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜                                        β”‚
β”‚       β”‚             β”‚                                               β”‚
β”‚       β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜                                               β”‚
β”‚              β–Ό                                                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                                β”‚
β”‚  β”‚    Recall       β”‚  β†’ Validate with real execution                β”‚
β”‚  β”‚   Validator     β”‚                                                β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                                β”‚
β”‚           β”‚                                                          β”‚
β”‚           β–Ό                                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                                β”‚
β”‚  β”‚   Diagnostics   β”‚  β†’ Status, Complexity, Attack String           β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                                β”‚
β”‚                                                                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Components

1. Parser

The parser converts regex patterns into an Abstract Syntax Tree (AST):

from redoctor.parser.parser import parse

# Parse a pattern
pattern = parse(r"^(a+)+$")

# AST structure
# Pattern(
#     source="^(a+)+$",
#     node=Concat([
#         Anchor(START),
#         Repeat(Group(Repeat(Char('a'), 1, INF)), 1, INF),
#         Anchor(END),
#     ])
# )

Supported features:

  • Character classes [a-z]
  • Quantifiers +, *, ?, {n,m}
  • Groups (...), (?:...)
  • Anchors ^, $, \b
  • Alternation a|b
  • Escape sequences \d, \w, \s
  • Unicode categories \p{L}
  • Backreferences \1
  • Lookahead (?=...), (?!...)
  • Lookbehind (?<=...), (?<!...)

2. Automaton Checker

Uses automata theory to detect vulnerabilities through static analysis.

Ξ΅-NFA Construction

Builds an Ξ΅-NFA (Non-deterministic Finite Automaton with epsilon transitions):

Pattern: (a|ab)+

Ξ΅-NFA:
    β”Œβ”€β”€β”€Ξ΅β”€β”€β”€β”
    β”‚       β–Ό
β†’ (0) ─a─→ (1) ─Ρ─→ (4) ─Ρ─→ (5)
    β”‚               β–²
    └─a─→ (2) ─b─→ (3) β”€Ξ΅β”€β”˜
              β”‚
              β””β”€β”€β”€β”€Ξ΅β”€β”€β”€β”€β”˜

NFAwLA (NFA with Look-Ahead)

A key innovation from the recheck project is the NFAwLA - an NFA augmented with look-ahead information to eliminate false positives.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    NFAwLA Construction                   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                          β”‚
β”‚  1. Build OrderedNFA from Ξ΅-NFA                         β”‚
β”‚                                                          β”‚
β”‚  2. Reverse the NFA and convert to DFA                  β”‚
β”‚     (This DFA tells us what can follow each state)      β”‚
β”‚                                                          β”‚
β”‚  3. Product: NFA Γ— Reversed-DFA                         β”‚
β”‚     States become pairs: (nfa_state, lookahead_state)   β”‚
β”‚                                                          β”‚
β”‚  4. Prune unreachable transitions                       β”‚
β”‚     (Transitions that can never lead to acceptance)     β”‚
β”‚                                                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The look-ahead DFA eliminates "dead-end" ambiguity that would cause false positives.

SCC-Based Ambiguity Detection

Uses Strongly Connected Components to detect vulnerability patterns:

# Ambiguity types detected:
#
# 1. EDA (Exponential Degree of Ambiguity)
#    - Multiple transitions on same character within an SCC
#    - Results in O(2^n) complexity
#
# 2. IDA (Polynomial Degree of Ambiguity)
#    - SCC chains with shared characters
#    - Results in O(n^k) complexity

The SCC checker uses Tarjan's algorithm to find cycles, then analyzes transitions within each SCC for ambiguity.

Exploitability Check

Not all ambiguous patterns are exploitable. The _is_multi_trans_exploitable check considers:

  1. Pattern anchoring: Does the pattern have $ or \Z?
  2. Continuation requirement: Must something match after the ambiguous part?
  3. Match mode: How will the regex be used (full vs partial match)?
# Pattern: (a*)*
# - Without $ anchor: can match early β†’ SAFE
# - With $ anchor: must match to end β†’ EXPONENTIAL

# Pattern: ([^@]+)+@
# - The @ forces continuation β†’ EXPONENTIAL
# - Even without $ anchor, backtracking is required

Witness Generation

When ambiguity is found, generates a witness (proof of vulnerability):

# Witness for ^(a+)+$
# prefix = "a"
# pump = "a"
# suffix = "!"

# Attack string: prefix + pump * n + suffix
# As n grows, matching time grows exponentially

3. Fuzz Checker

Dynamic analysis using a step-counting VM:

VM Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           Regex VM                   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Program: Compiled instructions      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚
β”‚  β”‚ 0: CHAR 'a'                     β”‚β”‚
β”‚  β”‚ 1: SPLIT 0, 2                   β”‚β”‚
β”‚  β”‚ 2: MATCH                        β”‚β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
β”‚                                      β”‚
β”‚  Interpreter:                        β”‚
β”‚  - Track step count                  β”‚
β”‚  - Detect exponential growth         β”‚
β”‚  - Generate attack candidates        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Fuzzing Process

# 1. Generate seeds from pattern structure
seeds = seeder.generate(pattern)
# ["a", "aa", "aaa", "!", "a!"]

# 2. Mutate seeds
for seed in seeds:
    mutations = mutator.mutate(seed)
    # Repeat, insert, delete, swap characters

    # 3. Count steps for each mutation
    for mutation in mutations:
        steps = vm.count_steps(pattern, mutation)

        # 4. Check for vulnerability
        if steps > threshold:
            # Potential vulnerability found!
            pass

Step Counting

# Linear pattern: steps β‰ˆ n
# Polynomial O(nΒ²): steps β‰ˆ nΒ²
# Exponential O(2^n): steps β‰ˆ 2^n

# Detection thresholds:
LINEAR_THRESHOLD = 100      # Steps per character
POLYNOMIAL_THRESHOLD = 10000
EXPONENTIAL_THRESHOLD = 100000

4. Match Mode (False Positive Control)

The match_mode setting controls how patterns are analyzed:

from redoctor.config import MatchMode

# AUTO (default): Infer from pattern anchors
# FULL: Assume full-string matching (conservative)
# PARTIAL: Assume partial matching (fewer false positives)
Mode Use Case Behavior
AUTO General use Checks for $ anchor and continuation
FULL Security audits All multi-transitions are exploitable
PARTIAL Search-style Only flagged if anchor/continuation exists

See Configuration Guide for details.

5. Complexity Analyzer

Classifies vulnerability severity:

from redoctor.diagnostics.complexity import Complexity, ComplexityType

# Classification based on step growth:
# - Linear: Safe
# - Polynomial (nΒ², nΒ³): Moderate/High risk
# - Exponential (2^n): Critical risk

# Measurement approach:
# Run with inputs of length 5, 10, 15, 20
# Compare step counts
# Fit to complexity curve

6. Recall Validator

Confirms vulnerabilities with real execution:

# Generate attack string
attack = prefix + pump * n + suffix

# Measure actual execution time
# with increasing n values
for n in [10, 20, 30]:
    attack = build_attack(n)
    time = measure_match_time(pattern, attack)

    # If time grows as expected β†’ CONFIRMED
    # If time is constant β†’ FALSE POSITIVE

Checker Selection

ReDoctor automatically selects the best checker:

def select_checker(pattern: Pattern) -> Checker:
    # 1. Check for backreferences
    if has_backreferences(pattern):
        return FuzzChecker()  # Automaton can't handle backrefs

    # 2. Check NFA size
    nfa = build_nfa(pattern)
    if nfa.size > MAX_NFA_SIZE:
        return FuzzChecker()  # NFA too large

    # 3. Default to automaton
    return AutomatonChecker()

Performance Characteristics

Aspect Automaton Checker Fuzz Checker
Speed Very fast (ms) Slower (100ms-s)
Accuracy High Good
Backreferences ❌ Not supported βœ… Supported
Lookaround Partial βœ… Full
Deterministic βœ… Yes ❌ No (random seed)

Diagnostics Output

The final output includes:

Diagnostics(
    status=Status.VULNERABLE,
    source="^(a+)+$",
    complexity=Complexity(EXPONENTIAL),  # O(2^n)
    attack_pattern=AttackPattern(
        prefix="a",
        pump="aaaaaaaaaaaa",
        suffix="!",
    ),
    hotspot=Hotspot(
        start=0,
        end=8,
        pattern="^(a+)+$",
    ),
    checker="automaton",
    message="ReDoS vulnerability detected with O(2^n) complexity.",
)

Extending ReDoctor

Custom Checkers

from redoctor.diagnostics.diagnostics import Diagnostics

class CustomChecker:
    def check(self, pattern: str) -> Diagnostics:
        # Your custom detection logic
        pass

Custom Seeders

from redoctor.fuzz.seeder import Seeder

class CustomSeeder(Seeder):
    def generate(self, pattern) -> List[FString]:
        # Your custom seed generation
        pass

Next Steps