Why your regex can hang the server.
A twelve-character regex can hang for hours on a thirty-character input. The bug has its own CVE class (ReDoS), it's shipped in production at Stack Overflow and Cloudflare, and it's almost always one of three patterns. Here's the explanation, the patterns to recognize, and the four ways to actually defend against it.
You ship a regex. It looks reasonable. The CI passes; the unit tests pass; the staging environment is fine. Then you push to production, a user pastes in 30 characters of input, and your service hangs. Not slow — hangs. The CPU is at 100%. The request times out. The next request hits the same hot code path. Now your service is down, and the regex doesn't release until you forcibly kill the process.
This is catastrophic backtracking, also known as regular-expression denial of service or ReDoS. It has its own CVE class. It's shipped in production code at companies you've heard of. And it's almost always one of three patterns once you know what to look for.
A regex hung Stack Overflow for 34 minutes.
On 20 July 2016, Stack Overflow went down for thirty-four minutes. The post-mortem identified the cause: a regex used to trim leading and trailing whitespace from lines in user posts. The pattern, simplified, was:
^[\s\u200c]+|[\s\u200c]+$
Read it as: match a run of whitespace (where "whitespace" also includes the zero-width non-joiner, \u200c) at the start of the string, or a matching run at the end. The regex is correct. The cost comes from how the engine evaluates it on a string containing a long whitespace run that doesn't reach the end. The offending post had roughly 20,000 consecutive whitespace characters on one line, with more text after them. The trailing alternative [\s\u200c]+$ failed from the first position, so the engine retried it from the second, then the third, each attempt consuming the rest of the run before failing again: on the order of 200 million character comparisons for a single call. That call sat on a hot code path that ran on every request, and compounded across requests, the site fell over.
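The retry-from-every-position behavior is easy to reproduce in any backtracking engine. A sketch in Node.js, using just the trailing half of the pattern; the inputs and lengths are mine, scaled down so it returns quickly, and on backtracking engines like V8 the time grows roughly quadratically with the run length:

```javascript
// A "whitespace run, then end-of-string" pattern, retried from every
// position of a run that never reaches the end. Doubling n roughly
// quadruples the time on a backtracking engine.
function timeTest(re, input) {
  const t0 = process.hrtime.bigint();
  const matched = re.test(input);
  const ms = Number(process.hrtime.bigint() - t0) / 1e6;
  return { matched, ms };
}

const trailing = /\s+$/; // the trailing alternative, simplified
for (const n of [2000, 4000, 8000]) {
  const input = " ".repeat(n) + "x"; // long whitespace run, NOT at the end
  const { matched, ms } = timeTest(trailing, input);
  console.log(`n=${n}: matched=${matched} in ${ms.toFixed(1)} ms`);
}
```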
The pattern looks innocent. The bug is invisible without thinking about how the engine actually walks the string. That's the trap — and it's why every developer who works with regex eventually ships one of these without noticing.
What backtracking actually is.
Most regex engines (PCRE, JavaScript, Python, Java, .NET, Ruby) implement matching as a backtracking search over a non-deterministic finite automaton, an NFA. When the engine encounters a choice (an alternation, a quantifier, an optional group), it picks one path and remembers the others. If the path fails, the engine backtracks: it returns to the last save point and tries the next path.
For most patterns this is fine. The number of save points is linear in the pattern length, and the engine explores them in linear or polynomial time. You get a match or no-match in microseconds and never think about it.
The trouble starts when a single character of input creates more than one save point — and the same character can be matched multiple ways by the same pattern. Now the number of paths through the regex is no longer linear in the input length. Sometimes it's exponential.
How a regex goes from linear to exponential.
Consider this pattern, which any senior engineer has written without thinking twice:
^(a+)+$
Read as: one or more a's, repeated one or more times, with the whole string anchored. It matches a, aa, aaaaaa: exactly the strings a+ matches. So you'd never write it deliberately, but generated patterns sometimes contain it, and refactors sometimes leave it behind.
Now run it against the input aaaaaaaaaaaaaaaaaaaaX — twenty a's followed by an X. The engine has to decide, for each a, whether it belongs to the inner a+ or to a new iteration of the outer + repetition. There are roughly 2^n such partitions for n a's. The engine explores every one before declaring no-match.
| Input length | Partitions to try | Time on a fast laptop |
|---|---|---|
| 10 a's + X | 1,024 | under a millisecond |
| 20 a's + X | 1,048,576 | ~50 ms |
| 25 a's + X | 33,554,432 | ~1.5 seconds |
| 30 a's + X | 1,073,741,824 | ~50 seconds |
| 35 a's + X | 34,359,738,368 | ~30 minutes |
| 40 a's + X | 1,099,511,627,776 | ~17 hours |
The input is forty characters long. Forty. Anyone can paste forty characters into a comment box. And the regex was a routine pattern someone wrote without thinking. That's the shape of the problem.
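Those numbers are easy to reproduce. A sketch in Node.js, with the input capped at 22 a's so the pathological case still returns in well under a second; at 40 the same loop would run for hours:

```javascript
// Time the pathological ^(a+)+$ against its linear equivalent ^a+$
// on inputs that can never match (the trailing X guarantees failure).
function timeMs(re, input) {
  const t0 = process.hrtime.bigint();
  re.test(input);
  return Number(process.hrtime.bigint() - t0) / 1e6;
}

const evil = /^(a+)+$/;
const fixed = /^a+$/;
for (const n of [14, 18, 22]) {
  const input = "a".repeat(n) + "X"; // the X makes every path fail
  console.log(`n=${n}: evil=${timeMs(evil, input).toFixed(2)} ms, ` +
              `fixed=${timeMs(fixed, input).toFixed(3)} ms`);
}
```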
The three patterns to recognize.
Almost every catastrophic-backtracking case in the wild reduces to one of these three:
1. Nested quantifiers. Anything of the form (a+)+, (a*)*, (a+)*, (a*)+. The outer quantifier multiplies the work of the inner one. Refactor to a single quantifier — a+ in this case does the same job in linear time.
```
// bad
/^(\d+)+$/

// good
/^\d+$/
```
2. Alternation with overlap. Patterns like (a|aa)+ or (\w|\s)+. The two alternatives can match the same input, so the engine tries both at every position. The fix is to make the alternation non-overlapping, or to use a character class instead. [a]+ does what (a|aa)+ intended; [\w\s]+ does what (\w|\s)+ intended.
3. Optional then required. Patterns like a*b against an input that's a long run of a's without a final b. Less catastrophic on its own — usually quadratic, not exponential — but the same underlying mechanism: the engine retries the optional match at every position before concluding no-match. Becomes exponential when nested inside a quantified group, which is a special case of pattern #1.
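Each of the suggested rewrites accepts exactly the same strings as the dangerous original. A quick equivalence check in Node.js (the sample strings are mine, kept short so even the dangerous forms return instantly):

```javascript
// Pairs of (dangerous, safe) patterns that accept the same language.
const pairs = [
  [/^(\d+)+$/, /^\d+$/],      // nested quantifier  -> single quantifier
  [/^(a|aa)+$/, /^a+$/],      // overlapping branches -> plain repeat
  [/^(\w|\s)+$/, /^[\w\s]+$/] // overlapping branches -> character class
];

const samples = ["", "a", "aaaa", "123", "a 1_b", "!", "ab cd"];
for (const [dangerous, safe] of pairs) {
  for (const s of samples) {
    if (dangerous.test(s) !== safe.test(s)) {
      throw new Error(`mismatch on ${JSON.stringify(s)}`);
    }
  }
}
console.log("all rewrites accept exactly the same sample strings");
```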
The Stack Overflow incident was closest to pattern #3. The [\s]+ character class itself isn't ambiguous, but the trailing-anchor alternative had to be retried from every starting position inside a 20,000-character whitespace run that never reached the end of the string: quadratic rather than exponential, and quadratic was enough. The general lesson holds: any time the engine has more than one way to match the same input character, you're in danger.
How to spot it before deploying.
Three checks I run on any regex that touches user input before it ships:
The length-doubling test. Time the regex against three inputs that all fail to match, of lengths 10, 20, and 40 characters. Linear regex: roughly 4× the time at 40 chars vs 10. Catastrophic regex: thousands to millions of times more, or it never returns. The signal is unmistakable.
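A minimal harness for this check in Node.js. The function names are mine, and the lengths for the suspect pattern are capped at 20, because a truly catastrophic pattern would not return at 40:

```javascript
// Length-doubling check: time a regex on non-matching inputs of
// increasing length. A linear pattern roughly tracks input length;
// a catastrophic one explodes.
function timeMs(re, input) {
  re.test(input); // warm-up call so compilation isn't in the timing
  const t0 = process.hrtime.bigint();
  re.test(input);
  return Number(process.hrtime.bigint() - t0) / 1e6;
}

function growthProfile(re, makeInput, lengths) {
  return lengths.map((n) => ({ n, ms: timeMs(re, makeInput(n)) }));
}

const makeInput = (n) => "a".repeat(n) + "X"; // fails against both patterns
console.log("safe:   ", growthProfile(/^a+$/, makeInput, [10, 15, 20]));
console.log("suspect:", growthProfile(/^(a+)+$/, makeInput, [10, 15, 20]));
```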
The visualizer. Tools like regex101 show the number of steps the engine takes against an input. Linear pattern on 100 characters: a few hundred steps. Catastrophic pattern: tens of millions or "max steps exceeded." Run the regex against a representative worst-case string before merging.
The static analyzer. The safe-regex npm package, the regexploit Python tool, and the vuln-regex-detector project all flag patterns matching the three shapes above. They produce false positives but they catch the worst offenders without you having to be the one who notices.
The Regex Rambler on this site doesn't currently have a backtracking detector built in. It's on the roadmap — the same length-doubling test above can be added as a "safety check" badge for any pattern entered. Not shipped yet; the workaround is to paste the pattern into one of the tools above.
Four ways to actually defend.
In rough order of effort and effectiveness:
1. Rewrite the pattern. Most catastrophic regexes have a non-catastrophic equivalent. ^(a+)+$ becomes ^a+$. (\w|\s)+ becomes [\w\s]+. (a|aa)+ becomes a+. If the pattern came from a library or a copy-pasted snippet, audit it; it's almost certainly fixable in place.
2. Use atomic groups or possessive quantifiers. An atomic group (?>...) tells the engine: "once you've matched what's inside, don't backtrack into it." A possessive quantifier ++ says the same thing. They make the pattern strictly less expressive (some matches that would succeed under regular backtracking will now fail), but in the vast majority of real-world cases that's exactly what you want — the alternative match would have been pathological anyway.
```
// PCRE / Java / Ruby / .NET
^(?>a+)+$   // atomic group
^a++$       // possessive quantifier (PCRE, Java, Ruby)

// Python 3.11+ supports both forms. JavaScript supports neither,
// but a capturing lookahead simulates an atomic group: (?=(a+))\1
```
3. Use a linear-time engine. RE2 (Google), the Rust regex crate, and Go's regexp package implement regex with a guaranteed linear-time matching algorithm. They reject some constructs that backtracking engines support (most notably backreferences), but everything they accept they execute in O(n × m) where n is input length and m is pattern length. No catastrophic input exists, by construction.
4. Add a timeout. If you can't avoid running user-supplied or untrusted patterns and you can't switch engines, wrap the match in a timeout. .NET has Regex.MatchTimeout built in. Other languages need a thread or process boundary. The timeout doesn't fix the regex; it just bounds the damage.
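A note on the lookahead trick from option 2: it works because JavaScript lookaheads are atomic. Once (?=(a+)) has captured a run of a's, the backreference \1 consumes exactly that run and the engine never gives any of it back. A sketch, with the wrapper pattern being my own construction:

```javascript
// (?=(a+))\1 behaves like an atomic a+ in JavaScript: the lookahead
// captures the longest run, the backreference consumes it, and no
// backtracking into the run is possible. Wrapped in the outer +,
// the pathological shape of ^(a+)+$ becomes linear.
const atomicish = /^(?:(?=(a+))\1)+$/;

console.log(atomicish.test("aaaa"));               // true
console.log(atomicish.test("a".repeat(64) + "X")); // false, and fast:
// the plain ^(a+)+$ would need on the order of 2^64 paths to say no
```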
The right choice depends on context. If you're writing application code and you control the patterns, option 1 is correct: just don't write the bad pattern. If you're writing a service that runs user-supplied regex (a search engine, a notification rule processor, a log filter), options 3 and 4 are mandatory; option 2 isn't enough because users can construct patterns that defeat it.
Takeaways.
The thing to remember: any time a regex has nested quantifiers, overlapping alternations, or optional patterns followed by required ones, suspect catastrophic backtracking and verify with the length-doubling test. If the pattern touches user input, switch to a linear-time engine or add a timeout — don't trust your ability to write a "safe" backtracking regex on every line of code, forever.
Catastrophic backtracking is one of those bugs that happens to careful engineers, on small patterns, after passing tests. The defense isn't being cleverer; it's recognizing the three shapes above and treating any regex that matches them as suspect by default. Cheaper than a 34-minute outage.
Test a pattern in your browser.
Paste a regex and a test string into the Regex Rambler. See live matches, capture groups, and (in a future release) a backtracking-safety check. Runs entirely in your browser — your patterns and test strings don't go to a server, which matters when you're testing patterns against confidential log lines.
Open the Regex Rambler