See the Difference with RegexBuddy

If you try this example with RegexBuddy's debugger, you will see that the original regex ^(.*?,){11}P needs 29,685 steps to conclude there regex cannot match 1,2,3,4,5,6,7,8,9,10,11,12. If the string is1,2,3,4,5,6,7,8,9,10,11,12,13, just 3 characters more, the number of steps doubles to 60,313. It's not too hard to imagine that at this kind of exponential rate, attempting this regex on a large file with long lines could easily take forever.
Evidence of catastrophic backtracking in RegexBuddy
Our improved regex ^([^,\r\n]*,){11}P, however, needs just 72 steps to fail, whether the subject string has 12 numbers, 13 numbers, 16 numbers or a billion. While the complexity of the original regex was exponential, the complexity of the improved regex is constant with respect to whatever follows the 11th field. The reason is the regex fails almost immediately when it discovers the 12th field doesn't start with a P. It backtracks the 11 iterations of the group without expanding again.
The complexity of the improved regex is linear to the length of the first 11 fields. 34 steps are needed in our example to match the 11 fields. It takes only 1 step to discover that P can't be matched. Another 35 steps are then needed to to backtrack all the iterations of the two quantifiers. That's the best we can do, since the engine does have to scan through all the characters of the first 11 fields to find out where the 12th one begins. Our improved regex is a perfect solution.

Post a Comment