You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I understand that Go's regexp packages uses finite automaton. Working on uap-rust, I learned that uap-core's extensive use of large bounded repetition (.{x,y} with a large y) causes a state explosion in FA engines (which makes sense), you can see some of it at rust-lang/regex#1206 (reply in thread) as I asked the rust-regex author for assistance (I was investigating the large memory use of my rust-regex-based implementation compared to a C++ re2 implementation).
Assuming Go's regexp is inspired by or derived from re2, it should be much less sensible than rust-regexp, and should not have the performance traps / cliffs of jitted DFA, but I would still expect that the bounded repetitions significantly increase memory use for no value.
Experimentally, in uap-rust I found that converting repetitions with an upper bound of 100 or more (so 3+ digits) had no functional implication, with all of those having been added to mitigate catastrophic backtracking. Below 100 things are more risky, I found at least one regex which uses a functional repetition with an upper bound of 50.
The text was updated successfully, but these errors were encountered:
I understand that Go's regexp packages uses finite automaton. Working on uap-rust, I learned that uap-core's extensive use of large bounded repetition (
.{x,y}
with a large y) causes a state explosion in FA engines (which makes sense), you can see some of it at rust-lang/regex#1206 (reply in thread) as I asked the rust-regex author for assistance (I was investigating the large memory use of my rust-regex-based implementation compared to a C++ re2 implementation).Assuming Go's regexp is inspired by or derived from re2, it should be much less sensible than rust-regexp, and should not have the performance traps / cliffs of jitted DFA, but I would still expect that the bounded repetitions significantly increase memory use for no value.
Experimentally, in uap-rust I found that converting repetitions with an upper bound of 100 or more (so 3+ digits) had no functional implication, with all of those having been added to mitigate catastrophic backtracking. Below 100 things are more risky, I found at least one regex which uses a functional repetition with an upper bound of 50.
The text was updated successfully, but these errors were encountered: