Regex part deux - INTERPOLATION_SYNTAX by kbrock · Pull Request #669 · ruby-i18n/i18n

kbrock · 2023-06-13T03:35:05Z

Thanks for the great gem.

I was curious what I could do with INTERPOLATION_SYNTAX.

This PR has three commits:

1. The commit from the regex PR, Improve TOKENIZER by 23% (#668).
2. An update to INTERPOLATION_SYNTAX with minor Ruby changes.
3. Dropping INTERPOLATION_SYNTAX and just using TOKENIZER.
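The last item, dropping INTERPOLATION_SYNTAX and relying on TOKENIZER alone, can be pictured with a minimal sketch. `interpolate_with_tokenizer` is a hypothetical helper, not the gem's API, and raising `KeyError` on a missing key (via `Hash#fetch`) is an assumption made for brevity.

```ruby
# The single-capture-group pattern this PR settles on.
TOKENIZER = /(%%?\{[^\}]+\})/

# Hypothetical helper (not part of i18n): interpolate using only TOKENIZER.
def interpolate_with_tokenizer(str, values)
  # The capture group makes split keep the placeholders in the result.
  str.split(TOKENIZER).map do |token|
    if !token.match?(TOKENIZER)
      token                              # plain text passes through
    elsif token.start_with?('%%')
      token[1..]                         # "%%{x}" is an escaped literal "%{x}"
    else
      values.fetch(token[2...-1].to_sym) # "%{x}" looks up values[:x]
    end
  end.join
end

puts interpolate_with_tokenizer('Hi %{name}, type %%{name} to escape', name: 'Ada')
# prints: Hi Ada, type %{name} to escape
```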

I ran the tests with Ruby 2.6.9 and 3.0.6.

I'm not sure when the endless-range substring syntax `str[1..]` was introduced.
RuboCop suggested I change my `str[1..-1]` to that.
It also said the backslash in `[^\}]` was not necessary.
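A small sketch of both RuboCop suggestions: the two slice spellings return the same substring, and `}` needs no escaping inside a character class. The sample strings are made up for illustration.

```ruby
token = '%%{name}'

legacy  = token[1..-1] # explicit end index: -1 means "through the last character"
endless = token[1..]   # endless range (Ruby 2.6+): the same slice, shorter spelling

puts legacy  # prints %{name}
puts endless # prints %{name}

# "}" is not special inside a character class, so the backslash is redundant.
escaped   = /[^\}]+/
unescaped = /[^}]+/

sample = 'abc}def'
p sample[escaped] == sample[unescaped] # prints true (both match "abc")
```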

Let me know if you would like to keep INTERPOLATION_SYNTAX; if so, I can throw away the second commit. Or, if you like it, I can squash the two.
There was something nice about the multiple capture groups in the regular expression, but I didn't feel the added complexity (from PCRE's perspective) bought much. Since this is your project, it is your call.
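To see what the multiple capture groups were buying, here is a small, hypothetical comparison. The three-group pattern `/(%)?(%\{([^\}]+)\})/` from the commit messages hands back the escape marker and key as captures, while the single-group pattern needs a couple of string operations to recover the same information. Both helper names are invented for illustration, and they assume their input is a token that `split` has already isolated, not an arbitrary string.

```ruby
# Patterns quoted in this PR; the helper methods below are hypothetical.
MULTI_GROUP  = /(%)?(%\{([^\}]+)\})/ # groups: escape marker, whole placeholder, key
SINGLE_GROUP = /(%%?\{[^\}]+\})/     # one group: the whole token

# Capture-group style: the regex hands back each piece directly.
def parse_with_groups(token)
  m = token.match(MULTI_GROUP)
  return nil unless m

  escaped, placeholder, key = m.captures
  escaped ? [:literal, placeholder] : [:key, key]
end

# Single-group style: recognize the token, then slice it with string ops.
def parse_with_string_ops(token)
  return nil unless token.match?(SINGLE_GROUP)

  if token.start_with?('%%')
    [:literal, token[1..]] # "%%{x}" is an escaped literal "%{x}"
  else
    [:key, token[2...-1]]  # "%{x}" becomes the key "x"
  end
end

['%{name}', '%%{name}', 'plain'].each do |t|
  p [t, parse_with_groups(t), parse_with_string_ops(t)]
end
```

The trade-off is the one described above: the captures make the parsing step trivial, but the regex engine pays for them on every match.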

Also in reference to #667

The more numbers I ran, the less this looked like a DoS to me. So maybe I'm not the right person to weigh in on whether these changes are necessary.

From the commit messages:

/(%)?(%\{([^\}]+)\})/ =~ ('%{{'*9999)+'}'

/(%)?(%\{([^\}]+)\})/ ==> 199,984 steps
/(%%?)\{([^\}]+)\}/   ==> 129,989 steps

/(%%?\{[^\}]+\})/     ==>  99,992 steps

But that still doesn't reach the TOKENIZER's performance, so the second commit went with the TOKENIZER pattern:

/(%%?\{[^\}]+\})/     ==>  99,992 steps
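For anyone reproducing these numbers in Ruby rather than on regex101, a short sketch of the pathological input and of what the chosen pattern does with it. The `Hello ...` sample string is made up for illustration.

```ruby
tokenizer = /(%%?\{[^\}]+\})/ # the pattern chosen above

# The pathological input behind the step counts.
input = ('%{{' * 9999) + '}'

p input.length # prints 29998

# "{" and "%" both satisfy [^\}], so the first "%{" consumes everything up
# to the lone closing brace: the whole input is a single token.
p input.match(tokenizer)[0] == input # prints true

# Ordinary strings split into plain text and placeholders; the capture
# group is what makes split keep the placeholders in the result.
p 'Hello %{name}, 100%%{x} done'.split(tokenizer)
# prints ["Hello ", "%{name}", ", 100", "%%{x}", " done"]
```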

From what I can see, this is done in linear time: 4*O(n).
This tokenizer change converts that to something a little quicker: 3*O(n).
It seems that not using a capture group, and using something other than `split`, would be a big win. Beyond that, the gains were meager.

I used https://regex101.com/ (and pcre2) to evaluate the cost of the TOKENIZER, and verified with CRuby 3.0.6.

```
/(%%\{[^\}]+\}|%\{[^\}]+\})/ =~ ('%{{'*9999)+'}'

/(%%\{[^\}]+\}|%\{[^\}]+\})/ ==> 129,990 steps
/(%?%\{[^\}]+\})/            ==> 129,990 steps
/(%%?\{[^\}]+\})/            ==>  99,992 steps (simple savings of 25%) <===
/(%%?\{[^%}{]+\})/           ==>  89,993 steps (limiting variable contents has minimal gains)
```

Also of note are the null/simple cases:

```
/x/ =~ ('%{{'*9999)+'}'

/x/       ==> 29,998 steps
/(x)/     ==> 59,996 steps
/%{x/     ==> 49,998 steps
/(%%?{x)/ ==> 89,993 steps
```

And compared against a plain string of the same length:

```
/x/ =~ 'abb'*9999+'c'

/x/                          ==> 29,999
/(%%?{x)/                    ==> 59,998
/(%%?\{[^\}]+\})/            ==> 59,998
/(%%\{[^\}]+\}|%\{[^\}]+\})/ ==> 89,997
```

per ruby-i18n#667
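A quick check that the rewrite is behavior-preserving: the alternation pattern measured first in the commit message, the `%?%` variant, and the `%%?` form all split the same sample strings identically, and dropping the capture group lets `scan` return just the placeholders. The sample strings are assumptions chosen to cover escaped, unescaped, and adjacent placeholders.

```ruby
alternation = /(%%\{[^\}]+\}|%\{[^\}]+\})/ # alternation form from the commit message
leading     = /(%?%\{[^\}]+\})/            # optional leading "%"
optional    = /(%%?\{[^\}]+\})/            # optional second "%" (the chosen form)

samples = ['Hi %{name}', '100%%{x} and %{y}', 'no placeholders', '%{a}%%{b}%{c}']

# All three describe the same tokens, so split returns identical arrays.
samples.each do |s|
  same = [alternation, leading, optional].map { |re| s.split(re) }.uniq.size == 1
  puts "#{s.inspect}: #{same}"
end

# Without a capture group, scan returns only the placeholders.
p '100%%{x} and %{y}'.scan(/%%?\{[^}]+\}/) # prints ["%%{x}", "%{y}"]
```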

Same as the tokenizer change, from the regex101.com pcre2 debugger:

```ruby
/(%)?(%\{([^\}]+)\})/ =~ ('%{{'*9999)+'}'

/(%)?(%\{([^\}]+)\})/ ==> 199,984 steps
/(%%?)\{([^\}]+)\}/   ==> 129,989 steps
/(%%?\{[^\}]+\})/     ==>  99,992 steps
```

So the extra capture group is the main hit.
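The step counts above come from pcre2 on regex101, not from CRuby's Onigmo engine, so a Ruby-side wall-clock comparison need not track them exactly. If one were wanted, a stdlib `Benchmark` sketch along these lines could be used; the iteration count is an arbitrary assumption and the timings will vary by machine.

```ruby
require 'benchmark'

# Patterns from the commit message above, keyed by how many groups they capture.
patterns = {
  three_groups: /(%)?(%\{([^\}]+)\})/,
  two_groups:   /(%%?)\{([^\}]+)\}/,
  one_group:    /(%%?\{[^\}]+\})/,
}

input = ('%{{' * 9999) + '}'
iterations = 200 # arbitrary; raise it for steadier numbers

# Wall-clock seconds for repeated matches of each pattern against the input.
results = patterns.transform_values do |re|
  Benchmark.realtime { iterations.times { re =~ input } }
end

results.each { |name, secs| puts format('%-13s %.4fs', name, secs) }
```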

kbrock · 2023-06-13T03:37:47Z

Come to think of it, we may be able to skip this regular expression entirely or, if we keep it, drop the capture group altogether. But that feels like overkill, especially since we are already in linear time.
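Picking up the idea of skipping the regular expression entirely: a hypothetical, regex-free tokenizer built only on `String#index`. It is a sketch, not something this PR adopted. It aims to return the same non-empty pieces as `str.split(/(%%?\{[^}]+\})/)`; `split` additionally emits empty strings before a leading placeholder and between adjacent placeholders, which this version omits.

```ruby
# Hypothetical regex-free tokenizer (not part of i18n). Aims to return the
# same non-empty pieces as str.split(/(%%?\{[^}]+\})/).
def tokenize_without_regex(str)
  tokens = []
  plain_start = 0 # where the current run of plain text begins
  pos = 0         # where to resume searching for "%{"

  while (opening = str.index('%{', pos))
    # A "%" directly before "%{" is the escape marker and belongs to the token.
    start = (opening > 0 && str[opening - 1] == '%') ? opening - 1 : opening
    closing = str.index('}', opening + 2)
    break unless closing # no closing brace anywhere: the rest is plain text

    if closing == opening + 2 # "%{}" has an empty key, so it is not a placeholder
      pos = closing
      next
    end

    tokens << str[plain_start...start] if start > plain_start
    tokens << str[start..closing]
    plain_start = pos = closing + 1
  end

  tokens << str[plain_start..] if plain_start < str.length
  tokens
end

p tokenize_without_regex('Hello %{name}, 100%%{x} done')
# prints ["Hello ", "%{name}", ", 100", "%%{x}", " done"]
```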

radar · 2023-06-21T10:29:59Z

> Not sure when the syntax change for the substring was introduced str[1..].

Ruby 2.6.

This is currently the earliest version of Ruby that i18n supports, so I think it is safe.

radar · 2023-06-21T10:33:05Z

I like it! Simpler regular expressions will always get my vote.

kbrock added 3 commits June 12, 2023 23:21

condense to TOKENIZER

8940781

radar merged commit 7cf0947 into ruby-i18n:master Jun 21, 2023

radar mentioned this pull request Jun 21, 2023

[BUG] Possible Denial of Service #667

Closed

kbrock deleted the regex2 branch July 13, 2023 01:31
