-
Notifications
You must be signed in to change notification settings - Fork 131
SI-7710 fix memory performance of RegexParsers in jdk7u6+ #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hum, travis still tries to build against 2.11.0-SNAPSHOT... |
+1 |
Great! Do you know if there is a code path that can have a lot of parsing/recursion without calling regex? If so, it would still exhibit the old O(n^2) behavior. I put off writing a fix until I had time to track it down. |
@Ichoran I couldn't find any other heavy user of substring/subSequence in there. There are still a few but they are:
|
@adriaanm: could you take a look at this when you have time? That bug is pretty annoying :) |
Sorry, been busy with Scaladays etc :-) |
Travis is failing because your PR is based on an older version of master that tested against 2.11.0-SNAPSHOT (which is now 2.11.2-SNAPSHOT). |
I rebased it. Mima also wasn't happy with a private class in a trait, so I moved it out. |
throw new IndexOutOfBoundsException(s"start: ${_start}, end: ${_end}, length: $length") | ||
|
||
new SubSequence(s, start + _start, _end - _start) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry! one last nitpick: indentation is off-by-one here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Fixed :-)
Thanks, LGTM! |
Starting with 1.7.0_06 [1], String.substring no longer reuses the internal char array of the String but make a copy instead. Since we call subSequence twice for *every* input character, this results in horrible parse performance and GC. With the benchmark from the (duplicate) ticket SI-8542, I get: BEFORE: parseAll(new StringReader(String)) For 100 items: 49 ms For 500 items: 97 ms For 1000 items: 155 ms For 5000 items: 113 ms For 10000 items: 188 ms For 50000 items: 1437 ms === parseAll(String) For 100 items: 4 ms For 500 items: 67 ms For 1000 items: 372 ms For 5000 items: 5693 ms For 10000 items: 23126 ms For 50000 items: 657665 ms AFTER: parseAll(new StringReader(String)) For 100 items: 43 ms For 500 items: 118 ms For 1000 items: 217 ms For 5000 items: 192 ms For 10000 items: 196 ms For 50000 items: 1424 ms === parseAll(String) For 100 items: 2 ms For 500 items: 8 ms For 1000 items: 16 ms For 5000 items: 79 ms For 10000 items: 161 ms For 50000 items: 636 ms [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6924259
Great! I've restarted the build on Travis. Looks like failure was on their end. |
SI-7710 fix memory performance of RegexParsers in jdk7u6+
@Ichoran @gourlaysama: maybe Note that, in the example discussed there, the lines parsed via regex were of equal length. I suspect that if the regex parser were replaced by a non-regex parser, the slow-down described there would still happen. |
@ceilican in that example, |
Great! Thanks! When will Scala 2.11.2 with this fix be released? (This fix would allow my project to handle much bigger inputs. Right now I am wasting a lot of a cluster's computer resources because of this, and therefore I would really be interested in the Scala 2.11.2 release as soon as possible.) In case it will not be released soon, is there an easy way (e.g. by simply changing something in my project's build.sbt file) to benefit from this fix before it is released? |
scala-parser-combinators is now versionned separately, so Until then, you can always build it from source, publish it locally and @adriaanm I'd be happy to help with maintenance :-) |
@gourlaysama, great! Thank you! I've added you to our Community Maintainers team. Use the power wisely ;-) We're always happy to help, so don't hesitate to ask when in doubt. I'd suggest adding your email address to .travis.yml so that you're notified of build breakage. Also, @gkossakowski is our infrastructure tsar this year. One task on his TODO list is to make releases tag-driven, so maintainers can cut releases easily. Feel free to prod him on this ;-) |
Until we have automated releases, I'm happy to have you, as a maintainer, tag a release when it's ready, and ping @gkossakowski to publish it. This should be done before the PR freeze of the next Scala release (July 14 for 2.11.2) in order to have it included in the distro. While you're at it, please also bump the Scala & sbt versions to the latest stable release. Tags really shouldn't be undone/changed, so please tread lightly. |
Starting with 1.7.0_06 [1], String.substring no longer reuses the internal
char array of the String but make a copy instead. Since we call
subSequence twice for every input character, this results in horrible
parse performance and GC.
With the benchmark from the (duplicate) ticket SI-8542, I get:
BEFORE:
AFTER:
[1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6924259