CPS internals for better performance and stack safety #154
Conversation
I should also point out that the benchmark as-is really only tests the performance and overhead of a `MonadRec` instance. It makes sense to me that it's 3x faster, because we are removing multiple layers of Monad transformers. I would suspect that real-world parsers would get a much bigger general boost, but I don't have benchmarks for that on hand. It would be useful to have something like a JSON parser for benchmarking.
I've added a JSON parser benchmark, which shows a significant improvement over string-parsers.
Awesome, thanks. I will take a hard look at this and let’s get it merged.
Where does the edge over string-parsers mainly come from? Is it deferred error messages or actually something else? I won't pretend to understand the code, which is why I'm asking.
I think it comes from the Continuation-Passing-Style representation of `ParserT`.
I don't have a definitive reason, but my first guess would be that string-parsers may be using inefficient implementations. Both versions of `skipSpaces` are extremely inefficient, but string-parsers is especially bad, since it is building an explicit `List` of chars, `foldMap`ping over it to build a string, and then immediately discarding it. Parsing is at least not building an intermediate data structure and processing it (aside from the trampoline). If both had optimized implementations, I would expect string-parsers to be slightly faster, since it doesn't have to go through a trampoline.
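To illustrate the two shapes being contrasted, here is a standalone sketch (not the actual code from either library; the function names and whitespace set are assumptions for illustration):

```purescript
module SkipSpacesSketch where

import Prelude
import Data.Maybe (Maybe(..))
import Data.String.CodeUnits as SCU

isSpaceChar :: Char -> Boolean
isSpaceChar c = c == ' ' || c == '\t' || c == '\n' || c == '\r'

-- Slow shape (what the comment above describes): something like
--   void (many (satisfy isSpaceChar))
-- allocates a List Char per call just to throw it away.

-- Fast shape: count the whitespace prefix once, then drop it in a
-- single slice, with no intermediate list.
skipSpacesFast :: String -> String
skipSpacesFast s = SCU.drop (go 0) s
  where
  go n = case SCU.charAt n s of
    Just c | isSpaceChar c -> go (n + 1)
    _ -> n
```

The fast shape is linear in the length of the whitespace prefix and allocates only the final substring.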
I've made a few more small optimizations, which make parsing equal in performance with the previous benchmarks (optimizing the `MonadRec` instance), and improve it to 2x string-parsers performance in the JSON parsing benchmark.
Here are some benchmarks for before and after this PR. The benchmarks look amazing. I am seeing about 5× speedup across the board for this PR. I also see the “2× string-parsers performance”. (One odd note which probably doesn’t matter for this PR: I separated the benchmarks which use …)
I am ready to merge this after we resolve the `codePointAt` question, see comments.
```purescript
( forall r
   . Fn5
       (ParseState s)
       ((Unit -> r) -> r) -- Trampoline
```
@natefaubion May I beg you for a short prose description of what is going on here? You discussed it a bit in the PR, but can you please be more explicit?
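For readers following along, the general idea can be sketched like this (a simplified, hedged sketch only; the names, field order, and the fact that the real PR uses an uncurried `Fn5` mean this is not the exact definition):

```purescript
-- Simplified sketch of a CPS-encoded parser. Instead of returning a
-- wrapped result, the parser receives its continuations as arguments,
-- so every bind is a plain function call.
newtype ParserT s m a = ParserT
  ( forall r
     . ParseState s                        -- current input state
    -> ((Unit -> r) -> r)                  -- trampoline: defer a step for stack safety
    -> (ParseState s -> ParseError -> r)   -- failure continuation
    -> (ParseState s -> a -> r)            -- success continuation
    -> r
  )
-- (In this simplified sketch the base monad m is not threaded through;
-- in the real encoding it only appears where `lift` is used.)
```

Because `bind` just composes continuations, there is no transformer wrapping/unwrapping per step, and the trampoline argument lets the runner bounce deeply nested parses off the stack.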
```diff
 -- | Match one or more times.
-many1 :: forall m s a. Monad m => ParserT s m a -> ParserT s m (NonEmptyList a)
+many1 :: forall m s a. ParserT s m a -> ParserT s m (NonEmptyList a)
 many1 p = NEL.cons' <$> p <*> many p
```
```purescript
-- | Match one or more times.
-- |
-- | Stack-safe version of `many1` at the expense of a `MonadRec` constraint.
```
I think after this PR we should deprecate all of the `Rec` combinator variations?
Not necessarily, or at least I don't think you should just remove the `Rec` implementations without testing. The parser is always stack-safe now, yes, but the iterative implementations may be more efficient at runtime. For example, the upstream `many` combinator is significantly slower than `manyRec`. It may be the case that we should copy over the `Rec` implementations and remove the calls to `tailRecM`, just using standard monadic recursion. It could also be the case that the "naive" implementations perform fine in comparison and we should drop the `Rec` implementations.
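For reference, the iterative pattern being discussed looks roughly like this (a sketch of the `manyRec` shape using the standard `tailRecM` API from `Control.Monad.Rec.Class`; it assumes a `MonadRec` instance for `ParserT s m`):

```purescript
import Prelude
import Control.Alt ((<|>))
import Control.Monad.Rec.Class (class MonadRec, Step(..), tailRecM)
import Data.List (List(..), reverse)

-- Sketch: accumulate matches iteratively with tailRecM instead of
-- naive monadic recursion (which would build one stack frame per item).
manyRec :: forall m s a. MonadRec m => ParserT s m a -> ParserT s m (List a)
manyRec p = tailRecM go Nil
  where
  go acc =
    (p <#> \x -> Loop (Cons x acc))  -- matched: push and keep looping
      <|> pure (Done (reverse acc))  -- no match: stop and return results
```

The question above is whether this iterative shape still beats plain recursion once the parser itself is stack-safe by construction.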
```purescript
@@ -211,12 +225,6 @@ match p = do
  -- boundary.
  pure $ Tuple (SCU.take (SCU.length input1 - SCU.length input2) input1) x
```
```purescript
-- | The CodePoint newtype constructor is not exported, so here's a helper.
```
Oh yeah, `fromEnum` is better.
src/Text/Parsing/Parser/String.purs (Outdated)
```diff
-      Just { head, tail } -> updatePosString (updatePosSingle pos head) tail -- tail recursive
+updatePosString = go 0
+  where
+  go ix pos str = case codePointAt ix str of
```
Ah, this might be a problem. `codePointAt` is linear in `ix`, so then `updatePosString` will be quadratic in `length str`.
If you agree that this is a problem @natefaubion, then I can fix this myself after we merge this PR.
I can revert this. I was trying to avoid the extra allocations processing each char, but if it’s linear it could get out of hand.
I do think this routine in general could use a bit of work, however. I believe right now it will consider `\r\n` as two lines, since it only ever processes one code point at a time.
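One possible shape for a linear, CRLF-aware version (a sketch only: `Position` and `updatePosSingle` are the library's, the one-code-point lookahead is the addition, and it assumes `updatePosSingle` counts `'\r'` as a newline):

```purescript
import Prelude
import Data.Maybe (Maybe(..))
import Data.String.CodePoints (codePointFromChar, uncons)

-- Sketch: walk the string with uncons (linear overall), and peek one
-- code point ahead so a "\r\n" pair advances the line counter once.
updatePosString :: Position -> String -> Position
updatePosString pos str = case uncons str of
  Nothing -> pos
  Just { head, tail }
    | head == codePointFromChar '\r'
    , Just { head: next, tail: rest } <- uncons tail
    , next == codePointFromChar '\n' ->
        -- consume both '\r' and '\n' but count only one newline
        updatePosString (updatePosSingle pos head) rest
    | otherwise ->
        updatePosString (updatePosSingle pos head) tail
```

This keeps the tail-recursive structure of the original while avoiding both the quadratic `codePointAt` scan and the double-counted CRLF.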
src/Text/Parsing/Parser/String.purs (Outdated)
```purescript
          runFn2 throw state1 (ParseError "Unexpected EOF" pos)
        Just { head, tail } -> do
          let cp = fromEnum head
          -- the `fromCharCode` function doesn't check if this is beyond the
```
Note to self: We’re not using `fromCharCode` anymore, so delete this comment.
Would you be willing to compare the JSON benchmarks with a trampolined parser? I don’t feel like 5× fully captures the improvement :D
I've added comments and reverted the position implementation. I've also updated it to account for `\r\n`.
Yes! How do I do that? I can’t figure out how to run `runTrampoline $ runParserT smallJson BenchParsing.json`.
You would need to add this to the benchmark parser:

```purescript
type Parser s = ParserT s Trampoline
```

Right now it's using the exported `Parser` type.
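Put together, running the benchmark on a `Trampoline` base would look something like this (a sketch; `smallJson` and `BenchParsing.json` are the names used in this thread, and the module path is an assumption):

```purescript
import Prelude
import Control.Monad.Trampoline (Trampoline, runTrampoline)
import Text.Parsing.Parser (ParserT, runParserT)
import Bench.Json.Parsing as BenchParsing  -- hypothetical module path

type Parser s = ParserT s Trampoline

-- `runParserT smallJson BenchParsing.json` produces a
-- `Trampoline (Either ParseError Json)`; `runTrampoline` forces it
-- without growing the JavaScript stack.
result = runTrampoline (runParserT smallJson BenchParsing.json)
```

Swapping the base monad like this is the whole point of the "upgraded for free" claim: the parser code itself is unchanged.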
Ok, here are trampolined variations of the benchmarks. As you claimed, there appears to be zero cost for explicitly using a `Trampoline`.
I've updated the interface to …
CI is failing due to purescript/purescript-control#80
I am gonna merge this pretty soon, after which I'll fix the …
I've gone ahead and updated them for now so that it compiles. It's up for bikeshedding, however.
Thank you @natefaubion 💙
You may wonder why all these changes are done in one commit, and that's because they're all kind of interrelated:

- I initially updated PureScript to 0.15.x, which produces code that uses ES modules instead of CommonJS ones
- The version of `purescript-decimals` in the 0.15.x package set was tested with decimal.js 10.3.1, so use that
- Since we're using ES modules now, I updated `clipboardy` and `xdg-basedir` to versions that use ES modules (I admit that I didn't have to do this one in this commit)
- `pulp browserify` doesn't work with PureScript 0.15.x, so I had to migrate to something else (I chose `esbuild`)

Side note: `purescript-parsing` 9.0.0 brings with it a wonderful performance boost[1], which speeds up the test suite by 3 times on my machine (from `102.231 s ± 0.485 s` to `34.666 s ± 0.299 s`).

[1]: purescript-contrib/purescript-parsing#154
Description of the change
This PR changes the internals to a CPS encoding which:

- is always stack-safe (`runParserT` also always runs in terms of `MonadRec`).
- means parsers that use `Trampoline` as their base will get upgraded "for free". The `T` cost is only paid at each call to `lift` rather than in every bind, apply, etc.

I've also replaced some inefficient implementations (gets and puts with multiple binds) in the Strings module. In general, I think a lot of combinators may be able to be improved in a similar way; it's just more tedious.
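To illustrate the "cost only at `lift`" point, here is a hedged sketch (`debugged` is a hypothetical combinator, not part of this PR):

```purescript
import Prelude
import Control.Monad.Trans.Class (lift)
import Effect (Effect)
import Effect.Console (log)
import Text.Parsing.Parser (ParserT)

-- Hypothetical combinator over an Effect-based parser: the binds
-- around `p` are plain CPS function calls with no base-monad
-- overhead; the Effect wrapper is only paid for at the single `lift`.
debugged :: forall a. Show a => ParserT String Effect a -> ParserT String Effect a
debugged p = do
  x <- p
  lift (log (show x))
  pure x
```

Under the old transformer-stack representation, every bind and apply in `debugged` (and in `p` itself) would have paid the `Effect` wrapping cost, not just the `lift`.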
Note that while string-parsers is still a bit faster, it is not stack-safe in general (so it can have less overhead). This implementation makes all parsers stack safe in their execution. I suspect that if you made string-parsers stack-safe in the same way, the libraries would have identical performance characteristics. My personal opinion is that if you accept this PR, anyone caring about stack safety has no reason to use string-parsers as a specialization.
The downside is of course that this implementation is more complicated. I'll let you be the judge!
Checklist: