Notes on performance #144

jamesdbrock · 2022-01-23T15:05:39Z

This is what the benchmarks currently look like:

Text.Parsing.StringParser.CodeUnits

StringParser.runParser parse23Units
mean   = 10.10 ms
stddev = 1.13 ms
min    = 9.46 ms
max    = 24.07 ms

Text.Parsing.Parser.String

runParser parse23
mean   = 44.20 ms
stddev = 6.38 ms
min    = 42.25 ms
max    = 113.16 ms

Data.String.Regex

Regex.match pattern23
mean   = 728.23 μs
stddev = 339.32 μs
min    = 613.72 μs
max    = 2.97 ms

I would like to reduce the roughly 4× slowdown of Parser.String relative to StringParser.CodeUnits.

The difference could be due to:

CodePoint rather than Char. Everything goes through the anyCodePoint parser since Unicode correctness #119, but I benchmarked it at the time and it didn't make any difference.
String tail state. Every time the parser advances by one character, we uncons the input string and save the tail as the new state. I tried changing that to keep only a code unit index into the input string on this branch and it didn't make any difference. https://github.com/jamesdbrock/purescript-parsing/tree/cursor
Parsing.Parser.String input position tracking with Pos { line :: Int, column :: Int}. I tried changing that to Pos Int on this branch and it didn't make any difference. https://github.com/jamesdbrock/purescript-parsing/tree/cursor
Monad transformers. When I look at the benchmark profiling, it looks like most of the time is spent in Control.Monad.State.Trans.bind and Text.Parsing.Parser.Combinators.tryRethrow. So this might be the entire problem, but improving this won't be easy.

jamesdbrock · 2022-02-01T09:23:15Z

Thread about Tail Call Elimination
https://discord.com/channels/864614189094928394/936908261200912394/936947552874545202

jamesdbrock · 2022-02-03T09:31:13Z

Would refactoring the ParserT to use continuation-passing-style internally instead of the ExceptT StateT transformer stack improve the speed?

jamesdbrock · 2022-02-03T23:24:36Z

https://discord.com/channels/864614189094928394/938814730439655535

Hi @natefaubion, why is https://github.com/natefaubion/purescript-language-cst-parser/blob/main/src/PureScript/CST/Parser/Monad.purs written this way? Would it make sense to use your techniques for purescript-parsing?

Because I needed a faster, stack-safe parser. If you are writing a transformer, the techniques do not make sense (or rather, are unnecessary). For transformers, stack safety is determined by the base Monad in your transformer stack. CPS is OK for a transformer, or at least, you aren't losing anything by using CPS in a transformer, since you are paying an abstraction tax regardless.

In general, though, CPS does not necessarily mean "faster", because JS runtimes don't exploit CPS. CPS is great in functional runtimes that have tail call elimination, where dynamic tail calls turn into jumps. CPS encodings of direct-style code usually mean trading allocation of data for allocation of closures, which can be faster as it's often easier for an optimizing compiler to remove the closure allocations.
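
To make the data-versus-closure trade-off concrete, here is a minimal sketch. It is not code from purescript-parsing or language-cst-parser; the names Direct, Cps, anyCharDirect, anyCharCps, twoCharsCps and runCps are hypothetical, and the parser is simplified to String input with String errors.

```purescript
module CpsSketch where

import Data.Either (Either(..))
import Data.Maybe (Maybe(..))
import Data.String.CodeUnits as SCU
import Data.Tuple (Tuple(..))

-- Direct style: every step allocates an Either and a Tuple for its result.
type Direct a = String -> Either String (Tuple a String)

anyCharDirect :: Direct Char
anyCharDirect input = case SCU.uncons input of
  Nothing -> Left "Unexpected EOF"
  Just { head, tail } -> Right (Tuple head tail)

-- CPS: the result is handed to one of two continuations instead of
-- being wrapped in a data constructor.
type Cps a = forall r. String -> (String -> r) -> (a -> String -> r) -> r

anyCharCps :: Cps Char
anyCharCps input fail succeed = case SCU.uncons input of
  Nothing -> fail "Unexpected EOF"
  Just { head, tail } -> succeed head tail

-- Sequencing two CPS parsers allocates closures rather than data values.
twoCharsCps :: Cps (Tuple Char Char)
twoCharsCps input fail succeed =
  anyCharCps input fail \a rest ->
    anyCharCps rest fail \b rest' ->
      succeed (Tuple a b) rest'

-- Recover a direct-style result by passing data constructors as continuations.
runCps :: forall a. Cps a -> Direct a
runCps p input = p input Left (\a rest -> Right (Tuple a rest))
```

Whether the closure version is actually faster depends on what the JS engine can inline, which is the point made above: the encoding alone guarantees nothing.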

natefaubion · 2022-03-14T18:43:08Z

Note that I recently revamped language-cst internals to be a bit faster and simpler by switching to CPS with uncurried functions and a trampoline eliminator. For ParserT, it might look like:

newtype ParserT s m a = ParserT
  ( forall r
      . Fn5
          (ParserState s) -- State argument
          (m (Unit -> r) -> r) -- Trampoline/lift
          (Fn2 (ParserState s) ParseError r) -- Fail
          (Fn2 (ParserState s) a r) -- Succeed
          r
  )

I think this would also give you the flexibility to run your parser in any MonadRec without having to write your parser in terms of MonadRec. I'd expect that this would help close the performance gap significantly (and provide stack safety by default).
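
As a rough illustration of how the encoding above could be driven, the following sketch strips it down to a non-transformer parser. Everything here is an assumption made for illustration: there is no base monad m, the state is a plain String, errors are Strings, and the names Parser, Step, pureP, bindP and runParser are hypothetical rather than the library's API.

```purescript
module TrampolineSketch where

import Prelude

import Data.Either (Either(..))
import Data.Function.Uncurried (Fn2, Fn4, mkFn2, mkFn4, runFn2, runFn4)

-- Simplified from the ParserT shape above: no `m`, String state and errors.
newtype Parser a = Parser
  ( forall r
     . Fn4
         String -- State argument
         ((Unit -> r) -> r) -- Trampoline
         (Fn2 String String r) -- Fail
         (Fn2 String a r) -- Succeed
         r
  )

pureP :: forall a. a -> Parser a
pureP a = Parser (mkFn4 \state _ _ done -> runFn2 done state a)

-- Each bind bounces through `more`, so the JS stack unwinds between steps.
bindP :: forall a b. Parser a -> (a -> Parser b) -> Parser b
bindP (Parser k1) next = Parser
  ( mkFn4 \state1 more throw done ->
      more \_ ->
        runFn4 k1 state1 more throw
          ( mkFn2 \state2 a ->
              more \_ -> case next a of
                Parser k2 -> runFn4 k2 state2 more throw done
          )
  )

-- The answer type chosen by the runner: either a suspended step or a result.
data Step a
  = More (Unit -> Step a)
  | Done (Either String a)

-- The eliminator: a tail-recursive loop that forces suspended steps.
runParser :: forall a. String -> Parser a -> Either String a
runParser input (Parser k) =
  go
    ( runFn4 k input More
        (mkFn2 \_ err -> Done (Left err))
        (mkFn2 \_ a -> Done (Right a))
    )
  where
  go (More f) = go (f unit)
  go (Done r) = r
```

In this sketch, runParser "abc" (bindP (pureP 1) \n -> pureP (n + 1)) evaluates to Right 2. A transformer version would additionally need the lift argument from the type above and a MonadRec-based loop in place of go.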

jamesdbrock · 2022-03-14T22:33:47Z

Thanks @natefaubion, that is extremely helpful.

natefaubion · 2022-03-20T00:29:09Z

Link #154

natefaubion · 2022-03-20T00:32:17Z

Some other notes on string internals in particular:

string is potentially slow on failure because of the show call to the input string. Failure cases are hit very often, and if you are using string a lot this can be significant overhead. Deferred errors might be better in general.
satisfy combinators are implemented in terms of any combinators with an esoteric tryRethrow combinator. If you implement any in terms of satisfy (const true) then you can remove a lot of overhead for the satisfy calls.
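
Both suggestions can be sketched on a toy direct-style parser. This is an illustration only and not the library's implementation: the Parser synonym, the Unit -> String error thunk, and the error messages are all assumptions.

```purescript
module DeferredErrorSketch where

import Prelude

import Data.Either (Either(..))
import Data.Maybe (Maybe(..))
import Data.String.CodeUnits (stripPrefix, uncons)
import Data.String.Pattern (Pattern(..))
import Data.Tuple (Tuple(..))

-- Hypothetical direct-style parser; the error is a thunk, rendered on demand.
type Parser a = String -> Either (Unit -> String) (Tuple a String)

-- The `show expected` call only runs if someone forces the error message.
string :: String -> Parser String
string expected input = case stripPrefix (Pattern expected) input of
  Just rest -> Right (Tuple expected rest)
  Nothing -> Left \_ -> "Expected " <> show expected

-- `satisfy` is the primitive: one uncons, one predicate test, no rethrow.
satisfy :: (Char -> Boolean) -> Parser Char
satisfy f input = case uncons input of
  Nothing -> Left \_ -> "Unexpected EOF"
  Just { head, tail }
    | f head -> Right (Tuple head tail)
    | otherwise -> Left \_ -> "Predicate unsatisfied"

-- `anyChar` falls out as the special case that accepts every character.
anyChar :: Parser Char
anyChar = satisfy (const true)
```

With this layout a failing string inside an alternative costs one closure allocation, and the message is built only when the final error is reported.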

natefaubion · 2022-03-20T00:38:17Z

choice should use foldr (really <|> should be right associative).

purescript/purescript-control#79
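
The difference between the two folds can be sketched as follows. The names choiceL and choiceR are hypothetical, and the sketch assumes the version being replaced was a plain left fold over <|>.

```purescript
module ChoiceSketch where

import Control.Alt ((<|>))
import Control.Plus (class Plus, empty)
import Data.Foldable (class Foldable, foldl, foldr)

-- Left fold: nests as ((empty <|> p1) <|> p2) <|> p3, so the first
-- alternative sits at the bottom of the nesting.
choiceL :: forall f m a. Foldable f => Plus m => f (m a) -> m a
choiceL = foldl (<|>) empty

-- Right fold: nests as p1 <|> (p2 <|> (p3 <|> empty)), so the first
-- alternative is tried at the top level.
choiceR :: forall f m a. Foldable f => Plus m => f (m a) -> m a
choiceR = foldr (<|>) empty
```

Both return the same result for a lawful Plus instance; only the nesting differs, which matters when each <|> layer adds work on the way to the alternative that succeeds.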

jamesdbrock · 2022-03-22T09:56:14Z

choice should use foldr (really <|> should be right associative).

This was done by @natefaubion in #154

This was referenced Mar 22, 2022

Defer error in string #158

Open

implement any in terms of satisfy (const true) #159

Closed

jamesdbrock closed this as completed Apr 2, 2022
