Improve VM's JSON decoder #55522
This pragma forces the compiler to align loop headers within the function to an architecture-specific boundary: 32 bytes on X64 and ARM64 (with the exception of Apple Silicon, whose optimization manual explicitly discourages aligning branch targets). The current implementation is rather naive and makes no attempt to decide whether aligning is actually profitable based on the loop body itself. I have found this pragma helpful both to stabilize benchmark results and to achieve better performance for tight loops on Intel hardware. Issue #55522 TEST=vm/dart/align_loops_test Cq-Include-Trybots: luci.dart.try:vm-aot-linux-product-x64-try,vm-aot-linux-debug-x64c-try,vm-aot-linux-debug-x64-try,vm-aot-linux-release-arm64-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-release-x64-try,vm-aot-mac-product-arm64-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-optimization-level-linux-release-x64-try,vm-aot-win-release-x64-try,vm-aot-win-debug-arm64-try,vm-aot-win-debug-x64-try,vm-aot-win-debug-x64c-try,vm-aot-win-product-x64-try,vm-aot-win-release-arm64-try,vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-simriscv64-try,vm-aot-dwarf-linux-product-x64-try,vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try Change-Id: Ic22fb90d85e7fdebeeaa3908a43328c59436ab58 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/364121 Reviewed-by: Alexander Markov <[email protected]> Commit-Queue: Slava Egorov <[email protected]>
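A sketch of how such a pragma is applied (the pragma name here is my reading of the VM's pragma docs; the annotation only affects code layout, not semantics, so the program behaves identically without it):

```dart
// Loop-alignment pragma sketch. `vm:align-loops` is an assumption based on
// the VM pragma documentation; verify against runtime/docs/pragmas.md.
@pragma('vm:align-loops')
int sum(List<int> values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i]; // loop header aligned to a 32-byte boundary on X64/ARM64
  }
  return total;
}

void main() {
  print(sum([1, 2, 3, 4])); // 10
}
```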
It was not removing all CheckStackOverflow instructions. Issue dart-lang#55522 TEST=vm/cc/CheckStackOverflowElimination_NoInterruptsPragma Change-Id: I1a3db6539951ab4b6450804393c89bc6aafff5b3 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/364324 Reviewed-by: Alexander Markov <[email protected]> Commit-Queue: Slava Egorov <[email protected]>
This pragma instructs the compiler to remove all bounds checks from the annotated function. This can be helpful when tuning the performance of hot tight loops where the compiler is unable to eliminate the bounds checks itself. For very tight loops I have measured 25-50% overhead from bounds checks, which I think comes from some combination of general code-quality issues due to fixed input registers and increased branch density. In the future it might be possible to teach our range analysis to eliminate bounds checks when the loop bound is itself bounded by the array length, but for now we can resort to this pragma for extremely hot library code. Issue #55522 TEST=vm/cc/BoundsCheckElimination_Pragma [email protected] Change-Id: Ia7b1e88a16a2b45fa8593a227a4985568892b29c Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/364500 Reviewed-by: Alexander Markov <[email protected]> Commit-Queue: Slava Egorov <[email protected]>
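A sketch combining the two tight-loop pragmas from the commits above. The exact names are my reading of the VM's pragma list; both are unsafe (they drop checks the VM normally relies on) and are intended only for carefully audited hot library code:

```dart
// Hypothetical usage sketch; pragma names assumed from the VM pragma docs.
@pragma('vm:unsafe:no-interrupts') // drop CheckStackOverflow/interrupt checks
@pragma('vm:unsafe:no-bounds-checks') // drop array bounds checks
int countByte(List<int> bytes, int target) {
  var count = 0;
  for (var i = 0; i < bytes.length; i++) {
    if (bytes[i] == target) count++; // no bounds check emitted for bytes[i]
  }
  return count;
}

void main() {
  print(countByte([1, 2, 1, 3], 1)); // 2
}
```

Without the annotations this compiles and runs identically; the pragmas only change the generated machine code.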
@mraleph Do you see any potential performance improvements from merging the underlying regular UTF-8 automaton with an automaton that lexes JSON, and using that as the lexer for the JsonDecoder (to have a Utf8JsonDecoder), so that no fusing is needed?
The fusing creates a UTF-8 specialized JSON decoder, […]
Thank you for the hint @lrhn. It looks to me like the specialized UTF-8/JSON decoder uses a table-driven UTF-8 decoder and a non-table-driven JSON parser. What I meant to say in this comment was: since the language of UTF-8 appears to be regular, and the lexical structure of JSON seems to be regular too, it feels like there might be performance gains to be had by merging both automata into one, and having the tokenization phase be based on a single automaton that does UTF-8 decoding and JSON lexing at the same time.
I think the UTF-8 is a red herring here. The reason the current implementation doesn't use a table-based tokenizer is that it doesn't need to tokenize at all. The moment it sees the first character of a value, it can start interpreting it. It tries to collect the value of a number and the contents of a string while scanning, in one pass, so it doesn't have to first tokenize and then interpret the token afterwards. Maybe that process can be made more efficient, but doing an extra tokenization step first doesn't sound like an improvement.
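The "interpret while scanning" idea can be illustrated with a minimal sketch (my own illustration, not the actual `dart:convert` implementation): on seeing the first digit of a number, the scanner folds digits into the value directly instead of producing a token first.

```dart
// One-pass number interpretation: no separate tokenize step.
// `endPos` is a one-element list used as an out-parameter for the scan end.
int parseUnsignedInt(List<int> bytes, int pos, List<int> endPos) {
  var value = 0;
  while (pos < bytes.length) {
    final c = bytes[pos];
    if (c < 0x30 || c > 0x39) break; // stop at the first non-digit
    value = value * 10 + (c - 0x30); // fold the digit in immediately
    pos++;
  }
  endPos[0] = pos;
  return value;
}

void main() {
  final end = [0];
  print(parseUnsignedInt('1234,'.codeUnits, 0, end)); // 1234
  print(end[0]); // 4
}
```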
Use aligned text offset for instructions when resolving static calls during relocation. The vm/dart/align_loops_test is adjusted to cover this issue: previously we only verified alignment but did not actually run the generated code, so the bug went undetected. Issue #55522 TEST=vm/dart/align_loops_test [email protected] Change-Id: I56521c104f91b85150584539f58191cf4592f4f5 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/365144 Reviewed-by: Alexander Markov <[email protected]> Commit-Queue: Slava Egorov <[email protected]>
Previously we would only fuse this pattern for Smi operations, which (almost) never happens in AOT. This CL enables fusion for Int64 operations as well. The implementation is limited to X64 and ARM64 for now because implementing it on 32-bit platforms is somewhat cumbersome. Issue #55522 [email protected] TEST=vm/cc/IL_TestIntInstr,vm/dart/test_int_pattern_il_test Cq-Include-Trybots: luci.dart.try:vm-linux-release-simarm-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-debug-simarm_x64-try,dart-sdk-linux-riscv64-try,vm-aot-linux-debug-simriscv64-try,vm-ffi-qemu-linux-release-riscv64-try,vm-linux-debug-simriscv64-try,vm-linux-release-ia32-try Change-Id: I62a482640db45befac6b0b78850f23a8cc624c75 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/365463 Reviewed-by: Alexander Markov <[email protected]> Commit-Queue: Slava Egorov <[email protected]>
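If I read the CL correctly (the IL instruction is `TestIntInstr`), the pattern being fused is a bit-test feeding directly into a branch, which can then compile to a single test-and-branch instruction. A hedged sketch of Dart code that exercises that shape:

```dart
// Illustration only: a mask test used directly as a branch condition.
// On X64/ARM64 the `x & 1 == 0` compare-against-zero can be fused into
// one test instruction instead of materializing the masked value.
bool isEven(int x) => x & 1 == 0;

void main() {
  var evens = 0;
  for (var i = 0; i < 10; i++) {
    if (isEven(i)) evens++; // tight loop dominated by the test + branch
  }
  print(evens); // 5
}
```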
This CL focuses on improving the parsing of white space (between JSON tokens) and of simple strings which don't contain escape sequences. Improvements are achieved by switching from an if-cascade to a table-driven implementation: we have a 256-element table which stores attributes for each character (e.g. whether it is white space, or a terminal token for a simple string) and use this table to decide whether to advance through characters or to stop the loop and do something else. We also suppress bounds checks and interrupt checks in tight loops; in loops this tight, bounds checks can cost 30% in overhead. This CL brings a 28% geomean improvement on benchmarks from the linked issue. (All measurements are done in X64 Product AOT.) Individual measurements are:

| Input JSON | ms/iter | vs HEAD | vs V8 |
| ---------- | ------- | ------- | ----- |
| apache_builds.json | 0.44 | 61.06% | 136.86% |
| canada.json | 31.04 | 96.54% | 187.15% |
| citm_catalog.json | 6.43 | 64.44% | 93.94% |
| github_events.json | 0.23 | 59.02% | 128.86% |
| google_maps_api_compact_response.json | 0.10 | 82.12% | 133.83% |
| google_maps_api_response.json | 0.12 | 68.79% | 140.07% |
| gsoc-2018.json | 9.25 | 44.89% | 147.43% |
| instruments.json | 1.08 | 70.18% | 167.38% |
| marine_ik.json | 21.07 | 88.25% | 142.58% |
| mesh.json | 4.51 | 94.56% | 136.57% |
| mesh.pretty.json | 9.97 | 83.76% | 193.79% |
| numbers.json | 0.57 | 91.88% | 83.37% |
| random.json | 3.79 | 78.32% | 107.18% |
| repeat.json | 0.06 | 71.51% | 118.47% |
| semanticscholar-corpus.json | 37.65 | 54.82% | 57.81% |
| tree-pretty.json | 0.17 | 68.68% | 162.33% |
| twitter_api_compact_response.json | 0.06 | 75.23% | 126.11% |
| twitter_api_response.json | 0.08 | 70.64% | 123.60% |
| twitterescaped.json | 3.88 | 84.66% | 177.94% |
| twitter.json | 3.54 | 73.01% | 105.33% |
| twitter_timeline.json | 0.37 | 81.52% | 271.37% |
| update-center.json | 2.85 | 66.75% | 89.01% |

vs HEAD (geomean): 72.94%
vs V8 (geomean): 130.99%
HEAD vs V8 (geomean): 179.59%

Issue #55522 TEST=covered by co19 Change-Id: Id673118c19250ab7781cc98c7656b972debc60ff Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/365803 Reviewed-by: Alexander Markov <[email protected]> Commit-Queue: Slava Egorov <[email protected]>
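The table-driven approach described above can be sketched as follows (a minimal illustration, not the actual decoder code; the attribute names and bit values are my own):

```dart
// 256-entry attribute table: classify each byte once, then consult the
// table in tight loops instead of an if-cascade of comparisons.
const int kIsWhitespace = 1;
const int kEndsSimpleString = 2; // '"' or '\' ends the simple-string fast path

final List<int> attributes = () {
  final t = List<int>.filled(256, 0);
  for (final c in [0x09, 0x0A, 0x0D, 0x20]) {
    t[c] |= kIsWhitespace; // tab, LF, CR, space
  }
  t[0x22] |= kEndsSimpleString; // '"'
  t[0x5C] |= kEndsSimpleString; // '\'
  return t;
}();

// Advance past white space using a single table lookup per byte.
int skipWhitespace(List<int> bytes, int pos) {
  while (pos < bytes.length && (attributes[bytes[pos]] & kIsWhitespace) != 0) {
    pos++;
  }
  return pos;
}

void main() {
  final bytes = '  \t\n{"a": 1}'.codeUnits;
  print(skipWhitespace(bytes, 0)); // 4
}
```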
The current decoder generates maps that take a lot of memory, and creates fresh strings for the repeated keys. Beyond the memory footprint, there is a cost to using a fresh string as a key.
Maps and strings can probably be optimized. It's much more efficient to build a custom parser for the known structures using a more pull-based JSON interpreter than to try to recognize possibly repeated structures at parse time in a generic parser. (That is: if you care about performance, why are you building maps and lists at all?)
I have a prototype patch which canonicalizes keys when decoding JSON (this helps both with memory consumption and with access speed, because accessing maps with literal keys can short-circuit string comparison later). You still need to compute […] I entertained the idea of using "predictive" decoding, where decoding an array of JSON objects makes an assumption about repeated shape and uses that to short-circuit both key decoding and […]
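The canonicalization idea can be sketched in a few lines (my own illustration, not the prototype patch): keep a cache of previously seen key strings so that repeated keys share one `String` instance instead of fresh copies.

```dart
// Key canonicalization sketch: repeated JSON object keys map to a single
// shared String instance, reducing memory and enabling identity fast paths.
final Map<String, String> _keyCache = {};

String canonicalizeKey(String key) => _keyCache.putIfAbsent(key, () => key);

void main() {
  // String.fromCharCodes builds fresh (non-identical) instances each time.
  final a = canonicalizeKey(String.fromCharCodes('name'.codeUnits));
  final b = canonicalizeKey(String.fromCharCodes('name'.codeUnits));
  print(identical(a, b)); // true: both resolve to the first cached instance
}
```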
A pull-based JSON interpreter for going straight from bytes to proper Dart objects sounds great. Previous issue threads for that:
See also https://github.com/kevmoo/json_alt – although this code is VERY old
Maybe we should also look at how other languages have dealt with this? I feel that if we only look at how JS and Dart solutions have been built so far, we might miss some clever solutions that exist for other languages.
Instead of gradually adding key-value pairs to the Map as the JSON is being parsed, collect all keys and values first and then allocate the map with the appropriate capacity. Issue #55522 TEST=ci Change-Id: Ic3032323143c35c38469d4f5f289daabdf56ecc9 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/374564 Commit-Queue: Slava Egorov <[email protected]> Reviewed-by: Lasse Nielsen <[email protected]>
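A hedged sketch of the idea (the real change uses a VM-internal map constructor that can presize its backing store; `Map.fromIterables` here stands in for that):

```dart
// Buffer keys and values while parsing a JSON object, then build the map in
// one step. A VM-internal constructor can preallocate the hash table for
// `keys.length` entries, avoiding repeated grow/rehash during insertion.
Map<String, Object?> buildObject(List<String> keys, List<Object?> values) {
  return Map.fromIterables(keys, values);
}

void main() {
  final m = buildObject(['a', 'b'], [1, 2]);
  print(m); // {a: 1, b: 2}
}
```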
Does someone use it?
We sometimes hear complaints that our JSON decoder is slow compared to, for example, V8's. Usually we just shrug these complaints away saying that:
Developers are then encouraged to use high-performance native JSON decoders if JSON decoding is a bottleneck for them.
This does not square very well with my belief that Dart should be considered a general-purpose high-level programming language. So I would like to take a closer look at JSON decoder performance and see if it can be considerably improved.
I have taken JSON files from https://github.com/simdjson/simdjson-data/tree/master/jsonexamples and run them through JS and Dart benchmarks.
The Dart benchmark measures the performance of the JSON decoder fused with the UTF-8 decoder:
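(The original benchmark code is not shown here; a sketch of what such a fused-decoder benchmark might look like, with names and the timing loop being my own, is:)

```dart
import 'dart:convert';
import 'dart:typed_data';

// Fuse the UTF-8 and JSON converters: this selects dart:convert's
// specialized UTF-8 -> JSON fast path when given raw bytes.
Object? decodeFused(Uint8List bytes) =>
    utf8.decoder.fuse(json.decoder).convert(bytes);

void main() {
  final bytes = Uint8List.fromList(utf8.encode('{"a": [1, 2, 3]}'));
  final sw = Stopwatch()..start();
  Object? result;
  for (var i = 0; i < 1000; i++) {
    result = decodeFused(bytes);
  }
  sw.stop();
  print(result); // {a: [1, 2, 3]}
}
```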
The JS benchmark measures the performance of decoding JSON from a pre-decoded JS string, as well as the performance of converting UTF-8 bytes to a JS string:
decodeUtf8 is a helper function which I added to the d8 shell; it effectively does String::NewFromUtf8(isolate, buffer, NewStringType::kNormal, buffer_size). This was done to measure the cost of the UTF-8 to UTF-16 conversion, which would otherwise be hidden.
The baseline measurement for V8 on my desktop looks like this:
Dart in AOT mode yields:
With the exception of numbers.json, Dart's decoder is 1.5-3x slower. The good news is that it is possible to significantly improve its performance: I have done some experiments and got to the following preliminary numbers, which are significantly better.
I am going to use this issue as an umbrella issue to land my changes.