Improve VM's JSON decoder #55522

Open · mraleph opened this issue Apr 19, 2024 · 12 comments
Labels: area-vm (VM-related issues, including code coverage, and the AOT and JIT backends), triaged

Comments

@mraleph
Member

mraleph commented Apr 19, 2024

We sometimes hear complaints that our JSON decoder is slow compared to, for example, V8's. Usually we shrug these complaints off, saying that:

  • Dart's JSON decoder is likely fast enough.
  • Dart's JSON decoder is written in Dart while V8's is written in C++, so some degree of performance difference is expected.

Developers are then encouraged to use high-performance native JSON decoders if JSON decoding is a bottleneck for them.

This does not square well with my belief that Dart should be considered a general-purpose high-level programming language. So I would like to take a closer look at the JSON decoder's performance and see whether it can be considerably improved.

I took JSON files from https://github.com/simdjson/simdjson-data/tree/master/jsonexamples and ran them through JS and Dart benchmarks.

The Dart benchmark measures the performance of the JSON decoder fused with the UTF-8 decoder:

  final decoder = Utf8Decoder().fuse(JsonDecoder());
  // ...
    for (var i = 0; i < N; i++) {
      decoder.convert(bytes);
    }
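Expanded into a self-contained form, the measurement loop might look like the following sketch (the in-memory sample input and iteration count are placeholders I chose for illustration; the actual benchmark reads the files listed above):

```dart
import 'dart:convert';

void main() {
  final decoder = const Utf8Decoder().fuse(const JsonDecoder());
  // In-memory sample standing in for a benchmark file's raw UTF-8 bytes.
  final bytes = utf8.encode('{"x": 1, "y": [2, 3]}');
  const n = 1000; // Iteration count chosen for illustration.
  final sw = Stopwatch()..start();
  Object? last;
  for (var i = 0; i < n; i++) {
    last = decoder.convert(bytes);
  }
  sw.stop();
  print('ms/iteration: ${sw.elapsedMicroseconds / n / 1000}');
  print((last as Map)['x']); // 1
}
```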

The JS benchmark measures the performance of decoding JSON from a pre-decoded JS string, as well as the performance of converting UTF-8 bytes to a JS string:

function benchmark(baseDir, fileName) {
  let content = read(/* ... */);
  // ...
    for (var i = 0; i < N; i++) {
      JSON.parse(content);
    }
  // ...
}

function benchmarkWithUtf8Decoding(baseDir, fileName) {
  let content = readbuffer(/* ... */);
    // ...
    for (var i = 0; i < N; i++) {
      JSON.parse(decodeUtf8(content));
    }
    // ...
}

decodeUtf8 is a helper function which I added to the d8 shell; it effectively does the following: String::NewFromUtf8(isolate, buffer, NewStringType::kNormal, buffer_size).

This was done to measure the cost of the UTF-8 -> UTF-16 conversion, which would otherwise be hidden.

The baseline measurement for V8 on my desktop looks like this:

"apache_builds.json",string,0.30905068359375
"apache_builds.json",buffer,0.3241399414062499
"canada.json",string,14.897924999999987
"canada.json",buffer,16.510981250000007
"citm_catalog.json",string,3.759248437500003
"citm_catalog.json",buffer,6.9653749999999945
"github_events.json",string,0.14244892578124996
"github_events.json",buffer,0.18018886718749982
"google_maps_api_compact_response.json",string,0.07089770507812503
"google_maps_api_compact_response.json",buffer,0.07248544921875003
"google_maps_api_response.json",string,0.08099453125
"google_maps_api_response.json",buffer,0.08426477050781252
"gsoc-2018.json",string,4.6573312500000155
"gsoc-2018.json",buffer,6.232159375000014
"instruments.json",string,0.49713339843750076
"instruments.json",buffer,0.635085742187502
"marine_ik.json",string,13.519856249999975
"marine_ik.json",buffer,14.717656249999981
"mesh.json",string,2.9209296875000064
"mesh.json",buffer,3.354901562500004
"mesh.pretty.json",string,4.357474999999999
"mesh.pretty.json",buffer,5.1385874999999945
"numbers.json",string,0.5762896484374977
"numbers.json",buffer,0.6817013671874974
"random.json",string,2.822692968749993
"random.json",buffer,3.569248437499982
"repeat.json",string,0.025887060546875063
"repeat.json",buffer,0.051292358398437446
"semanticscholar-corpus.json",string,50.387549999999464
"semanticscholar-corpus.json",buffer,55.51836249999978
"tree-pretty.json",string,0.09902978515625023
"tree-pretty.json",buffer,0.10347050781250004
"twitter_api_compact_response.json",string,0.03364210205078137
"twitter_api_compact_response.json",buffer,0.04489652099609387
"twitter_api_response.json",string,0.04411997070312523
"twitter_api_response.json",buffer,0.06057795410156288
"twitterescaped.json",string,1.7861546875000158
"twitterescaped.json",buffer,2.1810593749999954
"twitter.json",string,1.7194124999999985
"twitter.json",buffer,3.385740625000017
"twitter_timeline.json",string,0.13089736328124957
"twitter_timeline.json",buffer,0.13675043945312382
"update-center.json",string,2.1542187499999956
"update-center.json",buffer,3.237673437500007

Dart in AOT mode yields:

"apache_builds.json",0.6998046875
"canada.json",31.8375
"citm_catalog.json",9.1125
"github_events.json",0.39033203125
"google_maps_api_compact_response.json",0.115283203125
"google_maps_api_response.json",0.162744140625
"gsoc-2018.json",20.49375
"instruments.json",1.45390625
"marine_ik.json",23.26875
"mesh.json",4.640625
"mesh.pretty.json",11.571875
"numbers.json",0.570703125
"random.json",4.7703125
"repeat.json",0.0804443359375
"semanticscholar-corpus.json",69.275
"tree-pretty.json",0.22900390625
"twitter_api_compact_response.json",0.0744384765625
"twitter_api_response.json",0.1037109375
"twitterescaped.json",4.553125
"twitter.json",4.6828125
"twitter_timeline.json",0.45234375
"update-center.json",4.2859375

With the exception of numbers.json, Dart's decoder is 1.5-3x slower.

The good news is that it is possible to significantly improve its performance - I have done some experiments and arrived at the following preliminary numbers:

"apache_builds.json",0.3947265625
"canada.json",30.7375
"citm_catalog.json",5.0671875
"github_events.json",0.22177734375
"google_maps_api_compact_response.json",0.093896484375
"google_maps_api_response.json",0.101416015625
"gsoc-2018.json",8.790625
"instruments.json",0.956640625
"marine_ik.json",18.00625
"mesh.json",3.984375
"mesh.pretty.json",8.625
"numbers.json",0.49609375
"random.json",3.4328125
"repeat.json",0.04671630859375
"semanticscholar-corpus.json",36.4875
"tree-pretty.json",0.14794921875
"twitter_api_compact_response.json",0.056640625
"twitter_api_response.json",0.071142578125
"twitterescaped.json",3.8953125
"twitter.json",3.1515625
"twitter_timeline.json",0.37626953125
"update-center.json",2.89453125

These numbers are significantly better.

I am going to use this issue as an umbrella issue to land my changes.

@mraleph mraleph added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-performance Issue relates to performance or code size labels Apr 19, 2024
@mraleph mraleph self-assigned this Apr 19, 2024
copybara-service bot pushed a commit that referenced this issue Apr 24, 2024
This pragma forces the compiler to align loop headers within the
function to an architecture-specific boundary: 32 bytes on X64
and ARM64 (with the exception of Apple Silicon, whose optimization
manual explicitly discourages aligning branch targets).

The current implementation is rather naive and makes no attempt
to decide whether aligning is actually profitable based on the
loop body itself.

I have found this pragma helpful both for stabilizing benchmark
results and for achieving better performance in tight loops on
Intel hardware.

Issue #55522

TEST=vm/dart/align_loops_test

Cq-Include-Trybots: luci.dart.try:vm-aot-linux-product-x64-try,vm-aot-linux-debug-x64c-try,vm-aot-linux-debug-x64-try,vm-aot-linux-release-arm64-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-release-x64-try,vm-aot-mac-product-arm64-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-optimization-level-linux-release-x64-try,vm-aot-win-release-x64-try,vm-aot-win-debug-arm64-try,vm-aot-win-debug-x64-try,vm-aot-win-debug-x64c-try,vm-aot-win-product-x64-try,vm-aot-win-release-arm64-try,vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-simriscv64-try,vm-aot-dwarf-linux-product-x64-try,vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try
Change-Id: Ic22fb90d85e7fdebeeaa3908a43328c59436ab58
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/364121
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
pull bot pushed a commit to AmirulAndalib/sdk that referenced this issue Apr 24, 2024
It was not removing all CheckStackOverflow instructions.

Issue dart-lang#55522

TEST=vm/cc/CheckStackOverflowElimination_NoInterruptsPragma

Change-Id: I1a3db6539951ab4b6450804393c89bc6aafff5b3
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/364324
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
@a-siva a-siva added the triaged Issue has been triaged by sub team label Apr 24, 2024
copybara-service bot pushed a commit that referenced this issue Apr 25, 2024
This pragma instructs the compiler to remove all bounds checks from
the annotated function. This can be helpful when tuning the
performance of hot, tight loops where the compiler is unable to
eliminate the bounds checks itself.

For very tight loops I have measured 25-50% overhead from bounds
checks, which I think comes from some combination of general
code-quality issues due to fixed input registers and increased
branch density.

In the future it may be possible to teach our range analysis to
eliminate bounds checks when the loop bound is itself bounded by
the array length, but for now we can resort to this pragma for
extremely hot library code.

Issue #55522

TEST=vm/cc/BoundsCheckElimination_Pragma
[email protected]

Change-Id: Ia7b1e88a16a2b45fa8593a227a4985568892b29c
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/364500
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
@modulovalue
Contributor

@mraleph Do you see any potential performance improvements from merging the underlying regular UTF-8 automaton with an automaton that lexes JSON, and using the result as the lexer for the JsonDecoder (to have a Utf8JsonDecoder), so that no fusing is needed?

@lrhn
Member

lrhn commented Apr 27, 2024

The fusing creates a UTF-8 specialized JSON decoder, _JsonUtf8Decoder.

@modulovalue
Contributor

Thank you for the hint @lrhn. It looks to me like the specialized UTF-8 JSON decoder uses a table-driven UTF-8 decoder but a non-table-driven JSON parser.

What I meant in that comment was: since the language of UTF-8 appears to be regular, and the lexical structure of JSON seems to be regular too, there might be performance gains to be had by merging both automata into one, so that the tokenization phase is based on a single automaton that does UTF-8 decoding and JSON lexing at the same time.
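A minimal sketch of the idea (the byte classes and the table below are invented for illustration and are not the actual library's tables): a single 256-entry table classifies raw bytes so that one loop can skip JSON whitespace and, at the same time, notice where UTF-8 multi-byte sequences begin.

```dart
// Byte classes, invented for this sketch.
const int clsWs = 0; // JSON whitespace
const int clsStruct = 1; // { } [ ] : ,
const int clsAsciiOther = 2; // any other ASCII byte
const int clsUtf8Lead = 3; // 0xC0-0xFF: starts a multi-byte sequence
const int clsUtf8Cont = 4; // 0x80-0xBF: continuation byte

final List<int> byteClass = List<int>.generate(256, (b) {
  if (b == 0x20 || b == 0x09 || b == 0x0A || b == 0x0D) return clsWs;
  if (const [0x7B, 0x7D, 0x5B, 0x5D, 0x3A, 0x2C].contains(b)) return clsStruct;
  if (b < 0x80) return clsAsciiOther;
  return b < 0xC0 ? clsUtf8Cont : clsUtf8Lead;
});

/// Advances past JSON whitespace with a single table lookup per byte;
/// the same table could tell the lexer when a UTF-8 slow path is needed.
int skipWhitespace(List<int> bytes, int i) {
  while (i < bytes.length && byteClass[bytes[i]] == clsWs) {
    i++;
  }
  return i;
}
```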

@lrhn
Member

lrhn commented Apr 28, 2024

I think the UTF-8 is a red herring here.
A table-based tokenizer would work equally well for both UTF-8 and string-based JSON parsing, since everything outside of string values is ASCII anyway, and what's inside strings needs to be interpreted only if it contains escapes, or non-ASCII characters in a UTF-8 source.

The reason the current implementation doesn't use a table based tokenizer is that it doesn't need to tokenize at all. The moment it sees the first character of a value, it can start interpreting it. It tries to collect the value of a number and the contents of a string while scanning, in one pass, so it doesn't have to first tokenize, and then interpret the token afterwards.

Maybe that process can be made more efficient, but doing an extra tokenization step first doesn't sound like an improvement.
(Unless it can be done very efficiently using SIMD operations or using parallelism, or something.)
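The "interpret while scanning" approach described above can be illustrated with a toy integer case (the real decoder also handles signs, fractions, exponents, and chunked input, none of which this sketch attempts):

```dart
/// Toy sketch: a number's value is accumulated during the scan itself,
/// so there is no separate tokenize-then-parse step.
({int value, int end}) scanInt(List<int> codeUnits, int start) {
  var value = 0;
  var i = start;
  while (i < codeUnits.length) {
    final digit = codeUnits[i] ^ 0x30; // maps '0'..'9' to 0..9
    if (digit > 9) break; // not a digit: the number token ends here
    value = value * 10 + digit;
    i++;
  }
  return (value: value, end: i);
}
```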

copybara-service bot pushed a commit that referenced this issue May 2, 2024
Use aligned text offset for instructions when resolving static
calls during relocation.

The vm/dart/align_loops_test is adjusted to cover this issue:
previously we only verified alignment but did not actually run
the generated code, so the bug went undetected.

Issue #55522

TEST=vm/dart/align_loops_test
[email protected]

Change-Id: I56521c104f91b85150584539f58191cf4592f4f5
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/365144
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
copybara-service bot pushed a commit that referenced this issue May 11, 2024
Previously we would only fuse this pattern for Smi operations
which (almost) never happens in AOT. This CL enables fusion
for Int64 operations as well.

The implementation is limited to X64 and ARM64 for now
because implementing it on 32-bit platforms is somewhat
cumbersome.

Issue #55522

[email protected]
TEST=vm/cc/IL_TestIntInstr,vm/dart/test_int_pattern_il_test

Cq-Include-Trybots: luci.dart.try:vm-linux-release-simarm-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-debug-simarm_x64-try,dart-sdk-linux-riscv64-try,vm-aot-linux-debug-simriscv64-try,vm-ffi-qemu-linux-release-riscv64-try,vm-linux-debug-simriscv64-try,vm-linux-release-ia32-try
Change-Id: I62a482640db45befac6b0b78850f23a8cc624c75
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/365463
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
copybara-service bot pushed a commit that referenced this issue May 15, 2024
This CL focuses on improving the parsing of white space (between
JSON tokens) and of simple strings which don't contain escape
sequences.

The improvements are achieved by switching from an if-cascade to a
table-driven implementation: we have a 256-element table which
stores attributes for characters (e.g. whether a character is white
space or a terminal token for a simple string) and use this table
to decide whether to advance through characters or stop the loop
and do something else.

We also suppress bounds checks and interrupt checks in tight
loops - in loops like these, bounds checks alone can cost
30% in overhead.

This CL brings 28% geomean improvement on benchmarks from the linked issue. (All measurements are done in X64 Product AOT)

Individual measurements are:

| Input JSON | ms/iter | vs HEAD | vs V8 |
| ---------- | ------- | ------- | ----- |
| apache_builds.json | 0.44 | 61.06% | 136.86% |
| canada.json | 31.04 | 96.54% | 187.15% |
| citm_catalog.json | 6.43 | 64.44% | 93.94% |
| github_events.json | 0.23 | 59.02% | 128.86% |
| google_maps_api_compact_response.json | 0.10 | 82.12% | 133.83% |
| google_maps_api_response.json | 0.12 | 68.79% | 140.07% |
| gsoc-2018.json | 9.25 | 44.89% | 147.43% |
| instruments.json | 1.08 | 70.18% | 167.38% |
| marine_ik.json | 21.07 | 88.25% | 142.58% |
| mesh.json | 4.51 | 94.56% | 136.57% |
| mesh.pretty.json | 9.97 | 83.76% | 193.79% |
| numbers.json | 0.57 | 91.88% | 83.37% |
| random.json | 3.79 | 78.32% | 107.18% |
| repeat.json | 0.06 | 71.51% | 118.47% |
| semanticscholar-corpus.json | 37.65 | 54.82% | 57.81% |
| tree-pretty.json | 0.17 | 68.68% | 162.33% |
| twitter_api_compact_response.json | 0.06 | 75.23% | 126.11% |
| twitter_api_response.json | 0.08 | 70.64% | 123.60% |
| twitterescaped.json | 3.88 | 84.66% | 177.94% |
| twitter.json | 3.54 | 73.01% | 105.33% |
| twitter_timeline.json | 0.37 | 81.52% | 271.37% |
| update-center.json | 2.85 | 66.75% | 89.01% |

vs HEAD (geomean): 72.94%
vs V8 (geomean): 130.99%
HEAD vs V8 (geomean): 179.59%

Issue #55522

TEST=covered by co19

Change-Id: Id673118c19250ab7781cc98c7656b972debc60ff
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/365803
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
@rakudrama
Member

The current decoder generates maps that take a lot of memory.
Decoding

[{"x":1,"y":2},{"x":3,"y":4}]

creates fresh strings for the repeated "x" (and "y"), and these strings are inserted into similar Maps that each have their own index.

There is a cost to using a fresh string as a key: beyond the memory footprint, the hashCode needs to be computed for each instance. Perhaps this cost could be better spent identifying a common index to be shared by similar JSON objects (marked as copy-on-write if used more than once).

@lrhn
Member

lrhn commented Jul 30, 2024

Maps and strings can probably be optimized.
Large JSON inputs tend to have repetitions of the same map structure.
Canonicalized map keys, maybe even some short values, can avoid many allocations, and if it's possible to reuse the map structure, then there might be a saving there too.
The cost of that is an overhead during parsing, for recognizing those existing values and structures. I have tried it before, and the overhead was visible.

It's much more efficient to build a custom parser for the known structures using a more pull-based JSON interpreter, than to try to recognize possibly repeated structures at parse time in a generic parser.

(That is: if you care about performance, why are you building maps and lists at all?)

@mraleph
Member Author

mraleph commented Jul 30, 2024

The current decoder generates maps that take a lot of memory.

I have a prototype patch which canonicalizes keys when decoding JSON (this helps both with memory consumption and improves access speed as well, because accessing maps with literal keys can short-circuit string comparison later). You still need to compute hashCode though.

I entertained the idea of using "predictive" decoding, where decoding an array of JSON objects makes an assumption about a repeated shape and uses that to short-circuit both key decoding and Map hashing, and maybe even uses a specialized COW Map implementation which allows sharing of the "index" part. This is effectively what V8's JSON decoder does. But I am not sure I want to go there - I feel the need to keep the JSON decoder's complexity down. My thinking here is aligned with @lrhn's: if code is doing bytes -> JSON objects (Maps) -> proper Dart objects, then it is better to go straight from bytes to proper Dart objects.
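A canonicalizing key cache along these lines could be sketched as follows (the hashing and bucket scheme here are invented for illustration; the actual prototype lives inside the VM and operates on raw bytes, and this sketch is only correct for single-byte characters):

```dart
class KeyCache {
  final Map<int, List<String>> _buckets = {};

  /// Returns a canonical String for the given key bytes, reusing a
  /// previously decoded instance when the same key was seen before.
  String canonicalize(List<int> bytes, int start, int end) {
    var h = 0;
    for (var i = start; i < end; i++) {
      h = (h * 31 + bytes[i]) & 0x3FFFFFFF; // toy hash, not the VM's
    }
    final bucket = _buckets.putIfAbsent(h, () => <String>[]);
    for (final s in bucket) {
      if (_matches(s, bytes, start, end)) return s; // reuse existing key
    }
    final s = String.fromCharCodes(bytes, start, end); // ASCII-only sketch
    bucket.add(s);
    return s;
  }

  bool _matches(String s, List<int> bytes, int start, int end) {
    if (s.length != end - start) return false;
    for (var i = 0; i < s.length; i++) {
      if (s.codeUnitAt(i) != bytes[start + i]) return false;
    }
    return true;
  }
}
```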

@gnprice
Contributor

gnprice commented Jul 30, 2024

A pull-based JSON interpreter for going straight from bytes to proper Dart objects sounds great. Previous issue threads for that:

@kevmoo
Member

kevmoo commented Jul 30, 2024

See also https://github.com/kevmoo/json_alt – although this code is VERY old

@schultek

@feinstein

Maybe we should also look at how other languages have dealt with this? I feel that if we only look at how JS and Dart solutions have been built so far, we might miss some clever solutions that exist for other languages.

copybara-service bot pushed a commit that referenced this issue Aug 13, 2024
Instead of gradually adding key-value pairs into the Map as the JSON
is being parsed, collect all keys and values first and then allocate
the map with the appropriate capacity.

Issue #55522

TEST=ci

Change-Id: Ic3032323143c35c38469d4f5f289daabdf56ecc9
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/374564
Commit-Queue: Slava Egorov <[email protected]>
Reviewed-by: Lasse Nielsen <[email protected]>
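In user-level terms, the "collect first, allocate once" strategy from the commit above has roughly this shape (the actual CL uses an internal VM capacity hint that is not available to Dart code; `Map.fromIterables` below merely approximates "the map sees all entries at once"):

```dart
/// Sketch: while parsing an object, gather keys and values into flat
/// growable lists, then build the map in one step at the closing '}'
/// so the backing store need not be grown entry by entry.
Map<String, Object?> finishObject(List<String> keys, List<Object?> values) {
  return Map<String, Object?>.fromIterables(keys, values);
}
```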
@AdamTechAcc

See also https://github.com/simc/crimson

Does anyone use it?

@lrhn lrhn removed the type-performance Issue relates to performance or code size label Jan 28, 2025

10 participants