Improve VM's JSON decoder #55522

Open · mraleph opened this issue Apr 19, 2024 · 12 comments
Labels: area-vm (VM-related issues, including code coverage, and the AOT and JIT backends), triaged

Comments

@mraleph
Member

mraleph commented Apr 19, 2024

We sometimes hear complaints that our JSON decoder is slow compared to, for example, V8's. Usually we shrug these complaints off, saying that:

  • Dart's JSON decoder is likely fast enough.
  • Dart's JSON decoder is written in Dart while V8's is written in C++, so some degree of performance difference is expected.

Developers are then encouraged to use high-performance native JSON decoders if JSON decoding is a bottleneck for them.

This does not square well with my belief that Dart should be considered a general-purpose high-level programming language. So I would like to take a closer look at the JSON decoder's performance and see whether it can be considerably improved.

I took JSON files from https://github.com/simdjson/simdjson-data/tree/master/jsonexamples and ran them through JS and Dart benchmarks.

The Dart benchmark measures the performance of the JSON decoder fused with the UTF-8 decoder:

  final decoder = Utf8Decoder().fuse(JsonDecoder());
  // ...
    for (var i = 0; i < N; i++) {
      decoder.convert(bytes);
    }
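Expanded into a self-contained form, the measurement loop might look like the following sketch (the in-memory sample input and iteration count are placeholders I chose for illustration; the actual benchmark reads the files listed above):

```dart
import 'dart:convert';

void main() {
  final decoder = const Utf8Decoder().fuse(const JsonDecoder());
  // In-memory sample standing in for a benchmark file's raw UTF-8 bytes.
  final bytes = utf8.encode('{"x": 1, "y": [2, 3]}');
  const n = 1000; // Iteration count chosen for illustration.
  final sw = Stopwatch()..start();
  Object? last;
  for (var i = 0; i < n; i++) {
    last = decoder.convert(bytes);
  }
  sw.stop();
  print('ms/iteration: ${sw.elapsedMicroseconds / n / 1000}');
  print((last as Map)['x']); // 1
}
```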

The JS benchmark measures the performance of decoding JSON from a pre-decoded JS string, as well as the performance of converting UTF-8 bytes to a JS string:

function benchmark(baseDir, fileName) {
  let content = read(/* ... */);
  // ...
    for (var i = 0; i < N; i++) {
      JSON.parse(content);
    }
  // ...
}

function benchmarkWithUtf8Decoding(baseDir, fileName) {
  let content = readbuffer(/* ... */);
    // ...
    for (var i = 0; i < N; i++) {
      JSON.parse(decodeUtf8(content));
    }
    // ...
}

decodeUtf8 is a helper function which I added to the d8 shell; it effectively does the following: String::NewFromUtf8(isolate, buffer, NewStringType::kNormal, buffer_size).

This was done to measure the cost of the UTF-8 -> UTF-16 conversion, which would otherwise be hidden.

The baseline measurement for V8 on my desktop looks like this:

"apache_builds.json",string,0.30905068359375
"apache_builds.json",buffer,0.3241399414062499
"canada.json",string,14.897924999999987
"canada.json",buffer,16.510981250000007
"citm_catalog.json",string,3.759248437500003
"citm_catalog.json",buffer,6.9653749999999945
"github_events.json",string,0.14244892578124996
"github_events.json",buffer,0.18018886718749982
"google_maps_api_compact_response.json",string,0.07089770507812503
"google_maps_api_compact_response.json",buffer,0.07248544921875003
"google_maps_api_response.json",string,0.08099453125
"google_maps_api_response.json",buffer,0.08426477050781252
"gsoc-2018.json",string,4.6573312500000155
"gsoc-2018.json",buffer,6.232159375000014
"instruments.json",string,0.49713339843750076
"instruments.json",buffer,0.635085742187502
"marine_ik.json",string,13.519856249999975
"marine_ik.json",buffer,14.717656249999981
"mesh.json",string,2.9209296875000064
"mesh.json",buffer,3.354901562500004
"mesh.pretty.json",string,4.357474999999999
"mesh.pretty.json",buffer,5.1385874999999945
"numbers.json",string,0.5762896484374977
"numbers.json",buffer,0.6817013671874974
"random.json",string,2.822692968749993
"random.json",buffer,3.569248437499982
"repeat.json",string,0.025887060546875063
"repeat.json",buffer,0.051292358398437446
"semanticscholar-corpus.json",string,50.387549999999464
"semanticscholar-corpus.json",buffer,55.51836249999978
"tree-pretty.json",string,0.09902978515625023
"tree-pretty.json",buffer,0.10347050781250004
"twitter_api_compact_response.json",string,0.03364210205078137
"twitter_api_compact_response.json",buffer,0.04489652099609387
"twitter_api_response.json",string,0.04411997070312523
"twitter_api_response.json",buffer,0.06057795410156288
"twitterescaped.json",string,1.7861546875000158
"twitterescaped.json",buffer,2.1810593749999954
"twitter.json",string,1.7194124999999985
"twitter.json",buffer,3.385740625000017
"twitter_timeline.json",string,0.13089736328124957
"twitter_timeline.json",buffer,0.13675043945312382
"update-center.json",string,2.1542187499999956
"update-center.json",buffer,3.237673437500007

Dart in AOT mode yields:

"apache_builds.json",0.6998046875
"canada.json",31.8375
"citm_catalog.json",9.1125
"github_events.json",0.39033203125
"google_maps_api_compact_response.json",0.115283203125
"google_maps_api_response.json",0.162744140625
"gsoc-2018.json",20.49375
"instruments.json",1.45390625
"marine_ik.json",23.26875
"mesh.json",4.640625
"mesh.pretty.json",11.571875
"numbers.json",0.570703125
"random.json",4.7703125
"repeat.json",0.0804443359375
"semanticscholar-corpus.json",69.275
"tree-pretty.json",0.22900390625
"twitter_api_compact_response.json",0.0744384765625
"twitter_api_response.json",0.1037109375
"twitterescaped.json",4.553125
"twitter.json",4.6828125
"twitter_timeline.json",0.45234375
"update-center.json",4.2859375

With the exception of numbers.json, Dart's decoder is 1.5-3x slower.

The good news is that it is possible to significantly improve its performance - I have done some experiments and arrived at the following preliminary numbers:

"apache_builds.json",0.3947265625
"canada.json",30.7375
"citm_catalog.json",5.0671875
"github_events.json",0.22177734375
"google_maps_api_compact_response.json",0.093896484375
"google_maps_api_response.json",0.101416015625
"gsoc-2018.json",8.790625
"instruments.json",0.956640625
"marine_ik.json",18.00625
"mesh.json",3.984375
"mesh.pretty.json",8.625
"numbers.json",0.49609375
"random.json",3.4328125
"repeat.json",0.04671630859375
"semanticscholar-corpus.json",36.4875
"tree-pretty.json",0.14794921875
"twitter_api_compact_response.json",0.056640625
"twitter_api_response.json",0.071142578125
"twitterescaped.json",3.8953125
"twitter.json",3.1515625
"twitter_timeline.json",0.37626953125
"update-center.json",2.89453125

These numbers are significantly better.

I am going to use this issue as an umbrella issue to land my changes.

@mraleph mraleph added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-performance Issue relates to performance or code size labels Apr 19, 2024
@mraleph mraleph self-assigned this Apr 19, 2024
copybara-service bot pushed a commit that referenced this issue Apr 24, 2024
This pragma forces the compiler to align loop headers within the
function to an architecture-specific boundary: 32 bytes on X64
and ARM64 (with the exception of Apple Silicon, whose optimization
manual explicitly discourages aligning branch targets).

The current implementation is rather naive and makes no attempt
to decide whether aligning is actually profitable based on the
loop body itself.

I have found this pragma helpful both for stabilizing benchmark
results and for achieving better performance in tight loops on
Intel hardware.

Issue #55522

TEST=vm/dart/align_loops_test

Cq-Include-Trybots: luci.dart.try:vm-aot-linux-product-x64-try,vm-aot-linux-debug-x64c-try,vm-aot-linux-debug-x64-try,vm-aot-linux-release-arm64-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-release-x64-try,vm-aot-mac-product-arm64-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-optimization-level-linux-release-x64-try,vm-aot-win-release-x64-try,vm-aot-win-debug-arm64-try,vm-aot-win-debug-x64-try,vm-aot-win-debug-x64c-try,vm-aot-win-product-x64-try,vm-aot-win-release-arm64-try,vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-simriscv64-try,vm-aot-dwarf-linux-product-x64-try,vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try
Change-Id: Ic22fb90d85e7fdebeeaa3908a43328c59436ab58
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/364121
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
pull bot pushed a commit to AmirulAndalib/sdk that referenced this issue Apr 24, 2024
It was not removing all CheckStackOverflow instructions.

Issue dart-lang#55522

TEST=vm/cc/CheckStackOverflowElimination_NoInterruptsPragma

Change-Id: I1a3db6539951ab4b6450804393c89bc6aafff5b3
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/364324
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
@a-siva a-siva added the triaged Issue has been triaged by sub team label Apr 24, 2024
copybara-service bot pushed a commit that referenced this issue Apr 25, 2024
This pragma instructs the compiler to remove all bounds checks from
the annotated function. This can be helpful when tuning the
performance of hot, tight loops where the compiler is unable to
eliminate the bounds checks itself.

For very tight loops I have measured 25-50% overhead from bounds
checks, which I think comes from some combination of general
code-quality issues due to fixed input registers and increased
branch density.

In the future it may be possible to teach our range analysis to
eliminate bounds checks when the loop bound is itself bounded by
the array length, but for now we can resort to this pragma for
extremely hot library code.

Issue #55522

TEST=vm/cc/BoundsCheckElimination_Pragma
[email protected]

Change-Id: Ia7b1e88a16a2b45fa8593a227a4985568892b29c
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/364500
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
@modulovalue
Contributor

@mraleph Do you see any potential performance improvements from merging the underlying regular UTF-8 automaton with an automaton that lexes JSON, and using the result as the lexer for the JsonDecoder (to have a Utf8JsonDecoder), so that no fusing is needed?

@lrhn
Member

lrhn commented Apr 27, 2024

The fusing creates a UTF-8 specialized JSON decoder, _JsonUtf8Decoder.

@modulovalue
Contributor

Thank you for the hint @lrhn. It looks to me like the specialized UTF-8 JSON decoder uses a table-driven UTF-8 decoder but a non-table-driven JSON parser.

What I meant in that comment was: since the language of UTF-8 appears to be regular, and the lexical structure of JSON seems to be regular too, there might be performance gains to be had by merging both automata into one, so that the tokenization phase is based on a single automaton that does UTF-8 decoding and JSON lexing at the same time.
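A minimal sketch of the idea (the byte classes and the table below are invented for illustration and are not the actual library's tables): a single 256-entry table classifies raw bytes so that one loop can skip JSON whitespace and, at the same time, notice where UTF-8 multi-byte sequences begin.

```dart
// Byte classes, invented for this sketch.
const int clsWs = 0; // JSON whitespace
const int clsStruct = 1; // { } [ ] : ,
const int clsAsciiOther = 2; // any other ASCII byte
const int clsUtf8Lead = 3; // 0xC0-0xFF: starts a multi-byte sequence
const int clsUtf8Cont = 4; // 0x80-0xBF: continuation byte

final List<int> byteClass = List<int>.generate(256, (b) {
  if (b == 0x20 || b == 0x09 || b == 0x0A || b == 0x0D) return clsWs;
  if (const [0x7B, 0x7D, 0x5B, 0x5D, 0x3A, 0x2C].contains(b)) return clsStruct;
  if (b < 0x80) return clsAsciiOther;
  return b < 0xC0 ? clsUtf8Cont : clsUtf8Lead;
});

/// Advances past JSON whitespace with a single table lookup per byte;
/// the same table could tell the lexer when a UTF-8 slow path is needed.
int skipWhitespace(List<int> bytes, int i) {
  while (i < bytes.length && byteClass[bytes[i]] == clsWs) {
    i++;
  }
  return i;
}
```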

@lrhn
Member

lrhn commented Apr 28, 2024

I think the UTF-8 is a red herring here.
A table-based tokenizer would work equally well for both UTF-8 and string-based JSON parsing, since everything outside of string values is ASCII anyway, and what's inside strings needs to be interpreted only if it contains escapes, or non-ASCII characters in a UTF-8 source.

The reason the current implementation doesn't use a table based tokenizer is that it doesn't need to tokenize at all. The moment it sees the first character of a value, it can start interpreting it. It tries to collect the value of a number and the contents of a string while scanning, in one pass, so it doesn't have to first tokenize, and then interpret the token afterwards.

Maybe that process can be made more efficient, but doing an extra tokenization step first doesn't sound like an improvement.
(Unless it can be done very efficiently using SIMD operations or using parallelism, or something.)
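The "interpret while scanning" approach described above can be illustrated with a toy integer case (the real decoder also handles signs, fractions, exponents, and chunked input, none of which this sketch attempts):

```dart
/// Toy sketch: a number's value is accumulated during the scan itself,
/// so there is no separate tokenize-then-parse step.
({int value, int end}) scanInt(List<int> codeUnits, int start) {
  var value = 0;
  var i = start;
  while (i < codeUnits.length) {
    final digit = codeUnits[i] ^ 0x30; // maps '0'..'9' to 0..9
    if (digit > 9) break; // not a digit: the number token ends here
    value = value * 10 + digit;
    i++;
  }
  return (value: value, end: i);
}
```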

copybara-service bot pushed a commit that referenced this issue May 2, 2024
Use aligned text offset for instructions when resolving static
calls during relocation.

The vm/dart/align_loops_test is adjusted to cover this issue:
previously we only verified alignment but did not actually run
the generated code, so the bug went undetected.

Issue #55522

TEST=vm/dart/align_loops_test
[email protected]

Change-Id: I56521c104f91b85150584539f58191cf4592f4f5
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/365144
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
copybara-service bot pushed a commit that referenced this issue May 11, 2024
Previously we would only fuse this pattern for Smi operations
which (almost) never happens in AOT. This CL enables fusion
for Int64 operations as well.

The implementation is limited to X64 and ARM64 for now
because implementing it on 32-bit platforms is somewhat
cumbersome.

Issue #55522

[email protected]
TEST=vm/cc/IL_TestIntInstr,vm/dart/test_int_pattern_il_test

Cq-Include-Trybots: luci.dart.try:vm-linux-release-simarm-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-debug-simarm_x64-try,dart-sdk-linux-riscv64-try,vm-aot-linux-debug-simriscv64-try,vm-ffi-qemu-linux-release-riscv64-try,vm-linux-debug-simriscv64-try,vm-linux-release-ia32-try
Change-Id: I62a482640db45befac6b0b78850f23a8cc624c75
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/365463
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
copybara-service bot pushed a commit that referenced this issue May 15, 2024
This CL focuses on improving the parsing of white space (between
JSON tokens) and of simple strings which don't contain escape
sequences.

The improvements are achieved by switching from an if-cascade to a
table-driven implementation: we have a 256-element table which
stores attributes for characters (e.g. whether a character is white
space or a terminal token for a simple string) and use this table
to decide whether to advance through characters or stop the loop
and do something else.

We also suppress bounds checks and interrupt checks in tight
loops - in loops like these, bounds checks alone can cost
30% in overhead.

This CL brings 28% geomean improvement on benchmarks from the linked issue. (All measurements are done in X64 Product AOT)

Individual measurements are:

| Input JSON | ms/iter | vs HEAD | vs V8 |
| ---------- | ------- | ------- | ----- |
| apache_builds.json | 0.44 | 61.06% | 136.86% |
| canada.json | 31.04 | 96.54% | 187.15% |
| citm_catalog.json | 6.43 | 64.44% | 93.94% |
| github_events.json | 0.23 | 59.02% | 128.86% |
| google_maps_api_compact_response.json | 0.10 | 82.12% | 133.83% |
| google_maps_api_response.json | 0.12 | 68.79% | 140.07% |
| gsoc-2018.json | 9.25 | 44.89% | 147.43% |
| instruments.json | 1.08 | 70.18% | 167.38% |
| marine_ik.json | 21.07 | 88.25% | 142.58% |
| mesh.json | 4.51 | 94.56% | 136.57% |
| mesh.pretty.json | 9.97 | 83.76% | 193.79% |
| numbers.json | 0.57 | 91.88% | 83.37% |
| random.json | 3.79 | 78.32% | 107.18% |
| repeat.json | 0.06 | 71.51% | 118.47% |
| semanticscholar-corpus.json | 37.65 | 54.82% | 57.81% |
| tree-pretty.json | 0.17 | 68.68% | 162.33% |
| twitter_api_compact_response.json | 0.06 | 75.23% | 126.11% |
| twitter_api_response.json | 0.08 | 70.64% | 123.60% |
| twitterescaped.json | 3.88 | 84.66% | 177.94% |
| twitter.json | 3.54 | 73.01% | 105.33% |
| twitter_timeline.json | 0.37 | 81.52% | 271.37% |
| update-center.json | 2.85 | 66.75% | 89.01% |

vs HEAD (geomean): 72.94%
vs V8 (geomean): 130.99%
HEAD vs V8 (geomean): 179.59%

Issue #55522

TEST=covered by co19

Change-Id: Id673118c19250ab7781cc98c7656b972debc60ff
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/365803
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Slava Egorov <[email protected]>
@rakudrama
Member

The current decoder generates maps that take a lot of memory.
Decoding

[{"x":1,"y":2},{"x":3,"y":4}]

creates fresh strings for the repeated "x" (and "y"), and these strings are inserted into similar Maps that each have their own index.

There is a cost to using a fresh string as a key: beyond the memory footprint, the hashCode needs to be computed for each instance. Perhaps this cost could be better spent identifying a common index to be shared by similar JSON objects (marked as copy-on-write if used more than once).

@lrhn
Member

lrhn commented Jul 30, 2024

Maps and strings can probably be optimized.
Large JSON inputs tend to have repetitions of the same map structure.
Canonicalized map keys, maybe even some short values, can avoid many allocations, and if it's possible to reuse the map structure, then there might be a saving there too.
The cost of that is an overhead during parsing, for recognizing those existing values and structures. I have tried it before, and the overhead was visible.

It's much more efficient to build a custom parser for the known structures using a more pull-based JSON interpreter, than to try to recognize possibly repeated structures at parse time in a generic parser.

(That is: if you care about performance, why are you building maps and lists at all?)

@mraleph
Member Author

mraleph commented Jul 30, 2024

The current decoder generates maps that take a lot of memory.

I have a prototype patch which canonicalizes keys when decoding JSON (this helps both with memory consumption and improves access speed as well, because accessing maps with literal keys can short-circuit string comparison later). You still need to compute hashCode though.

I entertained the idea of using "predictive" decoding, where decoding an array of JSON objects makes an assumption about a repeated shape and uses that to short-circuit both key decoding and Map hashing, and maybe even uses a specialized COW Map implementation which allows sharing of the "index" part. This is effectively what V8's JSON decoder does. But I am not sure I want to go there - I feel the need to keep the JSON decoder's complexity down. My thinking here is aligned with @lrhn's: if code is doing bytes -> JSON objects (Maps) -> proper Dart objects, then it is better to go straight from bytes to proper Dart objects.
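A canonicalizing key cache along these lines could be sketched as follows (the hashing and bucket scheme here are invented for illustration; the actual prototype lives inside the VM and operates on raw bytes, and this sketch is only correct for single-byte characters):

```dart
class KeyCache {
  final Map<int, List<String>> _buckets = {};

  /// Returns a canonical String for the given key bytes, reusing a
  /// previously decoded instance when the same key was seen before.
  String canonicalize(List<int> bytes, int start, int end) {
    var h = 0;
    for (var i = start; i < end; i++) {
      h = (h * 31 + bytes[i]) & 0x3FFFFFFF; // toy hash, not the VM's
    }
    final bucket = _buckets.putIfAbsent(h, () => <String>[]);
    for (final s in bucket) {
      if (_matches(s, bytes, start, end)) return s; // reuse existing key
    }
    final s = String.fromCharCodes(bytes, start, end); // ASCII-only sketch
    bucket.add(s);
    return s;
  }

  bool _matches(String s, List<int> bytes, int start, int end) {
    if (s.length != end - start) return false;
    for (var i = 0; i < s.length; i++) {
      if (s.codeUnitAt(i) != bytes[start + i]) return false;
    }
    return true;
  }
}
```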

@gnprice
Contributor

gnprice commented Jul 30, 2024

A pull-based JSON interpreter for going straight from bytes to proper Dart objects sounds great. Previous issue threads for that:

@kevmoo
Member

kevmoo commented Jul 30, 2024

See also https://github.com/kevmoo/json_alt – although this code is VERY old

@schultek

@feinstein

Maybe we should also look at how other languages have dealt with this? I feel that if we only look at how JS and Dart solutions have been built so far, we might miss some clever solutions that exist for other languages.

copybara-service bot pushed a commit that referenced this issue Aug 13, 2024
Instead of gradually adding key-value pairs into the Map as the JSON
is being parsed, collect all keys and values first and then allocate
the map with the appropriate capacity.

Issue #55522

TEST=ci

Change-Id: Ic3032323143c35c38469d4f5f289daabdf56ecc9
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/374564
Commit-Queue: Slava Egorov <[email protected]>
Reviewed-by: Lasse Nielsen <[email protected]>
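In user-level terms, the "collect first, allocate once" strategy from the commit above has roughly this shape (the actual CL uses an internal VM capacity hint that is not available to Dart code; `Map.fromIterables` below merely approximates "the map sees all entries at once"):

```dart
/// Sketch: while parsing an object, gather keys and values into flat
/// growable lists, then build the map in one step at the closing '}'
/// so the backing store need not be grown entry by entry.
Map<String, Object?> finishObject(List<String> keys, List<Object?> values) {
  return Map<String, Object?>.fromIterables(keys, values);
}
```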
@AdamTechAcc

See also https://github.com/simc/crimson

Does anyone use it?

@lrhn lrhn removed the type-performance Issue relates to performance or code size label Jan 28, 2025

10 participants