Skip to content

Improve VM's JSON decoder  #55522

@mraleph

Description

@mraleph

We sometimes hear complaints that our JSON decoder is slow compared to, for example, V8's. Usually we just shrug these complaints away saying that:

  • Dart's JSON decoder is likely fast enough
  • Dart's JSON decoder is written in Dart and V8's is written in C++ - so some degree of performance difference is expected.

Developers are then encouraged to use high performance native JSON decoders if JSON decoding is a bottleneck for them.

This does not square very well with my belief that Dart should be considered a general purpose high-level programming language. So I would like to take a closer look at JSON's decoder performance and see if it can be considerably improved.

I have taken JSON files from https://github.com/simdjson/simdjson-data/tree/master/jsonexamples and run them through JS and Dart benchmarks.

Dart benchmark is measuring performance of JSON decoder fused with UTF8 decoder:

  final decoder = Utf8Decoder().fuse(JsonDecoder());
  // ...
    for (var i = 0; i < N; i++) {
      decoder.convert(bytes);
    }

JS benchmark is measuring performance of decoding JSON from pre-decoded JS string as well as performance of coverting UTF8 bytes to a JS string:

function benchmark(baseDir, fileName) {
  let content = read(/* ... */);
  // ...
    for (var i = 0; i < N; i++) {
      JSON.parse(content);
    }
  // ...
}

function benchmarkWithUtf8Decoding(baseDir, fileName) {
  let content = readbuffer(/* ... */);
    // ...
    for (var i = 0; i < N; i++) {
      JSON.parse(decodeUtf8(content));
    }
    // ...
}

decodeUtf8 is a helper function which I added to d8 shell which effectively does the following: String::NewFromUtf8(isolate, buffer, NewStringType::kNormal, buffer_size).

This was done to measure the cost of utf8 -> utf16 convertion which will otherwise be hidden.

The baseline measurement for V8 on my desktop looks like this:

"apache_builds.json",string,0.30905068359375
"apache_builds.json",buffer,0.3241399414062499
"canada.json",string,14.897924999999987
"canada.json",buffer,16.510981250000007
"citm_catalog.json",string,3.759248437500003
"citm_catalog.json",buffer,6.9653749999999945
"github_events.json",string,0.14244892578124996
"github_events.json",buffer,0.18018886718749982
"google_maps_api_compact_response.json",string,0.07089770507812503
"google_maps_api_compact_response.json",buffer,0.07248544921875003
"google_maps_api_response.json",string,0.08099453125
"google_maps_api_response.json",buffer,0.08426477050781252
"gsoc-2018.json",string,4.6573312500000155
"gsoc-2018.json",buffer,6.232159375000014
"instruments.json",string,0.49713339843750076
"instruments.json",buffer,0.635085742187502
"marine_ik.json",string,13.519856249999975
"marine_ik.json",buffer,14.717656249999981
"mesh.json",string,2.9209296875000064
"mesh.json",buffer,3.354901562500004
"mesh.pretty.json",string,4.357474999999999
"mesh.pretty.json",buffer,5.1385874999999945
"numbers.json",string,0.5762896484374977
"numbers.json",buffer,0.6817013671874974
"random.json",string,2.822692968749993
"random.json",buffer,3.569248437499982
"repeat.json",string,0.025887060546875063
"repeat.json",buffer,0.051292358398437446
"semanticscholar-corpus.json",string,50.387549999999464
"semanticscholar-corpus.json",buffer,55.51836249999978
"tree-pretty.json",string,0.09902978515625023
"tree-pretty.json",buffer,0.10347050781250004
"twitter_api_compact_response.json",string,0.03364210205078137
"twitter_api_compact_response.json",buffer,0.04489652099609387
"twitter_api_response.json",string,0.04411997070312523
"twitter_api_response.json",buffer,0.06057795410156288
"twitterescaped.json",string,1.7861546875000158
"twitterescaped.json",buffer,2.1810593749999954
"twitter.json",string,1.7194124999999985
"twitter.json",buffer,3.385740625000017
"twitter_timeline.json",string,0.13089736328124957
"twitter_timeline.json",buffer,0.13675043945312382
"update-center.json",string,2.1542187499999956
"update-center.json",buffer,3.237673437500007

Dart in AOT mode yields:

"apache_builds.json",0.6998046875
"canada.json",31.8375
"citm_catalog.json",9.1125
"github_events.json",0.39033203125
"google_maps_api_compact_response.json",0.115283203125
"google_maps_api_response.json",0.162744140625
"gsoc-2018.json",20.49375
"instruments.json",1.45390625
"marine_ik.json",23.26875
"mesh.json",4.640625
"mesh.pretty.json",11.571875
"numbers.json",0.570703125
"random.json",4.7703125
"repeat.json",0.0804443359375
"semanticscholar-corpus.json",69.275
"tree-pretty.json",0.22900390625
"twitter_api_compact_response.json",0.0744384765625
"twitter_api_response.json",0.1037109375
"twitterescaped.json",4.553125
"twitter.json",4.6828125
"twitter_timeline.json",0.45234375
"update-center.json",4.2859375

With the exceptions of numbers.json Dart's decoder is 1.5-3x slower.

Good news is that it is not impossible to significantly improve its performance - I have done some experiments and got to the following preliminary numbers:

"apache_builds.json",0.3947265625
"canada.json",30.7375
"citm_catalog.json",5.0671875
"github_events.json",0.22177734375
"google_maps_api_compact_response.json",0.093896484375
"google_maps_api_response.json",0.101416015625
"gsoc-2018.json",8.790625
"instruments.json",0.956640625
"marine_ik.json",18.00625
"mesh.json",3.984375
"mesh.pretty.json",8.625
"numbers.json",0.49609375
"random.json",3.4328125
"repeat.json",0.04671630859375
"semanticscholar-corpus.json",36.4875
"tree-pretty.json",0.14794921875
"twitter_api_compact_response.json",0.056640625
"twitter_api_response.json",0.071142578125
"twitterescaped.json",3.8953125
"twitter.json",3.1515625
"twitter_timeline.json",0.37626953125
"update-center.json",2.89453125

Which are significantly better.

I am going to use this issue as an umbrella issue to land my changes.

Metadata

Metadata

Assignees

Labels

area-vmUse area-vm for VM related issues, including code coverage, and the AOT and JIT backends.triagedIssue has been triaged by sub team

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions