Skip to content

6097: Migrate from default Python serialization to orjson#6115

Closed
navaro1 wants to merge 10 commits intoquantumlib:mainfrom
navaro1:6097/piotr/serialization_improvements
Closed

6097: Migrate from default Python serialization to orjson#6115
navaro1 wants to merge 10 commits intoquantumlib:mainfrom
navaro1:6097/piotr/serialization_improvements

Conversation

@navaro1
Copy link
Copy Markdown
Contributor

@navaro1 navaro1 commented Jun 1, 2023

  1. The performance improvement is right now about ~8 times better (for 1000 qubits and 100 num_moments from 1.15±0.01s to 147±1ms). This puts us in the ballpark figure mentioned here: Add benchmarks for transformer primitives and json serialization #5957 (comment)
  2. orjson only supports none indentation or indentation of size 2 - source
  3. orjson does not support handling separators: Optional[Tuple[str, str]] - source

Performance differences (will be updated with consequent changes)

Before change (run with indent=None, default json library)

[100.00%] ··· serialization.SerializeLargeExpandedCircuits.time_json_serialization                                                                                                                                           ok
[100.00%] ··· ============ ============ ============
              --                  num_moments       
              ------------ -------------------------
               num_qubits      100          200     
              ============ ============ ============
                  100        114±2ms      231±4ms   
                  1000      1.15±0.01s   2.30±0.01s 
              ============ ============ ============

After changes (orjson library)

[ 37.50%] ··· ============ ============= ============= ============== ============ =============
              --                                             indent / contextual_serialization                    
              -------------------------- -------------------------------------------------------
               num_qubits   num_moments   None / True   None / False    2 / True     2 / False  
              ============ ============= ============= ============== ============ =============
                  100           100        50.4±0.6ms    14.4±0.1ms    51.2±0.5ms   14.7±0.08ms 
                  100           1000        515±1ms      145±0.9ms      521±1ms       155±1ms   
                  100           4000       2.06±0.01s     597±4ms       2.11±0s       639±5ms   
                  500           100         263±2ms      74.7±0.3ms    258±0.8ms     75.6±0.3ms 
                  500           1000       2.58±0.01s     757±2ms      2.61±0.01s     793±4ms   
                  500           4000       10.3±0.01s    3.04±0.05s     10.6±0s       3.17±0s   
                  1000          100         514±2ms       147±1ms       531±6ms       157±2ms   
                  1000          1000       5.21±0.02s    1.53±0.01s     5.29±0s      1.60±0.01s 
                  1000          4000       21.1±0.1s     6.14±0.04s    21.5±0.06s    6.44±0.04s 
              ============ ============= ============= ============== ============ =============

@CirqBot CirqBot added the size: S 10< lines changed <50 label Jun 1, 2023
@navaro1 navaro1 changed the title 6097: Exploratory - Migrate from default Python serialization to orjson 6097: Migrate from default Python serialization to orjson Jun 3, 2023
@navaro1 navaro1 marked this pull request as ready for review June 3, 2023 09:21
@navaro1 navaro1 requested review from a team, cduck and vtomole as code owners June 3, 2023 09:21
@navaro1 navaro1 requested a review from viathor June 3, 2023 09:21
@navaro1 navaro1 requested review from verult and wcourtney as code owners June 3, 2023 09:34
@navaro1 navaro1 force-pushed the 6097/piotr/serialization_improvements branch 2 times, most recently from 5995ab9 to 36e9af7 Compare June 3, 2023 09:41
@pavoljuhas
Copy link
Copy Markdown
Collaborator

Is the new output compatible with serialization produced with builtin json?

Ie, after this PR will the serialization before and after orjson stay compatible?

@tanujkhattar
Copy link
Copy Markdown
Collaborator

From Cirq sync:

Doug: We are adding a new dependency to Cirq. Is orjson well maintained and have good compliance / review process? Especially since we are talking about binary formats.

@rht
Copy link
Copy Markdown

rht commented Jun 8, 2023

Is the new output compatible with serialization produced with builtin json?

Based on the list of the features and drawbacks in https://github.com/ijl/orjson#orjson, it is stricter than the stdlib json, in conforming to the JSON spec.

Is orjson well maintained and have good compliance / review process?

I have seen that orjson has been used in Zulip for a while. The discussion comparing the alternatives can be found at zulip/zulip#6507.

@navaro1
Copy link
Copy Markdown
Contributor Author

navaro1 commented Jun 8, 2023

Is the new output compatible with serialization produced with builtin json?

Almost always yes. read methods were not touched, and the tests are passing. The serialization output is not compatible when indendation is other than None or 2. orjson does not support other indendation levels.

@navaro1 navaro1 force-pushed the 6097/piotr/serialization_improvements branch from 36e9af7 to 70a066f Compare June 13, 2023 02:13
navaro1 added 5 commits June 14, 2023 04:16
This commit is part of an exploratory migration from the default Python
serialization system to the 'orjson' library. Changes include the addition of
the 'orjson' import in the json_serialization.py module and subsequent use of
the 'orjson' functions. In addition, 'orjson' has been added to the
cirq-core/requirements.txt file.
The commit introduces the following changes:

1. An optional parameter 'enable_contextual_serialization' is added to 'assert_json_roundtrip_works' function in 'cirq/testing/json.py' and 'to_json' function in 'cirq/protocols/json_serialization.py'. This flag enables concise serialization of objects using a context map to prevent re-serialization of identical objects.
2. The 'to_json' and 'to_json_bytes' functions in 'cirq/protocols/json_serialization.py' now have an 'indent' parameter. The indentation is set to 2 spaces when the parameter is not None, improving the readability of the output JSON.
3. Updated the function 'assert_json_roundtrip_works' in 'cirq/testing/json.py' to include the new 'enable_contextual_serialization' parameter and the 'indent' parameter in the 'cirq.protocols.to_json' call.
4. Refactored tests in 'cirq/protocols/json_serialization_test.py' to use the new 'enable_contextual_serialization' and 'indent' parameters.
5. Deprecated 'separators' parameter in 'to_json' function of 'cirq/protocols/json_serialization.py' as it has no effect.

This enhancement improves the performance of JSON serialization by avoiding redundant serialization and increases readability by allowing pretty printing.
- Modified the default values of `indent` and `enable_contextual_serialization` in `to_json` method. By default, pretty-printing is now enabled and contextual serialization is now turned on. This will provide more consistency with current implementation.
- Removed the `enable_contextual_serialization` parameter from the `assert_json_roundtrip_works` function, and updated all its occurrences. This refactoring simplifies the function signature and usage.
- Updated `to_json` docstring to better explain the `indent` and `enable_contextual_serialization` parameters and their impact on performance. Also, clarified the return type of the function.
- Updated the tests to reflect these changes. Specifically, removed the `enable_contextual_serialization` parameter in `assert_json_roundtrip_works` calls, and the indentation in `to_json` calls where default value is now used.
1. Updated default values and added new parameters for `to_json` and `to_json_gzip` methods.
2. Updated the serialization benchmark `SerializeLargeExpandedCircuits` to consider the impact of the `indent` and `enable_contextual_serialization` configurations.
@navaro1 navaro1 force-pushed the 6097/piotr/serialization_improvements branch from 70a066f to ed0d4d1 Compare June 14, 2023 02:16
@codecov
Copy link
Copy Markdown

codecov bot commented Aug 5, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.84%. Comparing base (26dbabc) to head (62848b4).
Report is 250 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6115   +/-   ##
=======================================
  Coverage   97.84%   97.84%           
=======================================
  Files        1110     1110           
  Lines       96696    96709   +13     
=======================================
+ Hits        94612    94625   +13     
  Misses       2084     2084           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@tanujkhattar
Copy link
Copy Markdown
Collaborator

xref #6315

We now have a use case where serialization performance is becoming a bottleneck. @suyashdamle to check into whether adding the additional dependency to orjson is compliant with Google policy.

@github-actions github-actions bot added Stale and removed Stale labels Feb 13, 2024
@github-actions github-actions bot added the Stale label Mar 18, 2024
@github-actions github-actions bot closed this Apr 18, 2024
@mhucka mhucka added the status/stale Closed due to inactivity for an extended period of time label Nov 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size: S 10< lines changed <50 status/stale Closed due to inactivity for an extended period of time

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants