
Support >2G model export - alternative implementation | torchlib(feat) #1004


Closed
justinchuby wants to merge 15 commits into main from justinchu/big-models-comp

Conversation

justinchuby
Collaborator

@justinchuby justinchuby commented Aug 11, 2023

Support >2G model export - alternative implementation where we build the initializers ourselves. This is a follow-up to #1003. We now build the TensorProto objects directly from PyTorch tensors, which circumvents torchscript _export_onnx's 2GB protobuf serialization limit and avoids an extra serialization pass, because everything is now kept in memory.

A >2GB model is no longer a special case because we add the initializers in a separate step, as sketched below.
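
For readers skimming this thread, a rough sketch of the idea (the helper names _build_initializer and add_initializers are illustrative, not the PR's exact code; the dtype mapping uses the internal torch.onnx._type_utils helper that also appears in the review snippets further down):

import onnx
import onnx.helper
import torch
from torch.onnx import _type_utils  # internal PyTorch helper for dtype mapping


def _build_initializer(name: str, tensor: torch.Tensor) -> onnx.TensorProto:
    # Build the TensorProto straight from the tensor's memory; torchscript never
    # has to serialize the weights into a single >2GB protobuf.
    tensor = tensor.detach().cpu().contiguous()
    return onnx.helper.make_tensor(
        name=name,
        data_type=_type_utils.JitScalarType.from_dtype(tensor.dtype).onnx_type(),
        dims=tuple(tensor.shape),
        vals=tensor.numpy().tobytes(),  # assumes a numpy-compatible dtype
        raw=True,
    )


def add_initializers(model: onnx.ModelProto, tensors: dict[str, torch.Tensor]) -> None:
    # Attaching initializers as a separate step is what removes the 2GB special case.
    for name, tensor in tensors.items():
        model.graph.initializer.append(_build_initializer(name, tensor))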

TODO

  • Test to_model_proto with initializers for all torch data types.

@justinchuby justinchuby requested a review from BowenBao August 11, 2023 04:11
@justinchuby
Collaborator Author

@BowenBao Speed seems to be reasonably good. There are errors: Unsupported type proto value case.

@codecov

codecov bot commented Aug 11, 2023

Codecov Report

Attention: Patch coverage is 60.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 78.65%. Comparing base (5a8f4e3) to head (1aa61ac).
Report is 254 commits behind head on main.

Files                                                     Patch %   Lines
...nxscript/function_libs/torch_lib/graph_building.py     60.00%    4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1004      +/-   ##
==========================================
+ Coverage   78.64%   78.65%   +0.01%     
==========================================
  Files         118      118              
  Lines       15441    15435       -6     
  Branches     2424     2422       -2     
==========================================
- Hits        12144    12141       -3     
+ Misses       2899     2896       -3     
  Partials      398      398              


@github-actions

github-actions bot commented Aug 11, 2023

Test Results

         24 files  ±0         24 suites  ±0   1h 42m 10s ⏱️ + 5m 11s
  11 254 tests ±0    8 378 ✔️ ±0      2 876 💤 ±0  0 ±0 
255 750 runs  ±0  57 240 ✔️ ±0  198 510 💤 ±0  0 ±0 

Results for commit 1aa61ac. ± Comparison against base commit 5a8f4e3.

♻️ This comment has been updated with latest results.

@BowenBao
Contributor

Adding back infer_shapes and check_model made tests pass locally.
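
For context, a minimal sketch of what re-enabling those two steps might look like on the in-memory proto (illustrative, not the PR's exact code):

from onnx import ModelProto, checker, shape_inference


def validate(model: ModelProto) -> ModelProto:
    # Propagate shapes and types through the graph.
    model = shape_inference.infer_shapes(model)
    # Structural validation of the proto; note that for serialized models larger
    # than 2GB the checker expects a file path rather than an in-memory proto.
    checker.check_model(model)
    return model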

@justinchuby justinchuby marked this pull request as draft August 11, 2023 17:07
@justinchuby
Collaborator Author

Let me test again

@justinchuby
Collaborator Author

justinchuby commented Aug 11, 2023

Adding back infer_shapes and check_model made tests pass locally.

Nice, thanks! I updated this PR for review

@justinchuby justinchuby marked this pull request as ready for review August 11, 2023 17:19
Contributor

@BowenBao BowenBao left a comment


🚀

@justinchuby justinchuby added the module: torchlib and hold on merging labels Aug 11, 2023
@justinchuby
Collaborator Author

Needs validation with real models

@justinchuby justinchuby marked this pull request as draft August 11, 2023 17:38
@titaiwangms titaiwangms self-requested a review August 21, 2023 16:16
@justinchuby
Collaborator Author

@BowenBao should we move this behind a flag?

)
tensor_proto = onnx.helper.make_tensor(
    name=name,
    data_type=_type_utils.JitScalarType.from_dtype(tensor.dtype).onnx_type(),

Check failures

Code scanning / lintrunner

PYLINT/E0602
Undefined variable '_type_utils' (undefined-variable). See [undefined-variable](https://pylint.pycqa.org/en/latest/user_guide/messages/error/undefined-variable.html). To disable, use ` # pylint: disable=undefined-variable`

MYPY/name-defined
Name "_type_utils" is not defined. To disable, use ` # type: ignore[name-defined]`

RUFF/F821
Undefined name `_type_utils`. See https://beta.ruff.rs/docs/rules/
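
All three checks flag the same issue: _type_utils is used without being imported. Presumably the fix is to import the internal module that provides JitScalarType (an assumption about the intended fix, not a quote from the PR):

# Presumed fix: import the internal PyTorch module that defines JitScalarType.
from torch.onnx import _type_utils

# ...after which the flagged line resolves, e.g.:
onnx_dtype = _type_utils.JitScalarType.from_dtype(tensor.dtype).onnx_type()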
@justinchuby justinchuby marked this pull request as ready for review December 6, 2023 00:39
@@ -378,12 +374,38 @@ def eval_function( # type: ignore[override]
return self._graph.add_function_call(function, inputs, attributes)


def _add_initializers(
Contributor

@BowenBao BowenBao Dec 8, 2023


FYI @kunal-vaishnavi: this creates an ONNX TensorProto from a torch tensor's data pointer (see the sketch below).
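
A sketch of what that can look like: copy the bytes backing the tensor (the region tensor.data_ptr() points at) into TensorProto.raw_data. The helper name is illustrative, and this assumes a CPU tensor with a numpy-compatible dtype:

import onnx
import torch
from torch.onnx import _type_utils


def tensor_to_proto(name: str, tensor: torch.Tensor) -> onnx.TensorProto:
    tensor = tensor.detach().cpu().contiguous()
    proto = onnx.TensorProto()
    proto.name = name
    proto.data_type = int(_type_utils.JitScalarType.from_dtype(tensor.dtype).onnx_type())
    proto.dims.extend(tensor.shape)
    # tensor.numpy() is a zero-copy view over the buffer tensor.data_ptr() points at;
    # tobytes() copies those bytes into the proto's raw_data field.
    proto.raw_data = tensor.numpy().tobytes()
    return proto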

@justinchuby justinchuby marked this pull request as draft May 20, 2024 06:19
@justinchuby justinchuby closed this Aug 8, 2024
@justinchuby justinchuby deleted the justinchu/big-models-comp branch January 27, 2025 18:16
Labels
hold on merging (Don't merge yet)
module: torchlib (Related to the torch/aten function lib in development)

3 participants