
Conversation

@justinchuby (Collaborator) commented Aug 11, 2023

Support >2GB model export: an alternative implementation where we build the initializers ourselves. This is a follow-up to #1003. We now build the TensorProto objects directly from PyTorch tensors, which circumvents TorchScript _export_onnx's 2GB protobuf serialization limit and avoids an extra serialization pass, because everything is kept in memory.

A >2GB model is no longer a special case, because initializers are added in a separate step.
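
A minimal sketch of the approach (not the PR's exact code; the helper names are illustrative, and it glosses over dtypes NumPy cannot represent, such as bfloat16):

```python
import onnx
import torch
from onnx import numpy_helper


def _tensor_to_initializer(name: str, tensor: torch.Tensor) -> onnx.TensorProto:
    # Move to CPU and make the buffer contiguous so the raw bytes are valid.
    array = tensor.detach().cpu().contiguous().numpy()
    # numpy_helper maps the NumPy dtype to the ONNX element type and stores
    # the buffer as raw_data, so the weights never pass through TorchScript's
    # single protobuf serialization path.
    return numpy_helper.from_array(array, name=name)


def add_initializers(model: onnx.ModelProto, tensors: dict[str, torch.Tensor]) -> None:
    # Adding initializers as a separate, post-export step is what makes the
    # >2GB case no longer special.
    for name, tensor in tensors.items():
        model.graph.initializer.append(_tensor_to_initializer(name, tensor))
```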

TODO

  • Test to_model_proto with initializers for all torch data types.

@justinchuby requested a review from BowenBao on August 11, 2023 04:11
@justinchuby (Collaborator, Author)

@BowenBao Speed seems reasonably good. There are errors: "Unsuported type proto value case".


codecov bot commented Aug 11, 2023

Codecov Report

Attention: Patch coverage is 60.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 78.65%. Comparing base (5a8f4e3) to head (1aa61ac).
Report is 254 commits behind head on main.

Files | Patch % | Lines
...nxscript/function_libs/torch_lib/graph_building.py | 60.00% | 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1004      +/-   ##
==========================================
+ Coverage   78.64%   78.65%   +0.01%     
==========================================
  Files         118      118              
  Lines       15441    15435       -6     
  Branches     2424     2422       -2     
==========================================
- Hits        12144    12141       -3     
+ Misses       2899     2896       -3     
  Partials      398      398              



github-actions bot commented Aug 11, 2023

Test Results

24 files ±0   24 suites ±0   1h 42m 10s ⏱️ +5m 11s
11 254 tests ±0   8 378 ✔️ ±0   2 876 💤 ±0   0 failed ±0
255 750 runs ±0   57 240 ✔️ ±0   198 510 💤 ±0   0 failed ±0

Results for commit 1aa61ac. ± Comparison against base commit 5a8f4e3.

♻️ This comment has been updated with latest results.

@BowenBao (Contributor)

Adding back infer_shapes and check_model made tests pass locally.

@justinchuby marked this pull request as draft on August 11, 2023 17:07
@justinchuby (Collaborator, Author)

Let me test again

@justinchuby (Collaborator, Author) commented Aug 11, 2023

> Adding back infer_shapes and check_model made tests pass locally.

Nice, thanks! I updated this PR for review.

@justinchuby marked this pull request as ready for review on August 11, 2023 17:19
@BowenBao (Contributor) left a comment


🚀

@justinchuby added the labels "module: torchlib" (Related to the torch/aten function lib in development) and "hold on merging" on Aug 11, 2023
@justinchuby (Collaborator, Author)

Needs validation with real models

@justinchuby marked this pull request as draft on August 11, 2023 17:38
@titaiwangms self-requested a review on August 21, 2023 16:16
@justinchuby (Collaborator, Author)

@BowenBao should we move this behind a flag?

Code scanning / lintrunner reported three failures on the same snippet:

    )
    tensor_proto = onnx.helper.make_tensor(
        name=name,
        data_type=_type_utils.JitScalarType.from_dtype(tensor.dtype).onnx_type(),

  • PYLINT/E0602: Undefined variable '_type_utils' (undefined-variable). See [undefined-variable](https://pylint.pycqa.org/en/latest/user_guide/messages/error/undefined-variable.html). To disable, use `# pylint: disable=undefined-variable`.
  • MYPY/name-defined: Name "_type_utils" is not defined. To disable, use `# type: ignore[name-defined]`.
  • RUFF/F821: Undefined name `_type_utils`. See https://beta.ruff.rs/docs/rules/
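
All three checks flag the same root cause: `_type_utils` is referenced without being imported. A minimal sketch of the likely fix, assuming the module may depend on PyTorch's private `torch.onnx._type_utils` helper that the flagged snippet already relies on:

```python
import torch
from torch.onnx import _type_utils  # the missing import flagged above

# Map a torch dtype to the corresponding ONNX element type, as the flagged
# snippet does. JitScalarType is a private PyTorch helper, so its API may
# change between releases.
tensor = torch.zeros(2, 3, dtype=torch.float16)
onnx_dtype = _type_utils.JitScalarType.from_dtype(tensor.dtype).onnx_type()
print(onnx_dtype)  # the ONNX element type enum for float16
```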
@justinchuby marked this pull request as ready for review on December 6, 2023 00:39
return self._graph.add_function_call(function, inputs, attributes)


def _add_initializers(

@BowenBao (Contributor) commented Dec 8, 2023


FYI @kunal-vaishnavi: creating an ONNX TensorProto from a torch tensor data pointer.
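
A sketch of that data-pointer idea (not code from this PR): copy the tensor's raw bytes into `TensorProto.raw_data` instead of packing elements one by one with `make_tensor`. The dtype mapping via `JitScalarType` is carried over from the snippet flagged by lintrunner above, and bfloat16 and other NumPy-unsupported dtypes would need separate handling.

```python
import onnx
import torch
from torch.onnx import _type_utils


def tensor_proto_from_raw_bytes(name: str, tensor: torch.Tensor) -> onnx.TensorProto:
    # Detach and make the tensor contiguous on CPU so the buffer behind
    # data_ptr() matches the logical layout.
    tensor = tensor.detach().cpu().contiguous()
    proto = onnx.TensorProto()
    proto.name = name
    proto.data_type = int(_type_utils.JitScalarType.from_dtype(tensor.dtype).onnx_type())
    proto.dims.extend(tensor.shape)
    # numpy().tobytes() reads the same memory data_ptr() points at, avoiding
    # a per-element Python loop.
    proto.raw_data = tensor.numpy().tobytes()
    return proto
```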

@justinchuby marked this pull request as draft on May 20, 2024 06:19
@justinchuby closed this on Aug 8, 2024
@justinchuby deleted the justinchu/big-models-comp branch on January 27, 2025 18:16
