Commit 9fc5960

Implement save_model and tensor sharding (#39)

Authored by justinchuby and Copilot
This pull request introduces support for sharding large ONNX model weights into multiple safetensors files, adds a new high-level `save_model` API, and provides comprehensive tests for these new features. The main focus is on enabling the saving of large models by splitting their weights into manageable chunks, improving usability and scalability.

**Major new features and improvements:**

### Sharding and Size Parsing Functionality

- Added logic to shard tensors into multiple safetensors files via a new `max_shard_size` parameter on `save_file`. This includes helper functions for parsing human-readable size strings (e.g., "5GB", "100MB") and generating consistent shard filenames. When sharding occurs, an index file is also created to map tensors to their respective shards.

### New API for Model Saving

- Introduced a new `save_model` function in the public API, allowing users to save an ONNX model and its weights (optionally sharded) in one call. This function enforces that the external data file uses the `.safetensors` extension and supports the new sharding mechanism.

### Testing and Validation

- Added extensive unit tests covering the new `save_model` API, the sharding logic, size-string parsing, and filename generation. Tests verify correct file outputs, error handling, and that sharding and indexing work as expected for both ONNX and IR models.

These changes significantly enhance the library's ability to handle large models and improve the developer experience with a more user-friendly API.

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
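The human-readable size strings accepted by `max_shard_size` can be handled by a small parsing helper. The following is a hypothetical sketch of such a parser, not the library's internal implementation; it assumes decimal units (so `"5GB"` is 5 * 10^9 bytes), matching the Hugging Face sharding convention this format is compatible with:

```python
import re

def parse_size(size: str) -> int:
    """Parse a human-readable size string like "5GB" or "100MB" into bytes.

    Illustrative helper only; the library's real parser may differ, e.g. in
    whether "GB" means 10**9 or 2**30 bytes. Decimal units are assumed here.
    """
    units = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMGT]B)", size.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unrecognized size string: {size!r}")
    value, unit = match.groups()
    return int(float(value) * units[unit.upper()])
```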
Parent commit: 686b1dd

8 files changed: 1,130 additions and 39 deletions


.gitignore

Lines changed: 2 additions & 0 deletions
```diff
@@ -158,3 +158,5 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+
+.DS_Store
```

README.md

Lines changed: 49 additions & 0 deletions
````diff
@@ -75,6 +75,55 @@ model_with_external_data = onnx_safetensors.save_file(model, data_path, base_dir
 onnx.save(model_with_external_data, os.path.join(base_dir, "model_using_safetensors.onnx"))
 ```
 
+### Save an ONNX model with safetensors weights
+
+The `save_model` function is a convenient way to save both the ONNX model and its weights to separate files:
+
+```python
+import onnx_safetensors
+
+# Provide your ONNX model here
+model: onnx.ModelProto
+
+# Save model and weights in one step
+# This creates model.onnx and model.safetensors
+onnx_safetensors.save_model(model, "model.onnx")
+
+# You can also specify a custom name for the weights file
+onnx_safetensors.save_model(model, "model.onnx", external_data="weights.safetensors")
+```
+
+### Shard large models
+
+For large models, you can automatically shard the weights across multiple safetensors files:
+
+```python
+import onnx_safetensors
+
+# Provide your ONNX model here
+model: onnx.ModelProto
+
+# Shard the model into multiple files (e.g., 5GB per shard)
+# This creates:
+# - model.onnx
+# - model-00001-of-00003.safetensors
+# - model-00002-of-00003.safetensors
+# - model-00003-of-00003.safetensors
+# - model.safetensors.index.json (index file mapping tensors to shards)
+onnx_safetensors.save_model(model, "model.onnx", max_shard_size="5GB")
+
+# You can also use save_file with sharding
+onnx_safetensors.save_file(
+    model,
+    "weights.safetensors",
+    base_dir="path/to/save",
+    max_shard_size="5GB"
+)
+```
+
+The sharding format is compatible with the Hugging Face transformers library, making it easy to share and load models across different frameworks.
+
 ## Examples
 
 - [Tutorial notebook](examples/tutorial.ipynb)
+- [save_model and sharding examples](examples/save_model_sharding.py)
````

examples/save_model_sharding.py

Lines changed: 214 additions & 0 deletions
```python
"""Example demonstrating save_model and model sharding functionality.

This example shows how to:
1. Save an ONNX model with safetensors weights using save_model
2. Shard large models across multiple safetensors files
3. Load and verify sharded models with ONNX Runtime
"""

import glob
import json
import os

import numpy as np
import onnx
import onnx.helper
import onnx.numpy_helper
import onnxruntime as ort

import onnx_safetensors


def create_example_model(large: bool = False) -> onnx.ModelProto:
    """Create an example ONNX model for demonstration.

    Args:
        large: If True, creates a larger model to demonstrate sharding.

    Returns:
        An ONNX model.
    """
    if large:
        # Create a larger model with multiple weight matrices to demonstrate sharding
        weights1 = np.random.randn(1000, 1000).astype(np.float32)  # ~4MB
        weights2 = np.random.randn(1000, 2000).astype(np.float32)  # ~8MB
        weights3 = np.random.randn(2000, 1000).astype(np.float32)  # ~8MB

        graph = onnx.helper.make_graph(
            [
                onnx.helper.make_node("MatMul", ["input", "weights1"], ["temp1"]),
                onnx.helper.make_node("MatMul", ["temp1", "weights2"], ["temp2"]),
                onnx.helper.make_node("MatMul", ["temp2", "weights3"], ["output"]),
            ],
            "large_model",
            inputs=[
                onnx.helper.make_tensor_value_info(
                    "input", onnx.TensorProto.FLOAT, [1, 1000]
                ),
            ],
            outputs=[
                onnx.helper.make_tensor_value_info(
                    "output", onnx.TensorProto.FLOAT, [1, 1000]
                ),
            ],
            initializer=[
                onnx.numpy_helper.from_array(weights1, name="weights1"),
                onnx.numpy_helper.from_array(weights2, name="weights2"),
                onnx.numpy_helper.from_array(weights3, name="weights3"),
            ],
        )
    else:
        # Create a simple model
        weights = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)

        graph = onnx.helper.make_graph(
            [
                onnx.helper.make_node("Add", ["input", "weights"], ["output"]),
            ],
            "simple_model",
            inputs=[
                onnx.helper.make_tensor_value_info(
                    "input", onnx.TensorProto.FLOAT, [2, 3]
                ),
            ],
            outputs=[
                onnx.helper.make_tensor_value_info(
                    "output", onnx.TensorProto.FLOAT, [2, 3]
                ),
            ],
            initializer=[onnx.numpy_helper.from_array(weights, name="weights")],
        )

    model = onnx.helper.make_model(
        graph, opset_imports=[onnx.helper.make_opsetid("", 14)], ir_version=10
    )
    return model


def example_basic_save_model():
    """Example 1: Basic usage of save_model."""
    print("Example 1: Basic save_model usage")
    print("=" * 50)

    # Create a simple model
    model = create_example_model(large=False)

    # Save model and weights
    # This creates:
    # - simple_model.onnx (ONNX model file)
    # - simple_model.safetensors (weights file)
    onnx_safetensors.save_model(model, "simple_model.onnx")

    print("✓ Saved simple_model.onnx and simple_model.safetensors")

    # Load and verify the model with ONNX Runtime
    sess = ort.InferenceSession("simple_model.onnx", providers=["CPUExecutionProvider"])
    input_data = np.ones((2, 3), dtype=np.float32)
    outputs = sess.run(None, {"input": input_data})

    print("✓ Model runs successfully with ONNX Runtime")
    print(f"  Output shape: {outputs[0].shape}")
    print()


def example_custom_weights_file():
    """Example 2: Specify a custom name for the weights file."""
    print("Example 2: Custom weights file name")
    print("=" * 50)

    model = create_example_model(large=False)

    # Save with custom weights file name
    # This creates:
    # - my_model.onnx
    # - custom_weights.safetensors
    onnx_safetensors.save_model(
        model, "my_model.onnx", external_data="custom_weights.safetensors"
    )

    print("✓ Saved my_model.onnx with custom_weights.safetensors")
    print()


def example_model_sharding():
    """Example 3: Shard a large model across multiple files."""
    print("Example 3: Model sharding")
    print("=" * 50)

    # Create a larger model
    model = create_example_model(large=True)

    # Shard the model with 5MB per shard
    # This creates:
    # - large_model.onnx
    # - large_model-00001-of-00004.safetensors
    # - large_model-00002-of-00004.safetensors
    # - large_model-00003-of-00004.safetensors
    # - large_model-00004-of-00004.safetensors
    # - large_model.safetensors.index.json (index file)
    onnx_safetensors.save_model(model, "large_model.onnx", max_shard_size="5MB")

    print("✓ Saved large_model.onnx with sharded weights")
    print("  Files created:")

    # List the created shard files
    shard_files = sorted(glob.glob("large_model-*.safetensors"))
    for shard_file in shard_files:
        size_mb = os.path.getsize(shard_file) / (1024 * 1024)
        print(f"  - {shard_file} ({size_mb:.2f} MB)")

    # Check for index file
    if os.path.exists("large_model.safetensors.index.json"):
        with open("large_model.safetensors.index.json") as f:
            index = json.load(f)
        print(f"  ✓ Index file created with {len(index['weight_map'])} tensors mapped")

    # Verify the sharded model works with ONNX Runtime
    sess = ort.InferenceSession("large_model.onnx", providers=["CPUExecutionProvider"])
    input_data = np.random.randn(1, 1000).astype(np.float32)
    outputs = sess.run(None, {"input": input_data})

    print("✓ Sharded model runs successfully with ONNX Runtime")
    print(f"  Output shape: {outputs[0].shape}")
    print()


def example_save_file_with_sharding():
    """Example 4: Use save_file with sharding for more control."""
    print("Example 4: save_file with sharding")
    print("=" * 50)

    model = create_example_model(large=True)

    # Save only the weights with sharding
    # Note: This doesn't save the ONNX model file itself
    onnx_safetensors.save_file(
        model,
        "weights_only.safetensors",
        base_dir=".",
        max_shard_size="5MB",
        replace_data=False,  # Don't modify the model
    )

    print("✓ Saved sharded weights without modifying the model")

    shard_files = sorted(glob.glob("weights_only-*.safetensors"))
    print(f"  Created {len(shard_files)} shard files")
    print()


if __name__ == "__main__":
    print("ONNX-Safetensors: save_model and Sharding Examples")
    print("=" * 50)
    print()

    # Run all examples
    example_basic_save_model()
    example_custom_weights_file()
    example_model_sharding()
    example_save_file_with_sharding()

    print("All examples completed successfully! ✓")
    print()
    print("Note: This example created several files for demonstration.")
    print("You can safely delete them after reviewing.")
```
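The shard layout produced in Example 3 can be approximated by a greedy first-fit pass over the tensors, followed by the `-NNNNN-of-NNNNN` naming pattern shown in the README. A rough sketch, with illustrative names (`shard_tensors`, `shard_filename`) that are not the library's internals:

```python
def shard_tensors(sizes: dict[str, int], max_shard_size: int) -> list[list[str]]:
    """Greedily assign tensors to shards so no shard exceeds max_shard_size.

    A tensor larger than the limit still gets a shard of its own.
    """
    shards: list[list[str]] = [[]]
    current = 0
    for name, size in sizes.items():
        # Start a new shard when adding this tensor would overflow the current one.
        if current > 0 and current + size > max_shard_size:
            shards.append([])
            current = 0
        shards[-1].append(name)
        current += size
    return shards


def shard_filename(base: str, index: int, total: int) -> str:
    """Produce names like model-00001-of-00003.safetensors (1-based index)."""
    return f"{base}-{index:05d}-of-{total:05d}.safetensors"


# Three tensors of 4, 8, and 8 bytes with a 10-byte limit split into 3 shards.
shards = shard_tensors({"w1": 4, "w2": 8, "w3": 8}, max_shard_size=10)
names = [shard_filename("model", i + 1, len(shards)) for i in range(len(shards))]
```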

pyproject.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -33,6 +33,7 @@ dependencies = [
     "onnx>=1.16",
     "safetensors",
     "onnx-ir",
+    "tqdm",
 ]
 
 [project.urls]
```

requirements-dev.txt

Lines changed: 1 addition & 0 deletions
```diff
@@ -3,3 +3,4 @@ pytest-cov
 lintrunner
 lintrunner-adapters
 parameterized
+onnxruntime
```

src/onnx_safetensors/__init__.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -7,6 +7,7 @@
     "replace_tensors",
     "save",
     "save_file",
+    "save_model",
 ]
 
 from onnx_safetensors._safetensors_io import (
@@ -16,6 +17,7 @@
     replace_tensors,
     save,
     save_file,
+    save_model,
 )
 
 __version__ = "1.2.0"
```
