Yet another onnx builder, patches, flattening functions...
yet-another-onnx-builder (yobx) offers a single API and a single function,
`yobx.to_onnx`, to convert machine learning models and other pipelines
from many libraries to the ONNX format. Every converter relies on a common GraphBuilder API
to build the final ONNX model. A default implementation is provided, but
it can be replaced by an implementation of your own, for example one based on
onnxscript/ir-py or Spox.
These APIs stay close to the ONNX API: nodes are NodeProto objects
and names are strings. This is on purpose: what the API produces is
what you see in the final ONNX model. You can add your own metadata
and choose your own names.
standard machine learning
data manipulation
This is work in progress.
Many packages produce SQL queries, so the first step is converting a SQL
query into ONNX. A lightweight DataFrame function tracer
(`dataframe_to_onnx`)
records pandas-inspired operations on a virtual DataFrame and compiles them to ONNX:
```python
import numpy as np
from onnxruntime import InferenceSession
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator


def transform(df):
    # Keep rows where a > 0, then return a single column a + b named "total".
    df = df.filter(df["a"] > 0)
    return df.select([(df["a"] + df["b"]).alias("total")])


artifact = dataframe_to_onnx(transform, {"a": np.float32, "b": np.float32})
ref = InferenceSession(artifact.SerializeToString(), providers=["CPUExecutionProvider"])
(total,) = ref.run(None, {"a": np.array([1., -2., 3.], np.float32),
                          "b": np.array([4., 5., 6.], np.float32)})
# total == [5., 9.]
```

deep learning
- litert
- jax in progress
- tensorflow
- torch
The same API is used across all converters:
```python
import numpy as np
import onnxruntime
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from yobx import to_onnx


# A custom numpy function traced to ONNX automatically
def log1p_abs(X):
    return np.log1p(np.abs(X))


pipe = Pipeline([
    ("func", FunctionTransformer(func=log1p_abs)),
    ("scaler", StandardScaler()),
])
X_train = np.random.default_rng(0).standard_normal((80, 4)).astype(np.float32)
pipe.fit(X_train)

# Export the whole pipeline to ONNX in one call
artifact = to_onnx(pipe, (X_train[:1],))

# Run with onnxruntime
sess = onnxruntime.InferenceSession(
    artifact.proto.SerializeToString(), providers=["CPUExecutionProvider"]
)
(result,) = sess.run(None, {"X": X_train})
```

onnxruntime optimizations are triggered with
`target_opset={"": 22, "com.microsoft": 1}`.
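The `target_opset` value reads as a mapping from operator domain to opset version: in ONNX the empty string is the default domain, and `com.microsoft` is the domain onnxruntime uses for its contrib operators. The sketch below only illustrates how such a mapping can be interpreted. Both helpers are hypothetical and not part of yobx, and treating a bare integer as shorthand for the default domain is an assumption made for the example.

```python
from typing import Dict, Union


def normalize_target_opset(target_opset: Union[int, Dict[str, int]]) -> Dict[str, int]:
    """Hypothetical helper: return a domain -> opset version mapping."""
    if isinstance(target_opset, int):
        # Assumption: an integer stands for the default ONNX domain "".
        return {"": target_opset}
    return dict(target_opset)


def uses_ort_contrib_ops(target_opset: Union[int, Dict[str, int]]) -> bool:
    """Hypothetical helper: True when the com.microsoft domain is requested."""
    return "com.microsoft" in normalize_target_opset(target_opset)


plain = normalize_target_opset(22)
ort_specific = normalize_target_opset({"": 22, "com.microsoft": 1})
```

With this reading, a model exported for the default domain alone stays portable across runtimes, while adding `com.microsoft` ties it to onnxruntime.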
Design choices yobx
- Single entry point: `yobx.to_onnx` dispatches to the right backend automatically; no need to learn a different API for every framework.
- Pluggable graph-builder: the intermediate ONNX graph can be built with the built-in `GraphBuilder`, with onnxscript/ir-py, or with Spox, keeping the conversion code framework-agnostic.
- Transparent names: node names, initializer names and result names are preserved as-is (unless they are not unique); what the builder writes is what ends up in the ONNX file.
- Built-in optimizer: pattern-based graph rewrites (constant folding, fused ops, …) can be run before serialization.
- ORT-specific targets: passing `target_opset={"": 22, "com.microsoft": 1}` enables `com.microsoft` domain operators consumed directly by onnxruntime.
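The single-entry-point idea can be pictured as a small type-based registry. This is a sketch of the general technique, not yobx's implementation: `register_converter`, `convert` and the toy model classes are hypothetical names introduced for the illustration.

```python
from typing import Any, Callable, Dict, Type

# Hypothetical registry mapping a model class to the function that converts it.
_CONVERTERS: Dict[Type, Callable[[Any], str]] = {}


def register_converter(model_type: Type):
    """Decorator that registers a converter for a model class."""
    def decorator(fn: Callable[[Any], str]) -> Callable[[Any], str]:
        _CONVERTERS[model_type] = fn
        return fn
    return decorator


def convert(model: Any) -> str:
    """Single entry point: walk the class hierarchy to find a converter."""
    for cls in type(model).__mro__:
        if cls in _CONVERTERS:
            return _CONVERTERS[cls](model)
    raise TypeError(f"No converter registered for {type(model).__name__}")


class ToyScaler:
    pass


class ToyNetwork:
    pass


class ToySubNetwork(ToyNetwork):
    pass


@register_converter(ToyScaler)
def _convert_scaler(model: ToyScaler) -> str:
    return "scaler-backend"


@register_converter(ToyNetwork)
def _convert_network(model: ToyNetwork) -> str:
    return "network-backend"


scaler_backend = convert(ToyScaler())
subclass_backend = convert(ToySubNetwork())
```

The caller only ever sees `convert`; supporting a new framework means registering one more function rather than exposing a new API.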
Comparison with existing tools
The main new feature is the ability to trace functions written with NumPy, functions operating on DataFrames, and SQL queries.
Users can now convert a FunctionTransformer from scikit-learn, or preprocessing expressed as SQL queries or DataFrame operations.
The implementation was simplified to handle only recent versions of scikit-learn, TensorFlow/Keras, and LiteRT. It was extended to other well-known packages such as category_encoders.
Everything lives in a single package and a single repository, so there is one place to report issues, which makes it easier for contributors to respond.
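Function tracing can be pictured as calling the function on placeholder objects that record each operation instead of computing it. The toy tracer below illustrates that general idea only; it is not how yobx traces NumPy code, and `Traced` and `trace` are hypothetical names. The recorded operator types are chosen to match ONNX operator names (Abs, Mul, Add).

```python
class Traced:
    """Placeholder value that records operations on a shared tape."""

    def __init__(self, name, tape):
        self.name = name
        self.tape = tape

    def _record(self, op_type, *inputs):
        # Each recorded step is (operator type, input names, output name).
        out = f"r{len(self.tape)}"
        self.tape.append((op_type, [i.name for i in inputs], out))
        return Traced(out, self.tape)

    def __add__(self, other):
        return self._record("Add", self, other)

    def __mul__(self, other):
        return self._record("Mul", self, other)

    def __abs__(self):
        return self._record("Abs", self)


def trace(fn, *input_names):
    """Run fn on placeholders and return the recorded steps and the output name."""
    tape = []
    args = [Traced(name, tape) for name in input_names]
    result = fn(*args)
    return tape, result.name


def f(x, y):
    return abs(x) * y + x


tape, output = trace(f, "X", "Y")
# tape == [("Abs", ["X"], "r0"), ("Mul", ["r0", "Y"], "r1"), ("Add", ["r1", "X"], "r2")]
```

Each tape entry has the shape of a graph node (operator, named inputs, named output), which is why a recorded trace can be turned into an ONNX graph.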
| Tool | Scope | Notes |
|---|---|---|
| torch.onnx.export | PyTorch only | Official PyTorch exporter; yobx can delegate to it or use its own FX-based path, and offers several ways to trace the fx.Graph (default, symbolic tracing, new tracing) to help with complex models |
| sklearn-onnx | scikit-learn only | Covers the scikit-learn ecosystem; yobx extends this with a unified API, adds support for custom functions written with NumPy via automatic tracing, and supports additional packages such as category_encoders |
| tf2onnx | TensorFlow / Keras | Converts TensorFlow models; yobx wraps the same models under one entry point |
| ModelBuilder | LLM inference (genai) | ModelBuilder produces models better optimized for onnxruntime; yobx supports more models but is less efficient for this specific scenario |
This package was initially started using vibe coding.