xadupre/yet-another-onnx-builder

yet-another-onnx-builder

Yet another onnx builder, patches, flattening functions...

Documentation

yet-another-onnx-builder (yobx) offers a single API and a single function, yobx.to_onnx, to convert machine learning models and other pipelines from many libraries to the ONNX format. Each converter relies on a common GraphBuilder API to build the final ONNX model. A default implementation is provided, but it can be replaced by any implementation of your own (for example one based on onnxscript/ir-py or Spox). These APIs stay close to the onnx API, using NodeProto for nodes and strings for names. This is on purpose: what the API produces is what you see in the final ONNX model. You can add your own metadata and choose your own names.

standard machine learning

data manipulation

This is a work in progress. Many packages produce SQL queries, so the work starts by converting a SQL query into ONNX. A lightweight DataFrame function tracer (dataframe_to_onnx) records pandas-inspired operations on a virtual DataFrame and compiles them to ONNX:

import numpy as np
from onnxruntime import InferenceSession
from yobx.sql import dataframe_to_onnx

# The function is traced on a virtual DataFrame, not executed on real data
def transform(df):
    df = df.filter(df["a"] > 0)
    return df.select([(df["a"] + df["b"]).alias("total")])

# Column names and dtypes describe the inputs of the ONNX model
artifact = dataframe_to_onnx(transform, {"a": np.float32, "b": np.float32})

# Run with onnxruntime
sess = InferenceSession(artifact.SerializeToString(), providers=["CPUExecutionProvider"])
(total,) = sess.run(None, {"a": np.array([1., -2., 3.], np.float32),
                            "b": np.array([4.,  5., 6.], np.float32)})
# total == [5., 9.]
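
To give an intuition of what "recording operations on a virtual DataFrame" can mean, here is a minimal, pure-Python sketch. It is not the yobx implementation: the classes Trace, Column and VirtualFrame, the result names r1, r2, ..., and the choice of operator labels are all hypothetical and only illustrate the idea of running the user function on symbolic columns and logging each operation.

```python
class Trace:
    """Shared log of the operations seen while tracing (hypothetical helper)."""

    def __init__(self):
        self.ops = []
        self.counter = 0

    def emit(self, op_type, inputs, output=None):
        # Record one operation and return the name of its result.
        if output is None:
            self.counter += 1
            output = f"r{self.counter}"
        self.ops.append((op_type, tuple(inputs), output))
        return output


class Column:
    """A symbolic column: it carries a name, never any data."""

    def __init__(self, trace, name):
        self.trace, self.name = trace, name

    def __add__(self, other):
        return Column(self.trace, self.trace.emit("Add", [self.name, other.name]))

    def __gt__(self, value):
        # The constant is kept as text; a real tracer would create an initializer.
        return Column(self.trace, self.trace.emit("Greater", [self.name, repr(value)]))

    def alias(self, new_name):
        # Renaming is recorded as an Identity whose output has the chosen name.
        return Column(self.trace, self.trace.emit("Identity", [self.name], new_name))


class VirtualFrame:
    """A symbolic DataFrame: a mapping from column names to symbolic columns."""

    def __init__(self, trace, columns):
        self.trace, self.columns = trace, columns

    def __getitem__(self, name):
        return self.columns[name]

    def filter(self, mask):
        # Filtering rows is recorded as one Compress per column.
        kept = {
            name: Column(self.trace, self.trace.emit("Compress", [col.name, mask.name]))
            for name, col in self.columns.items()
        }
        return VirtualFrame(self.trace, kept)

    def select(self, columns):
        # The selected columns become the outputs of the traced graph.
        return [col.name for col in columns]


def transform(df):
    df = df.filter(df["a"] > 0)
    return df.select([(df["a"] + df["b"]).alias("total")])


trace = Trace()
frame = VirtualFrame(trace, {n: Column(trace, n) for n in ("a", "b")})
outputs = transform(frame)

for op in trace.ops:
    print(op)
print("outputs:", outputs)
```

The user function is unchanged from the example above; only the object it receives differs. A real tracer would then turn the recorded list into ONNX nodes.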

deep learning

The same API is shared across all converters:

import numpy as np
import onnxruntime
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from yobx import to_onnx

# A custom numpy function traced to ONNX automatically
def log1p_abs(X):
    return np.log1p(np.abs(X))

pipe = Pipeline([
    ("func", FunctionTransformer(func=log1p_abs)),
    ("scaler", StandardScaler()),
])

X_train = np.random.default_rng(0).standard_normal((80, 4)).astype(np.float32)
pipe.fit(X_train)

# Export the whole pipeline to ONNX in one call
artifact = to_onnx(pipe, (X_train[:1],))

# Run with onnxruntime
sess = onnxruntime.InferenceSession(
    artifact.proto.SerializeToString(), providers=["CPUExecutionProvider"]
)
(result,) = sess.run(None, {"X": X_train})

onnxruntime optimizations are triggered with target_opset={"": 22, "com.microsoft": 1}.

Comparison with existing ONNX conversion tools

Design choices of yobx

  • Single entry point: yobx.to_onnx dispatches to the right backend automatically; there is no need to learn a different API for every framework.
  • Pluggable graph builder: the intermediate ONNX graph can be built with the built-in GraphBuilder, with onnxscript/ir-py, or with Spox, keeping the conversion code framework-agnostic.
  • Transparent names: node names, initializer names and result names are preserved as-is (unless they are not unique); what the builder writes is what ends up in the ONNX file.
  • Built-in optimizer: pattern-based graph rewrites (constant folding, fused ops, …) can be run before serialization.
  • ORT-specific targets: passing target_opset={"": 22, "com.microsoft": 1} enables com.microsoft domain operators consumed directly by onnxruntime.
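
The single entry point can be pictured as a small registry keyed by the package a model comes from. The sketch below is an illustration only and does not reflect how yobx.to_onnx is implemented: the names register, convert and _CONVERTERS are hypothetical, and the two Fake classes stand in for real scikit-learn and PyTorch models so that the example runs without those libraries.

```python
_CONVERTERS = {}


def register(package):
    """Decorator registering a converter for models defined in `package`."""

    def decorator(func):
        _CONVERTERS[package] = func
        return func

    return decorator


@register("sklearn")
def _convert_sklearn(model, args):
    return f"sklearn converter called for {type(model).__name__}"


@register("torch")
def _convert_torch(model, args):
    return f"torch converter called for {type(model).__name__}"


def convert(model, args=()):
    """Single entry point: choose the converter from the model's top-level package."""
    package = type(model).__module__.split(".")[0]
    try:
        converter = _CONVERTERS[package]
    except KeyError:
        raise NotImplementedError(
            f"no converter registered for package {package!r}"
        ) from None
    return converter(model, args)


# Stand-in classes so the sketch runs without sklearn or torch installed.
FakeScaler = type("FakeScaler", (), {"__module__": "sklearn.preprocessing"})
FakeModule = type("FakeModule", (), {"__module__": "torch.nn"})

print(convert(FakeScaler()))
print(convert(FakeModule()))
```

Keying the registry on the package name rather than on concrete classes means a framework does not need to be imported before one of its models is actually seen.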

Comparison with existing tools

The main new feature is the ability to trace functions written with NumPy, functions operating on DataFrames, and SQL queries. Users can now convert a FunctionTransformer from scikit-learn, or express preprocessing through SQL queries or DataFrames.
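
One way to trace a NumPy function is NumPy's own `__array_ufunc__` protocol, which lets an object intercept calls such as np.abs or np.log1p. The sketch below only illustrates that general technique; whether yobx relies on this protocol is an assumption, and the TracedArray class and the result names t0, t1 are hypothetical.

```python
import numpy as np


class TracedArray:
    """Symbolic stand-in for an array: it records ufunc calls instead of computing."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.abs(x) are handled in this sketch.
        if method != "__call__" or kwargs:
            return NotImplemented
        names = [x.name if isinstance(x, TracedArray) else repr(x) for x in inputs]
        result = f"t{len(self.log)}"
        self.log.append((ufunc.__name__, names, result))
        return TracedArray(result, self.log)


def log1p_abs(X):
    return np.log1p(np.abs(X))


log = []
out = log1p_abs(TracedArray("X", log))

for entry in log:
    print(entry)
print("output:", out.name)
```

The function log1p_abs is left untouched; it is simply called with a symbolic input. A converter would then map each recorded ufunc name to one or more ONNX operators.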

The implementation was simplified to handle only recent versions of scikit-learn, TensorFlow/Keras, and LiteRT. It was extended to other popular packages such as category_encoders.

Everything lives in a single package in a single repository, so there is one place to report issues, which makes it easier for contributors to respond.

| Tool | Scope | Notes |
|---|---|---|
| torch.onnx.export | PyTorch only | Official PyTorch exporter. yobx can delegate to it or use its own FX-based path, and offers several ways to trace the fx.Graph (default, symbolic tracing, new tracing) to help with complex models. |
| sklearn-onnx | scikit-learn only | Covers the scikit-learn ecosystem. yobx extends this with a unified API, adds support for custom functions written with NumPy via automatic tracing, and supports new packages such as category_encoders, ... |
| tf2onnx | TensorFlow / Keras | Converts TensorFlow models. yobx wraps the same models under one entry point. |
| ModelBuilder | LLM inference (genai) | ModelBuilder produces models better optimized for onnxruntime. yobx supports more models but is less efficient for this specific scenario. |

This package was initially started using vibe coding.
