Description
PyTorch Model:
```python
import torch


class NeuralNet(torch.nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = torch.nn.Linear(input_size, hidden_size)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(hidden_size, num_classes)

    def forward(self, input1):
        out = self.fc1(input1)
        out = self.relu(out)
        out = self.fc2(out)
        return out
```
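For reference, both exports can be reproduced with something along these lines (the sizes, file names, and exact exporter entry point are assumptions on my part and depend on the PyTorch version):

```python
# Repro sketch: sizes and file names are assumptions, not taken from the issue.
model = NeuralNet(input_size=784, hidden_size=500, num_classes=10).eval()
example_input = torch.randn(1, 784)

# TorchScript-based exporter
torch.onnx.export(model, (example_input,), "model_ts.onnx")

# Dynamo-based exporter (newer PyTorch releases expose this as
# torch.onnx.export(..., dynamo=True) instead)
onnx_program = torch.onnx.dynamo_export(model, example_input)
onnx_program.save("model_dynamo.onnx")
```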
Exporting with the TorchScript-based exporter yields:
which makes sense: it is, after all, a Linear layer followed by a ReLU followed by another Linear layer.
Exporting the same model with the torch dynamo-based exporter yields:

Two levels beneath the linear layer, I find:

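One way to see this nesting without a graph viewer is to list the local functions embedded in the exported model (the file name is the assumed one from the repro sketch above):

```python
import onnx

model_proto = onnx.load("model_dynamo.onnx")
# The dynamo exporter emits the aten decompositions as nested ONNX functions;
# print each function and the op types it contains.
for func in model_proto.functions:
    print(func.domain, func.name, [node.op_type for node in func.node])
```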
It seems like the Gemm is somehow manifested as a subgraph of MatMuls, Muls, Adds, and CastLikes. Digging deeper, I find that this definition comes from onnxscript/onnxscript/function_libs/torch_lib/ops/core.py, lines 220 to 229 at commit a981b8a.
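For illustration only, a decomposition of that shape written against general ONNX ops looks roughly like this in onnxscript. This is a hypothetical paraphrase of the pattern, not the actual contents of those lines; the function name, type annotations, and hard-coded alpha/beta of 1.0 are my own assumptions:

```python
# Hypothetical paraphrase (NOT the actual lines from core.py): an addmm-style
# op written with general ONNX ops instead of a single Gemm.
from onnxscript import FLOAT, script
from onnxscript import opset18 as op


@script()
def addmm_sketch(
    bias: FLOAT["M"], mat1: FLOAT["N", "K"], mat2: FLOAT["K", "M"]
) -> FLOAT["N", "M"]:
    prod = op.MatMul(mat1, mat2)
    # The alpha/beta scaling surfaces as Constant + CastLike + Mul nodes
    # rather than as Gemm attributes.
    one = op.Constant(value_float=1.0)
    alpha = op.CastLike(one, prod)
    beta = op.CastLike(one, bias)
    scaled_prod = op.Mul(prod, alpha)
    scaled_bias = op.Mul(bias, beta)
    return op.Add(scaled_bias, scaled_prod)
```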
It seems wasteful that an op as simple as a Gemm needs to be represented as this subgraph. Looking at this document, this seems to be a design choice:

> We favor general ops like MatMul than specialized ops like Gemm in the function lib.
But imagine a model with thousands of Gemms. Each Gemm is now this subgraph, which means this optimization/fusion needs to run thousands of times to achieve something that could probably be achieved very easily at the source.
It would benefit ONNX Runtime (inference and training) and the larger ONNX community if this subgraph were represented as a Gemm node after export.
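For context, the kind of pattern-matching pass that backends or post-export optimizers otherwise have to run looks roughly like the following. It is a naive sketch under strong assumptions: it only touches the top-level graph (so the exported functions would first need to be inlined), it ignores the alpha/beta CastLike scaling and transposes, and it does not check that the operands are 2-D as Gemm requires.

```python
# Naive sketch of a post-export MatMul+Add -> Gemm fusion pass.
# Assumes a plain top-level graph (functions already inlined) and 2-D operands.
import onnx
from onnx import helper


def fuse_matmul_add_into_gemm(model: onnx.ModelProto) -> onnx.ModelProto:
    graph = model.graph

    # Index producers by output name and count the consumers of every tensor.
    producer_index = {out: i for i, node in enumerate(graph.node) for out in node.output}
    consumer_count = {}
    for node in graph.node:
        for name in node.input:
            consumer_count[name] = consumer_count.get(name, 0) + 1

    replacements = {}   # index of the Add node -> replacement Gemm node
    dead_matmuls = []   # indices of MatMul nodes folded into a Gemm
    for add_index, node in enumerate(graph.node):
        if node.op_type != "Add":
            continue
        for i, name in enumerate(node.input):
            mm_index = producer_index.get(name)
            if mm_index is None:
                continue
            matmul = graph.node[mm_index]
            # Only fuse when the MatMul result feeds this Add and nothing else.
            if matmul.op_type == "MatMul" and consumer_count[name] == 1:
                replacements[add_index] = helper.make_node(
                    "Gemm",
                    inputs=[matmul.input[0], matmul.input[1], node.input[1 - i]],
                    outputs=list(node.output),
                )
                dead_matmuls.append(mm_index)
                break

    # Rewrite each fused Add in place, then drop the dead MatMuls back to front
    # so earlier deletions do not shift the remaining indices.
    for add_index, gemm in replacements.items():
        graph.node[add_index].CopyFrom(gemm)
    for mm_index in sorted(dead_matmuls, reverse=True):
        del graph.node[mm_index]
    return model


# Usage sketch: fused = fuse_matmul_add_into_gemm(onnx.load("model_dynamo.onnx"))
```

Multiply this by every MatMul+Add pair in a large model, and by every consumer of the exported graph, and emitting Gemm at the source looks like the cheaper option.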