
Linear from PyTorch must map to Gemm in ONNX #1089


Closed
baijumeswani opened this issue Oct 11, 2023 · 13 comments · Fixed by #1111 or #1113
Labels
module: torchlib Related to the torch/aten function lib in development topic: discussion For discussion

Comments

@baijumeswani

baijumeswani commented Oct 11, 2023

PyTorch Model:

class NeuralNet(torch.nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()

        self.fc1 = torch.nn.Linear(input_size, hidden_size)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(hidden_size, num_classes)

    def forward(self, input1):
        out = self.fc1(input1)
        out = self.relu(out)
        out = self.fc2(out)
        return out
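
For context, a rough sketch of how the two exports can be invoked (the sizes below are made up for illustration; at the time, the dynamo path was torch.onnx.dynamo_export):

import torch

# Arbitrary sizes, just for illustration.
model = NeuralNet(input_size=784, hidden_size=500, num_classes=10)
example_input = torch.randn(1, 784)

# TorchScript-based exporter
torch.onnx.export(model, (example_input,), "model_torchscript.onnx")

# Dynamo-based exporter
torch.onnx.dynamo_export(model, example_input).save("model_dynamo.onnx")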

Exporting with the TorchScript-based exporter yields:

[image: graph from the TorchScript exporter — Gemm → Relu → Gemm]

which makes sense: it is, after all, a linear layer followed by a ReLU followed by another linear layer.

Exporting the same model with the torch dynamo-based exporter yields:

[image: graph from the dynamo exporter — the linear layers appear as nested ONNX functions]

Two levels beneath the linear layer, I find:

[image: the function body two levels down — MatMul, Mul, Add, and CastLike nodes]

It seems like the Gemm is somehow manifested as a subgraph of MatMuls, Muls, Adds, and CastLikes. Digging deeper, I find that this definition comes from:

@torch_op("aten::addmm")
def aten_addmm(
    self: TReal, mat1: TReal, mat2: TReal, beta: float = 1.0, alpha: float = 1.0
) -> TReal:
    """addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor"""
    mat1_mat2 = op.MatMul(mat1, mat2)
    scaled_mat1_mat2 = op.Mul(mat1_mat2, alpha)
    scaled_self = op.Mul(self, beta)
    return op.Add(scaled_self, scaled_mat1_mat2)

It seems wasteful that an op as simple as a Gemm needs to be represented as this subgraph. Looking at this document, this seems to be a design choice.

We favor general ops like MatMul than specialized ops like Gemm in the function lib.

But imagine a model with thousands of Gemms. Each Gemm is now this subgraph, which means this optimization/fusion needs to run thousands of times to achieve something that could probably be achieved very easily at the source.

It would benefit ONNX Runtime (inference and training) and the larger ONNX community if this subgraph were represented as a Gemm node after export.
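
For reference, a single Gemm computes exactly what the decomposed subgraph computes: alpha * (mat1 @ mat2) + beta * self. A minimal check with the ONNX reference evaluator (the shapes and alpha/beta values below are arbitrary, chosen just for illustration):

import numpy as np
from onnx import TensorProto, helper
from onnx.reference import ReferenceEvaluator

# One-node model: Y = Gemm(A, B, C) = alpha * A @ B + beta * C
node = helper.make_node("Gemm", ["A", "B", "C"], ["Y"], alpha=2.0, beta=0.5)
graph = helper.make_graph(
    [node],
    "gemm_check",
    [
        helper.make_tensor_value_info("A", TensorProto.FLOAT, [3, 5]),
        helper.make_tensor_value_info("B", TensorProto.FLOAT, [5, 4]),
        helper.make_tensor_value_info("C", TensorProto.FLOAT, [3, 4]),
    ],
    [helper.make_tensor_value_info("Y", TensorProto.FLOAT, [3, 4])],
)
model = helper.make_model(graph)

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 5)).astype(np.float32)
b = rng.standard_normal((5, 4)).astype(np.float32)
c = rng.standard_normal((3, 4)).astype(np.float32)

(y,) = ReferenceEvaluator(model).run(None, {"A": a, "B": b, "C": c})
decomposed = 0.5 * c + 2.0 * (a @ b)  # the MatMul/Mul/Mul/Add subgraph, in numpy
assert np.allclose(y, decomposed, atol=1e-5)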

@baijumeswani
Author

cc: @BowenBao @justinchuby

@justinchuby
Collaborator

justinchuby commented Oct 12, 2023

Thanks for raising this issue! When we created the decomposition, I realized Gemm is a special case of the op addmm (https://github.com/pytorch/pytorch/blob/a6b452dfdcb484d5dfdbb577b74cecbd7021df2e/torch/onnx/symbolic_opset9.py#L645-L652). In the design of torchlib, we wanted the ONNX functions to mirror the aten ops' behavior as closely as possible, so that we preserve the richest information for downstream optimization (doc). To be able to use Gemm for addmm, we need to know the type and rank of the inputs, which are not assumed to be available at export time.

This kind of fusion should actually be simple for downstream optimization passes by design. We can look at the aten_addmm function and its input types and ranks when available, then make the substitution. We do need the type and rank information for this, though, which is not available in nested functions, as @BowenBao pointed out in onnx/onnx#5487
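
As a rough illustration (this is not an existing onnxscript or ORT pass), such a substitution could be a pattern match over the inlined graph, roughly along these lines:

import onnx

def fuse_addmm_to_gemm(graph: onnx.GraphProto) -> None:
    """Sketch: find the Add(Mul(self, beta), Mul(MatMul(mat1, mat2), alpha)) pattern."""
    # Map each tensor name to the node that produces it.
    producers = {out: node for node in graph.node for out in node.output}
    for add in graph.node:
        if add.op_type != "Add" or len(add.input) != 2:
            continue
        scaled_self = producers.get(add.input[0])
        scaled_matmul = producers.get(add.input[1])
        if scaled_self is None or scaled_matmul is None:
            continue
        if scaled_self.op_type != "Mul" or scaled_matmul.op_type != "Mul":
            continue
        matmul = producers.get(scaled_matmul.input[0])
        if matmul is None or matmul.op_type != "MatMul":
            continue
        # At this point a real pass would confirm (via value_info / shape
        # inference) that the MatMul inputs are rank-2 and of a Gemm-supported
        # type, read the alpha/beta constants feeding the two Muls, and splice
        # in a single Gemm(mat1, mat2, self, alpha=..., beta=...) in place of
        # the four nodes. A robust version would also handle commuted inputs.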

@justinchuby
Collaborator

justinchuby commented Oct 12, 2023

However, for this special case, we may be able to create an overload for supported types that conditionally chooses Gemm based on rank. Optimization passes will still need to fold the If branches for this.

Edit:

I tried (1)

@torch_op("aten::addmm")
def aten_addmm(
    self: TReal, mat1: TReal, mat2: TReal, beta: float = 1.0, alpha: float = 1.0
) -> TReal:
    """addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor"""

    use_gemm = op.And(op.Equal(op.Size(op.Shape(mat1)), 2), op.Equal(op.Size(op.Shape(mat2)), 2))
    if use_gemm:
        result = op.Gemm(mat1, mat2, self, alpha=alpha, beta=beta)
    else:
        mat1_mat2 = op.MatMul(mat1, mat2)
        scaled_mat1_mat2 = op.Mul(mat1_mat2, alpha)
        scaled_self = op.Mul(self, beta)
        result = op.Add(scaled_self, scaled_mat1_mat2)
    return result

But apparently ORT only implements Gemm for float32 and not for other types, so this needs to become (2):

@torch_op("aten::addmm")
def aten_addmm_gemm(
    self: FLOAT, mat1: FLOAT, mat2: FLOAT, beta: float = 1.0, alpha: float = 1.0
) -> FLOAT:
    """addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor"""

    use_gemm = op.And(op.Equal(op.Size(op.Shape(mat1)), 2), op.Equal(op.Size(op.Shape(mat2)), 2))
    if use_gemm:
        result = op.Gemm(mat1, mat2, self, alpha=alpha, beta=beta)
    else:
        mat1_mat2 = op.MatMul(mat1, mat2)
        scaled_mat1_mat2 = op.Mul(mat1_mat2, alpha)
        scaled_self = op.Mul(self, beta)
        result = op.Add(scaled_self, scaled_mat1_mat2)
    return result

@torch_op("aten::addmm")
def aten_addmm(
    self: TNotFloat32, mat1: TNotFloat32, mat2: TNotFloat32, beta: float = 1.0, alpha: float = 1.0
) -> TNotFloat32:
    """addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor"""

    mat1_mat2 = op.MatMul(mat1, mat2)
    scaled_mat1_mat2 = op.Mul(mat1_mat2, alpha)
    scaled_self = op.Mul(self, beta)
    result = op.Add(scaled_self, scaled_mat1_mat2)
    return result

But since Gemm is defined on {tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)}, it makes less sense for the exporter to make this specialization for FLOAT inputs only.

Let me know what you think or if I am missing anything. Thanks!

@justinchuby
Collaborator

justinchuby commented Oct 12, 2023

So far it looks like the best path forward is for ORT to implement Gemm on the spec'ed types and use (1). This way we strike a balance between correctness, complexity, and the effort needed for fusion.

@justinchuby
Collaborator

justinchuby commented Oct 12, 2023

Although: If-folding is not necessarily easier, and the model may run even slower with If branches when unoptimized. The assumption is that we don't want to specialize the function at conversion time, so we can't just use Gemm.

@baijumeswani
Author

baijumeswani commented Oct 12, 2023

None of the solutions offered here is very helpful, since they all require a subgraph computation/optimization (to be either folded away or fused into Gemm).

Ideally, the information about the rank/type of the input matrices, as well as the values of alpha and beta, is known at export time. This makes me feel that this should be dealt with at the source rather than pushed downstream to another optimization pass at a later time.

This becomes particularly important for scenarios where the export is an inline operation (such as in ORTModule) and the export time, along with the other optimization times, results in a performance penalty for the scenario.

@baijumeswani
Author

cc @pranavsharma for awareness, as I think this would impact inference as well.

@pranavsharma

pranavsharma commented Oct 12, 2023

Thanks @baijumeswani for adding me.

Exporter team: please try to fix this at export time, as ORT is not the only consumer of ONNX graphs. There is a whole ecosystem around ONNX, and such changes will break all of them.

ORT has not implemented Gemm for certain types because there was no production use case, and adding unnecessary types increases the binary size. Hence, it doesn't make sense for ORT to implement ops for all types. For the most frequently used types, can we emit Gemm? That way we're not penalizing the majority of use cases.

@justinchuby
Collaborator

Thanks for this perspective! Happy to explore options here. One thing that comes to mind is that, as we build out AOT optimization capabilities for ONNX graphs, these types of patterns can be optimized away (by the exporter) before the runtime sees the graph. This way, tools in the ecosystem can choose to operate on graphs with different levels of generality based on the assumptions they are built against.

@justinchuby justinchuby added module: torchlib Related to the torch/aten function lib in development topic: discussion For discussion labels Oct 13, 2023
@xadupre
Member

xadupre commented Oct 18, 2023

Some thoughts related to these issues.

Models can be very big nowadays; anything we don't handle at export time must be taken care of at optimization time. That is fine for small models, but is it still fine on bigger models with thousands of operators? Looking for patterns in such graphs adds significant time. Maybe we should start tracking converter performance (conversion time, optimization time with onnxruntime).
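
A minimal sketch of what such tracking could look like (the export and optimization calls here are placeholders, not an existing benchmark harness):

import time

import onnxruntime as ort
import torch

def time_export_and_optimize(model, example_input, path="model.onnx"):
    # Time the export itself.
    t0 = time.perf_counter()
    torch.onnx.dynamo_export(model, example_input).save(path)
    export_seconds = time.perf_counter() - t0

    # Time ORT's graph optimizations, which run when the session is created.
    t1 = time.perf_counter()
    options = ort.SessionOptions()
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    ort.InferenceSession(path, options)
    optimize_seconds = time.perf_counter() - t1
    return export_seconds, optimize_seconds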

One particular case with onnx-script; it is rare, but it can happen:

if beta == 0:
    B = op.MatMul(X, np.array(...))
else:
    B = op.MatMul(X, np.array(...))

onnx-script will convert this into 3 operators (If + 2x MatMul) and 2 initializers. Then an optimization pass will fold the constants and keep one operator and one initializer. But what if both initializers are very big? We would add unnecessary tensors to the model, making it unnecessarily big.

Another one, again, it is rare but it is possible:

B = op.MatMul(A, op.CastLike(np.array([...], dtype=np.float32), B))

The ONNX model will always keep float32 tensors, but if the model is float16, this could be reduced by half and the exported model could be smaller.

@justinchuby
Collaborator

justinchuby commented Oct 18, 2023

Thanks!

is it still fine on bigger models with thousands of operators?

Potentially. Since we have functions already, there should be a clear boundary for us to match things?

Maybe we should start tracking converter performance (conversion time, optimization time with onnxruntime)

A similar thing is tracked at https://github.com/microsoft/onnx-converters-private/issues/166#issuecomment-1764864419 (dort, compilation time). From profiling, we have seen that the main delay is torch dynamo at the moment.

onnx-script will convert this into 3 operators (If + 2x MatMul) and 2 initializers

I think a concrete usage will help the discussion here. Since aten operators take all large tensors as inputs, I don't see how we would duplicate large tensors in functions (they are more likely scalars).

The ONNX model will always keep float32 tensors, but if the model is float16, this could be reduced by half and the exported model could be smaller.

This, I think, presents a similar issue, where all CastLike'd constants tend to be single-element tensors (scalars) that don't take up space. The exported initializers will be float16 if the model is dealing with float16 inputs.

@justinchuby justinchuby mentioned this issue Oct 20, 2023
@justinchuby justinchuby self-assigned this Oct 20, 2023
justinchuby added a commit that referenced this issue Oct 25, 2023
Decompose addmm with Gemm by creating a special variant for `FLOAT` and
conditionally checking the ranks of the input tensors. The If branch is
expected to be folded away by constant-folding passes.

I have not found other instances where Gemm is used in the torch.onnx
exporter.

Fixes #1089
@github-project-automation github-project-automation bot moved this to Done in My List Oct 25, 2023
@gramalingam
Collaborator

A few comments:

  • Seems better for the exporter to emit Gemm for all legal types when possible (without worrying about whether ORT has Gemm kernels for those types ... that is more of an ORT issue). That is, option (1) seems preferable to option (2) in Justin's note.
  • It makes sense for the exporter to run some set of standard optimizations as post-processing to hide the complexity discussed above from downstream users to the extent possible.
  • For example, types can be assumed to be known, so CastLike can be eliminated (constant-folded away); ranks are typically known, so the if-conditions checking ranks can be constant-folded away as well; etc.

@justinchuby
Collaborator

justinchuby commented Oct 25, 2023

Thanks @gramalingam. I can change to (1) in the implementation if there are no objections. Fortunately, the rest of the complexity for this op is no longer a concern, because we realized Gemm can handle the op. But these points do help when we encounter new instances like this.

justinchuby added a commit that referenced this issue Oct 25, 2023
When I looked at the test coverage for `addmm` (below), I realized mat1
and mat2 are always 2d tensors. So the rank check is redundant. `addmm`
is now fully mapped to `Gemm`, which should completely resolve
#1089

Closes #1110


![image](https://github.com/microsoft/onnxscript/assets/11205048/073347ca-d677-4c87-94fa-e40a13642569)
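
Based on that description, the fully mapped function would presumably reduce to something like the following sketch (an illustration consistent with the commit message, not necessarily the exact code that landed):

@torch_op("aten::addmm")
def aten_addmm(
    self: TReal, mat1: TReal, mat2: TReal, beta: float = 1.0, alpha: float = 1.0
) -> TReal:
    """addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor"""
    # addmm's mat1/mat2 are always rank-2, so Gemm applies directly.
    return op.Gemm(mat1, mat2, self, alpha=alpha, beta=beta)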