Can not capture graph with torch.compile (dynamo) #64

Description

@llcnt

Hey, first of all congratulations and thank you for the very nice work :) I like exllamav2, and I have tried implementing QuIP and QTIP myself, so I am very happy to use exllamav3!
Everything works well (tested on Llama 8B and 1B), but I cannot use torch.compile on an exllamav3-quantized model :(
In particular, during the forward pass, the call ext.exl3_gemm(x, self.trellis, y, self.suh, xh, self.svh, -1, self.mcg_mult, self.mul1_mult) yields the following error (with "compile_fullgraph" = True and "mode" = 'max-autotune'):

Unsupported: Attempted to call function marked as skipped
  Explanation: Dynamo does not know how to trace the builtin `exllamav3_ext.PyCapsule.exl3_gemm.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.

  Developer debug context: module: exllamav3_ext, qualname: PyCapsule.exl3_gemm, skip reason: <missing reason>

I guess I can bypass this error by wrapping the custom GEMM as a PyTorch custom operator, as suggested in the error message, but I am not sure — see the sketch below.
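For reference, here is roughly what I had in mind: a minimal sketch following the custom-op tutorial linked in the hint, assuming the argument order from the call above and that y (and the xh scratch buffer) are what the kernel writes in place. I have not checked exllamav3's internals, so the argument names, types, import path, and mutated buffers are all guesses on my part.

```python
import torch
from torch.library import custom_op

# Assumption: `ext` is whatever module exposes the pybind exl3_gemm used in the
# forward pass; the import path below is a guess.
from exllamav3 import ext

# Hypothetical wrapper op; argument names/types mirror the call above.
# mutates_args assumes y (and xh) are written in place by the kernel.
@custom_op("exllamav3::exl3_gemm", mutates_args=("y", "xh"))
def exl3_gemm_op(
    x: torch.Tensor,
    trellis: torch.Tensor,
    y: torch.Tensor,
    suh: torch.Tensor,
    xh: torch.Tensor,
    svh: torch.Tensor,
    force_shape_idx: int,
    mcg_mult: int,
    mul1_mult: int,
) -> None:
    # Dynamo treats the custom op as opaque, so the pybind call itself is never traced.
    ext.exl3_gemm(x, trellis, y, suh, xh, svh, force_shape_idx, mcg_mult, mul1_mult)

# Fake (meta) kernel so the op can be traced with FakeTensors: nothing to compute here,
# since the real kernel only mutates its input buffers and returns nothing.
@exl3_gemm_op.register_fake
def _(x, trellis, y, suh, xh, svh, force_shape_idx, mcg_mult, mul1_mult) -> None:
    return None
```

The forward pass would then call exl3_gemm_op(...) instead of ext.exl3_gemm(...). As far as I understand, this needs PyTorch >= 2.4 for torch.library.custom_op, and the registered fake kernel is what lets dynamo keep the op in the graph without running the extension.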

Have you already tried to compile your model with torch.compile? If not, do you plan to make the custom C++/PyTorch bindings compatible with torch dynamo?
Thanks in advance
