Hey, first of all, congratulations and thank you for the very nice work :) I like exllamav2, and I tried to implement QuIP and QTIP myself, so I am very happy to use exllamav3!
Everything works well (tested on Llama 8B and 1B), but I cannot use torch.compile on an exllamav3-quantized model :(
In particular, during the forward pass, the call ext.exl3_gemm(x, self.trellis, y, self.suh, xh, self.svh, -1, self.mcg_mult, self.mul1_mult) yields the following error (with fullgraph=True and mode='max-autotune'):
Unsupported: Attempted to call function marked as skipped
Explanation: Dynamo does not know how to trace the builtin `exllamav3_ext.PyCapsule.exl3_gemm.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
Developer debug context: module: exllamav3_ext, qualname: PyCapsule.exl3_gemm, skip reason: <missing reason>
I guess I can bypass this error by wrapping the custom gemm, as suggested in the error message, but I am not sure.
Have you already tried compiling the model with torch.compile? If not, do you plan to make the custom C++/PyTorch bindings compatible with torch.dynamo?
Thanks in advance!