* Renaming fpx to floatx
Summary:
As titled; this allows the float8 code to be moved to the floatx folder.
fpx_weight_only is not renamed to floatx_weight_only yet; we'll do that in the future
once we have more clarity on which specific dtypes we want to support (e.g. maybe we'll
just support fp4 and fp6).
Test Plan:
python test/dtypes/test_floatx.py
Reviewers:
Subscribers:
Tasks:
Tags:
* fix test_ops
README.md: 1 addition & 1 deletion
@@ -128,7 +128,7 @@ The best example we have combining the composability of lower bit dtype with com
We've added support for authoring and releasing [custom ops](./torchao/csrc/) that do not graph break with `torch.compile()`, so if you love writing kernels but hate packaging them so they work across all operating systems and CUDA versions, we'd love to accept contributions for your custom ops. We have a few examples you can follow:
-1.[fp6](torchao/prototype/quant_llm/) for 2x faster inference over fp16 with an easy to use API `quantize_(model, fp6_llm_weight_only())`
+1.[fp6](torchao/dtypes/floatx) for 2x faster inference over fp16 with an easy to use API `quantize_(model, fpx_weight_only(3, 2))`
2.[2:4 Sparse Marlin GEMM](https://github.com/pytorch/ao/pull/733) 2x speedups for FP16xINT4 kernels even at batch sizes up to 256
3.[int4 tinygemm unpacker](https://github.com/pytorch/ao/pull/415) which makes it easier to switch quantized backends for inference
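
For context on the fp6 line changed in the README diff above, here is a minimal usage sketch of the weight-only floatx quantization call, assuming `quantize_` and `fpx_weight_only` are importable from `torchao.quantization` at this commit and that a CUDA device with fp16 support is available:

```python
# Minimal sketch, not an authoritative example: quantize a toy fp16 model's
# Linear weights to fp6 using fpx_weight_only(3, 2), i.e. 3 exponent bits and
# 2 mantissa bits (plus the sign bit). Assumes torchao is installed with CUDA.
import torch
from torchao.quantization import quantize_, fpx_weight_only

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).half().cuda()
quantize_(model, fpx_weight_only(3, 2))  # replaces Linear weights in place

x = torch.randn(16, 1024, dtype=torch.float16, device="cuda")
with torch.no_grad():
    y = model(x)
```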