The next tutorials #426
8-bit Adam from bitsandbytes. Resources for reference
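For context, a minimal sketch of what the bitsandbytes reference looks like in practice (this assumes `pip install bitsandbytes` and a CUDA GPU; the toy model and training loop are placeholders, not from the issue):

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb

# Placeholder model; the point is only the optimizer swap.
model = nn.Linear(1024, 1024).cuda()

# Drop-in replacement for torch.optim.Adam: optimizer state (exp_avg, exp_avg_sq)
# is kept in 8 bits, which is where the memory savings come from.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

for _ in range(10):
    x = torch.randn(16, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```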
Would love to work on this.
RE: profiling
For metrics, the most important ones are memory bandwidth and FLOP utilization. A good representative workload for now is probably llama2 and llama3 (https://github.com/pytorch/ao/blob/main/torchao/_models/llama/generate.py), and this script has good metric instrumentation already, so extending it feels natural. And for specific algorithms to test out, I'd be most curious about testing out
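To make the metric point concrete, here is a rough sketch of the kind of accounting that decoding benchmarks do: each decode step reads roughly all the weights once, so achieved bandwidth ≈ model bytes × tokens/sec. The `step_fn`, `num_tokens`, and `peak_bandwidth_gbs` arguments are placeholders supplied by the caller, and a CUDA device is assumed:

```python
import time
import torch

def model_size_bytes(model: torch.nn.Module) -> int:
    # Bytes occupied by the weights; quantization shrinks this accordingly.
    return sum(p.numel() * p.element_size() for p in model.parameters())

@torch.no_grad()
def measure_decode(model, step_fn, num_tokens: int, peak_bandwidth_gbs: float):
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(num_tokens):
        step_fn(model)  # one decode step; assumed to touch all weights once
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    tokens_per_sec = num_tokens / elapsed
    achieved_gbs = model_size_bytes(model) * tokens_per_sec / 1e9
    return {
        "tokens/sec": tokens_per_sec,
        "bandwidth achieved (GB/s)": achieved_gbs,
        "memory bandwidth utilization": achieved_gbs / peak_bandwidth_gbs,
    }
```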
* remove need for toktoken flag
* can't pass self to a function
* remove toktoken cli flag
* eliminate need to load entire model when we only need model.config
From our README.md
And so far we've done a good job building out the primitive data types along with their corresponding transformed Linear layers, so for example given a new `ExoticDtype()` we have a playbook to create `ExoticDtypeLinear()`, and indeed for weight-only transformations this is a perfectly fine workflow and how the majority of quantization libraries operate. For example:
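Here is a minimal sketch of what that playbook tends to look like (the `ExoticDtype`/`ExoticDtypeLinear` names above are placeholders from the excerpt, not real torchao classes; for illustration the "exotic" dtype below is just plain per-row int8 weight-only quantization):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExoticDtypeLinear(nn.Module):
    """Weight-only quantized Linear: weights stored in int8, dequantized in forward."""

    def __init__(self, int8_weight, scale, bias=None):
        super().__init__()
        self.register_buffer("int8_weight", int8_weight)  # (out, in), torch.int8
        self.register_buffer("scale", scale)              # (out, 1), per-row scale
        self.bias = bias

    @classmethod
    def from_float(cls, linear: nn.Linear):
        w = linear.weight.detach()
        scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 127.0
        int8_weight = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
        return cls(int8_weight, scale, linear.bias)

    def forward(self, x):
        # Dequantize on the fly; a real kernel would fuse this into the matmul.
        w = self.int8_weight.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, w, self.bias)


def swap_linears(model: nn.Module) -> nn.Module:
    # Recursively replace every nn.Linear with the weight-only variant.
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, ExoticDtypeLinear.from_float(child))
        else:
            swap_linears(child)
    return model
```

The `from_float` classmethod is the whole playbook: once you can quantize a weight tensor, producing the matching Linear replacement is mechanical, which is the weight-only workflow the excerpt describes.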
We can make the above shine with more accessible blogs, performance benchmarks, and integrations with more partners.
However, this does somewhat of a disservice to explaining the ao value proposition. For example, we're a dtype library and not a dtype Linear library, so given a dtype it should be easy for us to do a lot more. So some examples I'd like to see next are:
None of the above is "research"; this is very much the way engineering is moving for inference: https://blog.character.ai/optimizing-ai-inference-at-character-ai/
Also, given an exotic quantization scheme, I'd like to be more proactive in helping people benchmark their models, so this should include: