Support HF LLaMA ckpt conversion #118
Conversation
Thanks for supporting HF LLaMA checkpoint conversion! Can you save the HF weight names as a file in the repo?
"self_attn.k_proj": "attention.wk", | ||
"self_attn.v_proj": "attention.wv", | ||
"self_attn.o_proj": "attention.wo", | ||
"mlp.gate_proj": "feed_forward.w1", |
I feel [gate|down|up]_proj is more readable than w1, w2, and w3. @qihqi, shall we consider renaming them to the proj-style names in the default checkpoint conversion?
Yeah, this makes sense. I also want to note that the original LLaMA weights use w1/2/3: https://github.com/meta-llama/llama3/blob/main/llama/model.py#L219. If we change it, we need to do the name mapping for the original LLaMA weights as well.
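For reference, a hedged sketch of how the original Meta feed-forward names line up with the HF proj names; this is an illustration of the mapping being discussed, not code from the PR:

```python
# Hypothetical mapping from the original Meta LLaMA feed-forward names to the
# more descriptive HF-style proj names: w1 is the gate, w2 the down projection,
# and w3 the up projection in output = w2(silu(w1(x)) * w3(x)).
_META_FFN_TO_PROJ = {
    "feed_forward.w1": "mlp.gate_proj",
    "feed_forward.w2": "mlp.down_proj",
    "feed_forward.w3": "mlp.up_proj",
}
```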
assert (
    not FLAGS.quantize_weights
), "Quantization not supported for HF checkpoint."
return _load_hf_llama_weight(input_ckpt_dir)
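A hedged sketch of what the loading path behind this guard might look like; the shard handling and the exact signature are assumptions, not necessarily what _load_hf_llama_weight actually does in this PR:

```python
import glob
import os

import torch
from safetensors.torch import load_file


def _load_hf_llama_weight(input_ckpt_dir):
    """Load and merge every HF checkpoint shard found in the directory."""
    state_dict = {}
    # HF checkpoints are typically sharded as *.safetensors or pytorch_model*.bin.
    safetensor_shards = sorted(glob.glob(os.path.join(input_ckpt_dir, "*.safetensors")))
    if safetensor_shards:
        for shard in safetensor_shards:
            state_dict.update(load_file(shard))
    else:
        for shard in sorted(glob.glob(os.path.join(input_ckpt_dir, "pytorch_model*.bin"))):
            state_dict.update(torch.load(shard, map_location="cpu"))
    return state_dict
```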
Did you test the llama2-70B model?
No, I didn't, since I haven't set up multi-host yet, but I will do that later.
Enable converting HF LLaMA checkpoints: added a --from_hf option in convert_checkpoint.py for HF checkpoints. Only LLaMA is supported for now, and quantization conversion is not supported with HF checkpoints. The guide to adding support for HF checkpoints will be done in a following PR.
Only tested with the HF 7B model; the 70B model is not tested yet.