Update QAT recipe to match full finetune recipe (5/12/25) #2721
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2721. Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit f0cefe6 with merge base a6db644. This comment was automatically generated by Dr. CI and updates every 15 minutes.
**Summary:** Similar to meta-pytorch#1854. Update `qat_distributed` recipe to mirror `full_finetune_distributed` up until a6db644. The new major feature that is excluded from `qat_distributed` is FP8 finetuning (meta-pytorch#2546), since QAT FP8 is not supported in torchao yet.

Diff between full finetune and QAT recipes: P1809370361
```
diff --color recipes/full_finetune_distributed.py recipes/qat_distributed.py
```

**Test Plan:**

Finetune:
```
tune run --nnodes 1 --nproc_per_node 4 qat_distributed --config llama3_2/3B_qat_full \
  epochs=1 \
  batch_size=16 \
  dataset._component_=torchtune.datasets.alpaca_cleaned_dataset \
  checkpointer.output_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat \
  output_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/metrics \
  metric_logger.log_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/metrics \
  quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQATQuantizer \
  quantizer.groupsize=32
```

Quantize:
```
tune run quantize --config quantization \
  model._component_=torchtune.models.llama3_2.llama3_2_3b \
  checkpointer._component_=torchtune.training.FullModelHFCheckpointer \
  checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/epoch_0 \
  checkpointer.output_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/epoch_0_out \
  'checkpointer.checkpoint_files=[model-00001-of-00002.safetensors,model-00002-of-00002.safetensors]' \
  checkpointer.model_type=LLAMA3 \
  quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
  quantizer.groupsize=32
```

Eval:
```
tune run eleuther_eval --config eleuther_evaluation \
  batch_size=1 \
  'tasks=[wikitext]' \
  model._component_=torchtune.models.llama3_2.llama3_2_3b \
  checkpointer._component_=torchtune.training.FullModelTorchTuneCheckpointer \
  checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/epoch_0 \
  checkpointer.output_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/epoch_0_out \
  'checkpointer.checkpoint_files=[model-00001-of-00002-8da4w.ckpt]' \
  checkpointer.model_type=LLAMA3 \
  tokenizer._component_=torchtune.models.llama3.llama3_tokenizer \
  tokenizer.path=/tmp/Meta-Llama-3-8B-Instruct/original/tokenizer.model \
  quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
  quantizer.groupsize=32
```

Results:
```
experiment_name          tok/s                peak_mem_active   peak_mem_alloc    peak_mem_reserved
-----------------------  -------------------  ----------------  ----------------  -------------------
Llama3.2-3B_alpaca_full  4677.163 (+0.000%)   12.261 (+0.000%)  12.261 (+0.000%)  15.778 (+0.000%)
Llama3.2-3B_alpaca_qat   1873.316 (-59.948%)  13.047 (+6.409%)  13.047 (+6.409%)  17.226 (+9.176%)

experiment_name          hellaswag_acc                   wikitext_word_perplexity
-----------------------  ------------------------------  -------------------------------
Llama3.2-3B_alpaca_full  0.470 quant, 0.534 float        18.563 quant, 12.364 float
Llama3.2-3B_alpaca_qat   0.511 quant, recovered 63.043%  13.792 quant, recovered 76.962%
```
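For context on what the `quantizer._component_` flag does in the finetune command above: QAT recipes fake-quantize the linear layers during training and only materialize real low-bit weights afterwards, which is why a separate `tune run quantize` step follows. Below is a minimal sketch of that prepare/convert flow using torchao's QAT quantizer (which the torchtune component is expected to resolve to); the toy model, the training loop, and the exact import path are illustrative assumptions, not the actual recipe code.

```python
import torch
import torch.nn as nn
# Import path may differ across torchao versions (older releases used
# torchao.quantization.prototype.qat); torchtune's quantizer component
# is assumed to resolve to this class.
from torchao.quantization.qat import Int8DynActInt4WeightQATQuantizer

# Toy stand-in for the finetuned model; in the recipe this is the Llama/Qwen model.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))

# 1) Swap nn.Linear modules for fake-quantized versions before finetuning.
quantizer = Int8DynActInt4WeightQATQuantizer(groupsize=32)
model = quantizer.prepare(model)

# 2) Train as usual; forward passes simulate int8-activation / int4-weight rounding.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
for _ in range(3):
    x = torch.randn(8, 64)
    loss = model(x).pow(2).mean()  # dummy loss for the sketch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# 3) Convert fake-quant modules into real quantized weights; in the test plan
#    this corresponds to the separate `tune run quantize` step.
model = quantizer.convert(model)
```

Because the prepare step happens at model setup time, the rest of the training loop can stay close to `full_finetune_distributed`, which is why keeping the two recipe files in sync is mostly a mechanical diff.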
Force-pushed from 0bb8049 to f0cefe6.
joecummings left a comment:
You're the best, thank you!
Description of a follow-up PR applying the same sync to the LoRA QAT recipe:

**Summary:** Similar to meta-pytorch#2721. Update `qat_lora_finetune_distributed` recipe to mirror `lora_finetune_distributed` up until 371bb0b.

Diff between lora finetune and QAT lora recipes:
```
diff --color recipes/lora_finetune_distributed.py recipes/qat_lora_finetune_distributed.py
```

**Test Plan:**

Fine-tune:
```
tune run --nnodes 1 --nproc_per_node 4 qat_lora_finetune_distributed --config qwen3/1.7B_qat_lora \
  epochs=1 \
  batch_size=16 \
  dataset._component_=torchtune.datasets.alpaca_cleaned_dataset \
  checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat \
  output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/metrics \
  metric_logger.log_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/metrics \
  quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQATQuantizer \
  quantizer.groupsize=256
```

Quantize:
```
tune run quantize --config quantization \
  model._component_=torchtune.models.qwen3.lora_qwen3_1_7b_instruct \
  checkpointer._component_=torchtune.training.FullModelHFCheckpointer \
  checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0 \
  checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0_out \
  'checkpointer.checkpoint_files=[model-00001-of-00002.safetensors,model-00002-of-00002.safetensors]' \
  checkpointer.model_type=QWEN3 \
  tokenizer._component_=torchtune.models.qwen3.qwen3_tokenizer \
  tokenizer.path=/tmp/Qwen3-1.7B/vocab.json \
  tokenizer.merges_file=/tmp/Qwen3-1.7B/merges.txt \
  quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
  quantizer.groupsize=256
```

Eval:
```
tune run eleuther_eval --config eleuther_evaluation \
  batch_size=1 \
  'tasks=[hellaswag,wikitext]' \
  model._component_=torchtune.models.qwen3.lora_qwen3_1_7b_instruct \
  checkpointer._component_=torchtune.training.FullModelTorchTuneCheckpointer \
  checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0 \
  checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0_out \
  'checkpointer.checkpoint_files=[model-00001-of-00002-8da4w.ckpt]' \
  checkpointer.model_type=QWEN3 \
  tokenizer._component_=torchtune.models.qwen3.qwen3_tokenizer \
  tokenizer.path=/tmp/Qwen3-1.7B/vocab.json \
  tokenizer.merges_file=/tmp/Qwen3-1.7B/merges.txt \
  quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
  quantizer.groupsize=256
```

Results:
```
experiment_name      tok/s                peak_mem_active   peak_mem_alloc    peak_mem_reserved
-------------------  -------------------  ----------------  ----------------  -------------------
Qwen3-1.7B_full      5687.638 (+0.000%)   7.009 (+0.000%)   7.009 (+0.000%)   11.075 (+0.000%)
Qwen3-1.7B_qat_lora  2812.026 (-50.559%)  5.945 (-15.177%)  5.945 (-15.177%)  10.146 (-8.390%)

experiment_name      hellaswag_acc                   wikitext_word_perplexity
-------------------  ------------------------------  -------------------------------
Qwen3-1.7B_full      0.370 quant, 0.449 float        140.294 quant, 29.461 float
Qwen3-1.7B_qat_lora  0.421 quant, recovered 64.602%  46.755 quant, recovered 84.396%
```
Description of a follow-up PR syncing `qat_distributed` further, up to 371bb0b:

**Summary:** Similar to meta-pytorch#2721. Update `qat_distributed` recipe to mirror `full_finetune_distributed` up until 371bb0b.

Diff between full finetune and QAT recipes:
```
diff --color recipes/full_finetune_distributed.py recipes/qat_distributed.py
```

**Test Plan:**

Fine-tune:
```
tune run --nnodes 1 --nproc_per_node 4 qat_distributed --config qwen3/1.7B_qat_full \
  epochs=1 \
  batch_size=16 \
  dataset._component_=torchtune.datasets.alpaca_cleaned_dataset \
  checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat \
  output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/metrics \
  metric_logger.log_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/metrics \
  quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQATQuantizer \
  quantizer.groupsize=256
```

Quantize:
```
tune run quantize --config quantization \
  model._component_=torchtune.models.qwen3.qwen3_1_7b_instruct \
  checkpointer._component_=torchtune.training.FullModelHFCheckpointer \
  checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0 \
  checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0_out \
  'checkpointer.checkpoint_files=[model-00001-of-00002.safetensors,model-00002-of-00002.safetensors]' \
  checkpointer.model_type=QWEN3 \
  tokenizer._component_=torchtune.models.qwen3.qwen3_tokenizer \
  tokenizer.path=/tmp/Qwen3-1.7B/vocab.json \
  tokenizer.merges_file=/tmp/Qwen3-1.7B/merges.txt \
  quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
  quantizer.groupsize=256
```

Eval:
```
tune run eleuther_eval --config eleuther_evaluation \
  batch_size=1 \
  'tasks=[hellaswag,wikitext]' \
  model._component_=torchtune.models.qwen3.qwen3_1_7b_instruct \
  checkpointer._component_=torchtune.training.FullModelTorchTuneCheckpointer \
  checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0 \
  checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0_out \
  'checkpointer.checkpoint_files=[model-00001-of-00002-8da4w.ckpt]' \
  checkpointer.model_type=QWEN3 \
  tokenizer._component_=torchtune.models.qwen3.qwen3_tokenizer \
  tokenizer.path=/tmp/Qwen3-1.7B/vocab.json \
  tokenizer.merges_file=/tmp/Qwen3-1.7B/merges.txt \
  quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
  quantizer.groupsize=256
```

Results:
```
experiment_name      tok/s                peak_mem_active   peak_mem_alloc    peak_mem_reserved
-------------------  -------------------  ----------------  ----------------  -------------------
Qwen3-1.7B_full      5687.638 (+0.000%)   7.009 (+0.000%)   7.009 (+0.000%)   11.075 (+0.000%)
Qwen3-1.7B_qat       2569.197 (-54.828%)  7.394 (+5.496%)   7.394 (+5.496%)   12.559 (+13.398%)

experiment_name      hellaswag_acc                   wikitext_word_perplexity
-------------------  ------------------------------  -------------------------------
Qwen3-1.7B_full      0.370 quant, 0.449 float        140.294 quant, 29.461 float
Qwen3-1.7B_qat       0.406 quant, recovered 44.753%  48.768 quant, recovered 82.580%
```
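A note on reading the `recovered X%` columns in these Results tables: they appear to express how much of the post-training-quantization gap QAT closes relative to the float baseline, computed from unrounded metric values (which is why re-deriving them from the rounded numbers above can be off by a fraction of a percent for the accuracy rows). A quick check against the Qwen3 wikitext perplexity row, where the rounded values reproduce the table exactly:

```python
# "Recovered" fraction of the quantization gap, using the Qwen3-1.7B wikitext
# word-perplexity row above (lower perplexity is better).
ptq_baseline = 140.294   # Qwen3-1.7B_full, quantized without QAT
qat = 48.768             # Qwen3-1.7B_qat, quantized after QAT
float_baseline = 29.461  # Qwen3-1.7B_full, unquantized (float)

recovered = (ptq_baseline - qat) / (ptq_baseline - float_baseline)
print(f"{recovered:.3%}")  # 82.580%, matching the table
```

For accuracy metrics, where higher is better, the same idea applies with the differences flipped: (QAT quantized − PTQ baseline) / (float − PTQ baseline).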