
Conversation

@andrewor14 (Contributor) commented May 12, 2025

**Summary:** Similar to #1854. Update the `qat_distributed` recipe to mirror `full_finetune_distributed` up until a6db644. The one major new feature excluded from `qat_distributed` is FP8 finetuning (#2546), since FP8 QAT is not yet supported in torchao.
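
For context, the QAT flow behind this recipe is torchao's prepare/convert pattern for 8da4w (int8 dynamic activations, int4 grouped weights): the recipe applies the prepare step during fine-tuning, and the separate quantize run in the test plan below produces the final quantized checkpoint. A minimal sketch, assuming torchao's `Int8DynActInt4WeightQATQuantizer` prepare/convert API; the toy model and the `groupsize=32` value are only illustrative, not the recipe's actual code:

```python
import torch
import torch.nn as nn

# Assumption: recent torchao releases expose the QAT quantizers under
# torchao.quantization.qat (older releases used torchao.quantization.prototype.qat).
from torchao.quantization.qat import Int8DynActInt4WeightQATQuantizer

# Toy stand-in for a transformer decoder; QAT only swaps out nn.Linear layers.
model = nn.Sequential(nn.Linear(64, 64), nn.SiLU(), nn.Linear(64, 64))

quantizer = Int8DynActInt4WeightQATQuantizer(groupsize=32)

# prepare(): insert fake-quantization (int8 dynamic activations, int4 grouped weights)
# so the fine-tuning loss already reflects quantization error.
model = quantizer.prepare(model)

# ... the normal fine-tuning loop runs on the prepared model ...
_ = model(torch.randn(2, 64))

# convert(): swap the fake-quant modules for actually quantized linears. In this PR's
# test plan the equivalent step is the separate `tune run quantize` command instead.
model = quantizer.convert(model)
```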

Diff between full finetune and QAT recipes: P1809370361

```
diff --color recipes/full_finetune_distributed.py recipes/qat_distributed.py
```

**Test Plan:**

Finetune:

```
tune run --nnodes 1 --nproc_per_node 4 qat_distributed --config llama3_2/3B_qat_full \
    epochs=1 \
    batch_size=16 \
    dataset._component_=torchtune.datasets.alpaca_cleaned_dataset \
    checkpointer.output_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat \
    output_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/metrics \
    metric_logger.log_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/metrics \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQATQuantizer \
    quantizer.groupsize=32
```

Quantize:

```
tune run quantize --config quantization \
    model._component_=torchtune.models.llama3_2.llama3_2_3b \
    checkpointer._component_=torchtune.training.FullModelHFCheckpointer \
    checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/epoch_0 \
    checkpointer.output_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/epoch_0_out \
    'checkpointer.checkpoint_files=[model-00001-of-00002.safetensors,model-00002-of-00002.safetensors]' \
    checkpointer.model_type=LLAMA3 \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
    quantizer.groupsize=32
```

Eval:

```
tune run eleuther_eval --config eleuther_evaluation \
    batch_size=1 \
    'tasks=[wikitext]' \
    model._component_=torchtune.models.llama3_2.llama3_2_3b \
    checkpointer._component_=torchtune.training.FullModelTorchTuneCheckpointer \
    checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/epoch_0 \
    checkpointer.output_dir=/home/andrewor/local/logs/tune/Llama3.2-3B_alpaca_qat/epoch_0_out \
    'checkpointer.checkpoint_files=[model-00001-of-00002-8da4w.ckpt]' \
    checkpointer.model_type=LLAMA3 \
    tokenizer._component_=torchtune.models.llama3.llama3_tokenizer \
    tokenizer.path=/tmp/Meta-Llama-3-8B-Instruct/original/tokenizer.model \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
    quantizer.groupsize=32
```

Results:

```
experiment_name          tok/s                peak_mem_active    peak_mem_alloc    peak_mem_reserved
-----------------------  -------------------  -----------------  ----------------  -------------------
Llama3.2-3B_alpaca_full  4677.163 (+0.000%)   12.261 (+0.000%)   12.261 (+0.000%)  15.778 (+0.000%)
Llama3.2-3B_alpaca_qat   1873.316 (-59.948%)  13.047 (+6.409%)   13.047 (+6.409%)  17.226 (+9.176%)

experiment_name          hellaswag_acc                   wikitext_word_perplexity
-----------------------  ------------------------------  -------------------------------
Llama3.2-3B_alpaca_full  0.470 quant, 0.534 float        18.563 quant, 12.364 float
Llama3.2-3B_alpaca_qat   0.511 quant, recovered 63.043%  13.792 quant, recovered 76.962%
```
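
A note on the "recovered" columns: the numbers are consistent with reporting what fraction of the full-finetune baseline's float-to-quantized degradation the QAT run wins back. A hypothetical reconstruction (an assumption inferred from the wikitext numbers above, not a metric defined in this PR):

```python
# Hypothetical reconstruction of the "recovered %" metric (assumption inferred from the
# table above, not a function defined in torchtune): the fraction of the quantization
# degradation seen in the plain full-finetune run that the QAT run recovers.
def recovered(baseline_quant: float, baseline_float: float, qat_quant: float) -> float:
    return (baseline_quant - qat_quant) / (baseline_quant - baseline_float)

# Wikitext word perplexity (lower is better): prints ~0.770, in line with the
# "recovered 76.962%" reported for Llama3.2-3B_alpaca_qat.
print(recovered(baseline_quant=18.563, baseline_float=12.364, qat_quant=13.792))
```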

pytorch-bot bot commented May 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2721

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f0cefe6 with merge base a6db644:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on May 12, 2025
@andrewor14 force-pushed the update_qat_5_12_25 branch from 0bb8049 to f0cefe6 on May 12, 2025 18:04
@andrewor14 (Contributor, Author) commented:

@ebsmothers @joecummings

@joecummings (Member) left a comment


You're the best, thank you!

@ebsmothers merged commit 541c730 into meta-pytorch:main on May 13, 2025
14 checks passed
andrewor14 added a commit to andrewor14/torchtune that referenced this pull request May 16, 2025
**Summary:** Similar to meta-pytorch#2721.
Update `qat_lora_finetune_distributed` recipe to mirror
`lora_finetune_distributed` up until 0991f97.

Diff between lora finetune and QAT lora recipes:

```
diff --color recipes/lora_finetune_distributed.py recipes/qat_lora_finetune_distributed.py
```

**Test Plan:** TBD
andrewor14 added a commit to andrewor14/torchtune that referenced this pull request Jun 26, 2025
**Summary:** Similar to meta-pytorch#2721.
Update `qat_lora_finetune_distributed` recipe to mirror
`lora_finetune_distributed` up until 3d73591.

Diff between lora finetune and QAT lora recipes:

```
diff --color recipes/lora_finetune_distributed.py recipes/qat_lora_finetune_distributed.py
```

**Test Plan:** TBD
andrewor14 added a commit to andrewor14/torchtune that referenced this pull request Jun 30, 2025
**Summary:** Similar to meta-pytorch#2721.
Update `qat_lora_finetune_distributed` recipe to mirror
`lora_finetune_distributed` up until 3d73591.

Diff between lora finetune and QAT lora recipes:

```
diff --color recipes/lora_finetune_distributed.py recipes/qat_lora_finetune_distributed.py
```

**Test Plan:** TBD
andrewor14 added a commit to andrewor14/torchtune that referenced this pull request Jun 30, 2025
**Summary:** Similar to meta-pytorch#2721.
Update `qat_distributed` recipe to mirror
`full_finetune_distributed` up until 3d73591.

Diff between full finetune and QAT recipes:

```
diff --color recipes/full_finetune_distributed.py recipes/qat_distributed.py
```

**Test Plan:** TBD
andrewor14 added a commit to andrewor14/torchtune that referenced this pull request Jul 1, 2025
**Summary:** Similar to meta-pytorch#2721.
Update `qat_lora_finetune_distributed` recipe to mirror
`lora_finetune_distributed` up until 371bb0b.

Diff between lora finetune and QAT lora recipes:

```
diff --color recipes/lora_finetune_distributed.py recipes/qat_lora_finetune_distributed.py
```

**Test Plan:** TBD
andrewor14 added a commit to andrewor14/torchtune that referenced this pull request Jul 1, 2025
**Summary:** Similar to meta-pytorch#2721.
Update `qat_distributed` recipe to mirror
`full_finetune_distributed` up until 371bb0b.

Diff between full finetune and QAT recipes:

```
diff --color recipes/full_finetune_distributed.py recipes/qat_distributed.py
```

**Test Plan:** TBD
andrewor14 added a commit to andrewor14/torchtune that referenced this pull request Jul 1, 2025
**Summary:** Similar to meta-pytorch#2721.
Update `qat_lora_finetune_distributed` recipe to mirror
`lora_finetune_distributed` up until 371bb0b.

Diff between lora finetune and QAT lora recipes:

```
diff --color recipes/lora_finetune_distributed.py recipes/qat_lora_finetune_distributed.py
```

**Test Plan:**

Fine-tune:
```
tune run --nnodes 1 --nproc_per_node 4 qat_lora_finetune_distributed --config qwen3/1.7B_qat_lora \
    epochs=1 \
    batch_size=16 \
    dataset._component_=torchtune.datasets.alpaca_cleaned_dataset \
    checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat \
    output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/metrics \
    metric_logger.log_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/metrics \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQATQuantizer \
    quantizer.groupsize=256
```

Quantize:
```
tune run quantize --config quantization \
    model._component_=torchtune.models.qwen3.lora_qwen3_1_7b_instruct \
    checkpointer._component_=torchtune.training.FullModelHFCheckpointer \
    checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0 \
    checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0_out \
    'checkpointer.checkpoint_files=[model-00001-of-00002.safetensors,model-00002-of-00002.safetensors]' \
    checkpointer.model_type=QWEN3 \
    tokenizer._component_=torchtune.models.qwen3.qwen3_tokenizer \
    tokenizer.path=/tmp/Qwen3-1.7B/vocab.json \
    tokenizer.merges_file=/tmp/Qwen3-1.7B/merges.txt \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
    quantizer.groupsize=256
```

Eval:
```
tune run eleuther_eval --config eleuther_evaluation \
    batch_size=1 \
    'tasks=[hellaswag,wikitext]' \
    model._component_=torchtune.models.qwen3.lora_qwen3_1_7b_instruct \
    checkpointer._component_=torchtune.training.FullModelTorchTuneCheckpointer \
    checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0 \
    checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0_out \
    'checkpointer.checkpoint_files=[model-00001-of-00002-8da4w.ckpt]' \
    checkpointer.model_type=QWEN3 \
    tokenizer._component_=torchtune.models.qwen3.qwen3_tokenizer \
    tokenizer.path=/tmp/Qwen3-1.7B/vocab.json \
    tokenizer.merges_file=/tmp/Qwen3-1.7B/merges.txt \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
    quantizer.groupsize=256
```

Results:
```
experiment_name      tok/s                peak_mem_active    peak_mem_alloc    peak_mem_reserved
-------------------  -------------------  -----------------  ----------------  -------------------
Qwen3-1.7B_full      5687.638 (+0.000%)   7.009 (+0.000%)    7.009 (+0.000%)   11.075 (+0.000%)
Qwen3-1.7B_qat_lora  2812.026 (-50.559%)  5.945 (-15.177%)   5.945 (-15.177%)  10.146 (-8.390%)

experiment_name      hellaswag_acc                   wikitext_word_perplexity
-------------------  ------------------------------  -------------------------------
Qwen3-1.7B_full      0.370 quant, 0.449 float        140.294 quant, 29.461 float
Qwen3-1.7B_qat_lora  0.421 quant, recovered 64.602%  46.755 quant, recovered 84.396%
```
andrewor14 added a commit to andrewor14/torchtune that referenced this pull request Jul 1, 2025
**Summary:** Similar to meta-pytorch#2721.
Update `qat_distributed` recipe to mirror
`full_finetune_distributed` up until 371bb0b.

Diff between full finetune and QAT recipes:

```
diff --color recipes/full_finetune_distributed.py recipes/qat_distributed.py
```

**Test Plan:**

Fine-tune:
```
tune run --nnodes 1 --nproc_per_node 4 qat_distributed --config qwen3/1.7B_qat_full \
    epochs=1 \
    batch_size=16 \
    dataset._component_=torchtune.datasets.alpaca_cleaned_dataset \
    checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat \
    output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/metrics \
    metric_logger.log_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/metrics \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQATQuantizer \
    quantizer.groupsize=256
```

Quantize:
```
tune run quantize --config quantization \
    model._component_=torchtune.models.qwen3.qwen3_1_7b_instruct \
    checkpointer._component_=torchtune.training.FullModelHFCheckpointer \
    checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0 \
    checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0_out \
    'checkpointer.checkpoint_files=[model-00001-of-00002.safetensors,model-00002-of-00002.safetensors]' \
    checkpointer.model_type=QWEN3 \
    tokenizer._component_=torchtune.models.qwen3.qwen3_tokenizer \
    tokenizer.path=/tmp/Qwen3-1.7B/vocab.json \
    tokenizer.merges_file=/tmp/Qwen3-1.7B/merges.txt \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
    quantizer.groupsize=256
```

Eval:
```
tune run eleuther_eval --config eleuther_evaluation \
    batch_size=1 \
    'tasks=[hellaswag,wikitext]' \
    model._component_=torchtune.models.qwen3.qwen3_1_7b_instruct \
    checkpointer._component_=torchtune.training.FullModelTorchTuneCheckpointer \
    checkpointer.checkpoint_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0 \
    checkpointer.output_dir=/home/andrewor/local/logs/tune/Qwen3-1.7B_alpaca_qat/epoch_0_out \
    'checkpointer.checkpoint_files=[model-00001-of-00002-8da4w.ckpt]' \
    checkpointer.model_type=QWEN3 \
    tokenizer._component_=torchtune.models.qwen3.qwen3_tokenizer \
    tokenizer.path=/tmp/Qwen3-1.7B/vocab.json \
    tokenizer.merges_file=/tmp/Qwen3-1.7B/merges.txt \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
    quantizer.groupsize=256
```

Results:
```
experiment_name      tok/s                peak_mem_active    peak_mem_alloc    peak_mem_reserved
-------------------  -------------------  -----------------  ----------------  -------------------
Qwen3-1.7B_full      5687.638 (+0.000%)   7.009 (+0.000%)    7.009 (+0.000%)   11.075 (+0.000%)
Qwen3-1.7B_qat       2569.197 (-54.828%)  7.394 (+5.496%)    7.394 (+5.496%)   12.559 (+13.398%)

experiment_name      hellaswag_acc                   wikitext_word_perplexity
-------------------  ------------------------------  -------------------------------
Qwen3-1.7B_full      0.370 quant, 0.449 float        140.294 quant, 29.461 float
Qwen3-1.7B_qat       0.406 quant, recovered 44.753%  48.768 quant, recovered 82.580%
```