
feat: Enable LoRA checkpoint utils for ScatterMoE #523


Draft · wants to merge 18 commits into base: main

Conversation

willmj (Collaborator) commented Apr 8, 2025

Description of the change

This PR should be merged only after the corresponding fms-acceleration changes.

Enables the ScatterMoE checkpoint utilities for LoRA-tuned models, converting their checkpoints back to the original model structure.
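
For context, here is a minimal sketch of what the conversion makes possible: loading the converted adapter with vanilla Hugging Face PEFT for inference. The base model name and paths below are placeholders, and the sketch assumes the checkpoint utilities have already rewritten the ScatterMoE LoRA checkpoint into a standard PEFT adapter directory.

```python
# Illustrative sketch only: model name and paths are placeholders, not part of this PR.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "<moe-base-model>"                      # placeholder MoE base model
converted_adapter_dir = "./output/converted_checkpoint"   # placeholder path written by the checkpoint utils

# Load the base model and attach the converted LoRA adapter with vanilla PEFT.
model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, converted_adapter_dir)

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
inputs = tokenizer("### Text: the app keeps crashing\n\n### Label:", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0], skip_special_tokens=True))
```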

Related issue number

How to verify the PR

python -m pytest tests/test_sft_trainer.py::test_run_moe_lora_and_inference
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.12.5, pytest-8.3.5, pluggy-1.5.0
rootdir: /app/fms-hf-tuning
configfile: pytest.ini
plugins: typeguard-4.4.1
collected 1 item                                                                                                                                                                                           

tests/test_sft_trainer.py .                                                                                                                                                                          [100%]

============================================================================================= warnings summary =============================================================================================
tests/test_sft_trainer.py::test_run_moe_lora_and_inference[/app/fms-hf-tuning/tests/artifacts/testdata/jsonl/twitter_complaints_small.jsonl]
  /app/fms-hf-tuning/tuning/config/acceleration_configs/acceleration_framework_config.py:297: UserWarning: An experimental acceleration feature is requested by specifying the '--fast_moe' argument. Please note this feature may not support certain edge cases at this juncture. When the feature matures this message will be turned off.
    warnings.warn(

tests/test_sft_trainer.py::test_run_moe_lora_and_inference[/app/fms-hf-tuning/tests/artifacts/testdata/jsonl/twitter_complaints_small.jsonl]
  /app/fms-hf-tuning/tuning/sft_trainer.py:349: FutureWarning: `tokenizer` is deprecated and removed starting from version 0.16.0 for `SFTTrainer.__init__`. Use `processing_class` instead.
    trainer = TrainerClass(

tests/test_sft_trainer.py::test_run_moe_lora_and_inference[/app/fms-hf-tuning/tests/artifacts/testdata/jsonl/twitter_complaints_small.jsonl]
  /home/tuning/.local/lib/python3.12/site-packages/datasets/utils/_dill.py:385: DeprecationWarning: co_lnotab is deprecated, use co_lines instead.
    obj.co_lnotab,  # for < python 3.10 [not counted in args]

tests/test_sft_trainer.py::test_run_moe_lora_and_inference[/app/fms-hf-tuning/tests/artifacts/testdata/jsonl/twitter_complaints_small.jsonl]
  /home/tuning/.local/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:300: UserWarning: You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `processing_class.padding_side = 'right'` to your code.
    warnings.warn(

tests/test_sft_trainer.py::test_run_moe_lora_and_inference[/app/fms-hf-tuning/tests/artifacts/testdata/jsonl/twitter_complaints_small.jsonl]
tests/test_sft_trainer.py::test_run_moe_lora_and_inference[/app/fms-hf-tuning/tests/artifacts/testdata/jsonl/twitter_complaints_small.jsonl]
tests/test_sft_trainer.py::test_run_moe_lora_and_inference[/app/fms-hf-tuning/tests/artifacts/testdata/jsonl/twitter_complaints_small.jsonl]
tests/test_sft_trainer.py::test_run_moe_lora_and_inference[/app/fms-hf-tuning/tests/artifacts/testdata/jsonl/twitter_complaints_small.jsonl]
tests/test_sft_trainer.py::test_run_moe_lora_and_inference[/app/fms-hf-tuning/tests/artifacts/testdata/jsonl/twitter_complaints_small.jsonl]
  /home/tuning/.local/lib/python3.12/site-packages/peft/utils/save_and_load.py:257: UserWarning: Setting `save_embedding_layers` to `True` as the embedding layer has been resized during finetuning.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================================== 1 passed, 9 warnings in 32.15s ======================================================================================
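
Beyond the unit test, a quick manual spot-check of the converted adapter is also possible. The sketch below assumes the conversion writes a standard `adapter_model.safetensors` file; the path and the ScatterMoE-related key naming are assumptions rather than guaranteed names.

```python
# Illustrative sketch: the path and key-name patterns are assumptions; adjust to the actual output.
from safetensors.torch import load_file

state_dict = load_file("./output/converted_checkpoint/adapter_model.safetensors")

# LoRA weights on the attention projections should be present...
print([k for k in state_dict if "q_proj" in k][:4])

# ...while ScatterMoE-internal parameter names should no longer appear after conversion.
print([k for k in state_dict if "scattermoe" in k.lower()])
```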

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

willmj added 9 commits March 24, 2025 13:39 (each commit signed off by: Will Johnson <[email protected]>)
willmj marked this pull request as draft April 8, 2025 18:14

github-actions bot commented Apr 8, 2025

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

github-actions bot added the feat label Apr 8, 2025
willmj added 6 commits April 8, 2025 15:40 (each commit signed off by: Will Johnson <[email protected]>)
willmj marked this pull request as ready for review April 11, 2025 20:23
dushyantbehl requested review from dushyantbehl and removed request for aluu317, fabianlim, Ssukriti and anhuong April 15, 2025 12:13
willmj added 2 commits April 15, 2025 09:29 (each commit signed off by: Will Johnson <[email protected]>)
- LoRA tuning with ScatterMoE is supported, but because of inference restrictions in vLLM/vanilla PEFT, experts should not be trained as `target_modules` for models being tuned with ScatterMoE. Users have control over which `target_modules` they wish to train:
  - Passing `all-linear` as the target modules will include the router, which is a linear layer, and all attention layers. This **will not** train the expert layers.
  - To train only the attention layers, specify the target modules explicitly (e.g. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]`).
  - To train the expert layers, specify `input_linear` and `output_linear` in the target modules along with `router` (e.g. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"]`). If you specify these layers, inference with vLLM/vanilla HF PEFT **is not possible** (see the sketch below).
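
To make the two supported configurations above concrete, here is a minimal sketch using Hugging Face PEFT's `LoraConfig`; the same module lists can be passed to fms-hf-tuning via `target_modules`. The rank and alpha values are arbitrary placeholders.

```python
from peft import LoraConfig

# Attention-only LoRA: the resulting adapter remains usable with vLLM / vanilla HF PEFT.
attn_only = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "o_proj", "k_proj"],
)

# Expert + router LoRA: also trains the ScatterMoE expert layers, but the resulting
# adapter cannot currently be served with vLLM / vanilla HF PEFT.
with_experts = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj", "v_proj", "o_proj", "k_proj",
        "router", "input_linear", "output_linear",
    ],
)
```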
A collaborator commented:

Suggested change
- To train the expert layers, specify `input_linear` and `output_linear` in the target modules along with `router` (e.g. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"]`). If you specify these layers, inference with vLLM/vanilla HF PEFT **is not possible**.
- To train the expert layers, specify `input_linear` and `output_linear` in the target modules along with `router` (e.g. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"]`). If you specify these layers, inference with vLLM/vanilla HF PEFT **is not currently supported**.

  - Passing `all-linear` as the target modules will include the router, which is a linear layer, and all attention layers. This **will not** train the expert layers.
  - To train only the attention layers, specify the target modules explicitly (e.g. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]`).
  - To train the expert layers, specify `input_linear` and `output_linear` in the target modules along with `router` (e.g. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"]`). If you specify these layers, inference with vLLM/vanilla HF PEFT **is not possible**.
- When LoRA tuning with ScatterMoE, the values `--fast_moe 1` or `--fast_moe True` are not expected to work, as FSDP must be enabled when LoRA tuning. Run with either `--fast_moe False` or `--fast_moe` set to a value greater than 1.
A collaborator commented:

Didn't quite get your point here. `--fast_moe True` disables expert parallelism; however, the experts are still sharded by FSDP, so FSDP is active in that case.

BTW, don't `--fast_moe 1` and `--fast_moe False` have the same effect? In both settings, all experts are replicated and kept out of FSDP, while the other layers are under FSDP sharding.

Maybe, if you are comfortable with a support-matrix table, let's do that and pinpoint the behavior case by case.
