Fix for Squad Dataset Download #13893
Merged
Commits (20)
883da22  Fix for Squad Dataset Download (rhmukundan)
9c3acd5  Giving the option to pass the sequence length from the finetune script (rhmukundan)
d4acb9c  Rebase Pushing llama4 finetuning e128 script and llama3 70b finetunin… (rhmukundan)
fec6d78  Finetune Llama4 Recipe with dataset download fix (rhmukundan)
8ee14a9  Address PR comments (rhmukundan)
fdb5487  Tweaks to finetune_llama4_e128 (rhmukundan)
adbb91d  Addressing PR comments (rhmukundan)
e27360c  Giving an option to have either AutoTokenizer or NullTokenizer for pr… (rhmukundan)
794e1b2  Fix kwargs (rhmukundan)
619b167  User passing vocab_size while using the NullTokenizer for downloading… (rhmukundan)
04e2364  Adding model configs for finetune llama4 (rhmukundan)
830dce5  Rebase Introducing the fix to llama3 finetuning recipes as well (rhmukundan)
9976083  Setting default vocab_size to None in prepare_squad_dataset_experimen… (rhmukundan)
d133fd8  Fix merge conflicts (rhmukundan)
452399d  Fixing the search condition for the dataset (rhmukundan)
3deddf4  Apply isort and black reformatting (rhmukundan)
766f71f  Removing NullTokenizer from Finetuning scripts (rhmukundan)
a95713b  Import cleanup (rhmukundan)
0673cd2  Apply isort and black reformatting (rhmukundan)
777b5ed  Merge branch 'main' into rhmukundan/fix-squad-dataset-download (rhmukundan)
New file (diff hunk `@@ -0,0 +1,209 @@`):

```python
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from os.path import basename, splitext

import nemo_run as run

from nemo.collections.llm.recipes.llama4_e128 import finetune_recipe, model
from nemo.collections.llm.recipes.precision.mixed_precision import bf16_with_fp8_mixed
from nemo.lightning.run.plugins import MemoryProfilePlugin, NsysPlugin, PerfEnvPlugin

from ..argument_parser import parse_cli_args
from ..utils import (
    args_sanity_check,
    get_user_configs,
    hf_tokenizer,
    import_ckpt_experiment,
    prepare_squad_dataset_experiment,
    set_exp_logging_configs,
    set_primary_perf_configs,
    slurm_executor,
)

HF_MODEL_URI = "meta-llama/Llama-4-Maverick-17B-128E-Instruct"

# Set this to True if checkpoint is available at 'NEMO_HOME'. If set to False,
# extra Slurm job will be scheduled. In this case, if checkpoint is available
# at 'NEMO_HOME', fine-tuning job will use this checkpoint, else, it will be
# downloaded from HuggingFace
SKIP_IMPORT = False

# Set this to True if dataset is already downloaded. If set to False,
# dataset will be downloaded from HuggingFace
SKIP_DATASET_DOWNLOAD = False


def override_recipe_configs(
    args: str,
    num_nodes: int,
    mbs: int,
    gbs: int,
    tp_size: int,
    pp_size: int,
    cp_size: int,
    vp_size: int,
    ep_size: int,
    etp_size: int,
    enable_cuda_graphs: bool,
    use_mcore_fsdp: bool,
    recompute_layers: int,
    activation_offload_layers: int,
):
    """
    Llama4 e128 fine-tuning recipe aimed at achieving best possible performance.

    NOTE: Use fp8 precision training with caution. It might not give desirable results.
    """
    finetuning_scheme = "none" if args.finetuning == "sft" else args.finetuning

    recipe = finetune_recipe(peft_scheme=finetuning_scheme, performance_mode=True, packed_sequence=True)

    recipe = set_primary_perf_configs(
        recipe,
        finetuning_scheme,
        num_nodes,
        args.gpus_per_node,
        mbs,
        gbs,
        args.max_steps,
        tp_size,
        pp_size,
        cp_size,
        vp_size,
        ep_size,
        etp_size,
        enable_cuda_graphs=enable_cuda_graphs,
        use_mcore_fsdp=use_mcore_fsdp,
        recompute_layers=recompute_layers,
        activation_offload_layers=activation_offload_layers,
        compute_dtype=args.compute_dtype,
        fp8_recipe=args.fp8_recipe,
    )

    recipe = set_exp_logging_configs(
        recipe,
        finetuning_scheme,
        "llm",
        "llama4",
        args.tensorboard,
        args.wandb,
        args.wandb_prj_name,
        args.wandb_job_name,
    )

    # data module configs
    recipe.data.tokenizer = hf_tokenizer(HF_MODEL_URI)

    # Compute dtype configs
    if args.compute_dtype.lower() == "fp8":
        recipe.trainer.plugins = bf16_with_fp8_mixed()
        recipe.trainer.plugins.grad_reduce_in_fp32 = False

    recipe.model.config.cross_entropy_fusion_impl = "te"
    recipe.model.config.cross_entropy_loss_fusion = True
    recipe.model.config.apply_rope_fusion = True
    recipe.model.config.moe_permute_fusion = True
    return recipe


if __name__ == "__main__":
    args = parse_cli_args().parse_args()
    args_sanity_check(args)

    kwargs = get_user_configs(args.gpu.lower(), "sft", "llama4", "e128", args)
    (
        num_nodes,
        mbs,
        gbs,
        tp_size,
        pp_size,
        cp_size,
        vp_size,
        ep_size,
        etp_size,
        enable_cuda_graphs,
        use_mcore_fsdp,
        recompute_layers,
        activation_offload_layers,
    ) = kwargs[0:13]

    recipe = override_recipe_configs(
        args,
        num_nodes,
        mbs,
        gbs,
        tp_size,
        pp_size,
        cp_size,
        vp_size,
        ep_size,
        etp_size,
        enable_cuda_graphs,
        use_mcore_fsdp,
        recompute_layers,
        activation_offload_layers,
    )
    exp_config = (
        f"{num_nodes}nodes_tp{tp_size}_pp{pp_size}_cp{cp_size}_vp{vp_size}_ep{ep_size}_etp{etp_size}_{mbs}mbs_{gbs}gbs"
    )
    exp_name = f"{splitext(basename(__file__))[0]}_{args.compute_dtype}_{exp_config}"

    plugins = [
        PerfEnvPlugin(
            enable_vboost=True,
            nccl_pp_comm_chunksize=2097152 if pp_size > 1 else None,
            gpu_sm100_or_newer=(args.gpu.lower() in ['b200', 'gb200']),
        )
    ]

    if args.enable_nsys:
        plugins.append(NsysPlugin(start_step=5, end_step=6))
    if args.enable_memory_profile:
        assert args.memory_profile_out_path is not None
        plugins.append(MemoryProfilePlugin(dir=args.memory_profile_out_path))

    executor = slurm_executor(
        args.account,
        args.partition,
        args.log_dir,
        num_nodes,
        args.gpus_per_node,
        args.time_limit,
        args.container_image,
        custom_mounts=args.custom_mounts,
        custom_env_vars={},
        hf_token=args.hf_token,
        nemo_home=args.nemo_home,
        wandb_key=args.wandb_key,
    )

    with run.Experiment(exp_name) as exp:
        if not SKIP_IMPORT:
            assert args.hf_token is not None, "HF token is required for importing checkpoint from HuggingFace"
            exp.add(*import_ckpt_experiment(executor, model(), source=f"hf://{HF_MODEL_URI}"))
        if not SKIP_DATASET_DOWNLOAD:
            exp.add(
                *prepare_squad_dataset_experiment(executor, HF_MODEL_URI, seq_length=4096, nemo_home=args.nemo_home)
            )
        exp.add(
            recipe,
            executor=executor,
            name=exp_name,
            plugins=plugins,
        )
        if not args.dryrun:
            exp.run(sequential=True, detach=True)
        else:
            exp.dryrun()
```
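The gating at the bottom of the script can be summarized with a small sketch: each module-level flag that is left at False adds one extra sequential task ahead of the fine-tuning job itself. The helper below is a hypothetical stand-in for illustration only, not a NeMo or nemo-run API.

```python
# Hypothetical sketch of how SKIP_IMPORT and SKIP_DATASET_DOWNLOAD gate
# extra tasks in the Experiment above. Task names are illustrative.

def build_task_list(skip_import: bool, skip_dataset_download: bool) -> list:
    """Return the ordered task names the Experiment would schedule."""
    tasks = []
    if not skip_import:
        # import_ckpt_experiment: fetch/convert the HF checkpoint
        tasks.append("import_ckpt")
    if not skip_dataset_download:
        # prepare_squad_dataset_experiment: download and prepare SQuAD
        tasks.append("prepare_squad")
    # the fine-tuning recipe always runs last (exp.run(sequential=True))
    tasks.append("finetune")
    return tasks


print(build_task_list(False, False))  # ['import_ckpt', 'prepare_squad', 'finetune']
print(build_task_list(True, True))    # ['finetune']
```

With both flags False, three tasks run sequentially; setting a flag to True simply drops its preparatory task from the schedule.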
Review comment:
This is not very clear; without this PR, were we not downloading the dataset from HuggingFace? My impression is that it is still done somewhere in the dataset-building process, just not explicitly. I think the difference is that this PR separates the download out as a new nemo-run experiment. With a comment like this, users might think that setting it to True without a local file will error out, but in reality it won't, right?
Could you explain further what happens differently between False and True here?