Apply packed sequence params change for fused rope compatibility #11506

Merged: ananthsub merged 5 commits into NVIDIA-NeMo:main from ananthsub:fix-packed-attention-2201 on Dec 9, 2024
Conversation

@ananthsub (Collaborator) commented on Dec 7, 2024:

What does this PR do?

Applies the packed sequence params change required for fused RoPE compatibility with NVIDIA/Megatron-LM@210162a.

Collection: nlp

Changelog

  • Update the packed sequence params passed to attention so they match the fused RoPE changes in NVIDIA/Megatron-LM@210162a.
  • Fix lint warnings.

Usage

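Packed-sequence runs carry per-sequence boundary metadata alongside the flattened token stream. Below is a minimal, hedged sketch of constructing that metadata, assuming the `PackedSeqParams` dataclass from `megatron.core` with its `cu_seqlens_*` and `qkv_format` fields; the values and the final model call are illustrative, not an excerpt from this PR's diff.

```python
# Hedged sketch: build packed sequence params for three sequences packed into
# one token stream, as consumed by attention and fused RoPE kernels.
import torch
from megatron.core.packed_seq_params import PackedSeqParams

seq_lens = torch.tensor([5, 3, 8], dtype=torch.int32)  # per-sequence lengths
cu_seqlens = torch.zeros(len(seq_lens) + 1, dtype=torch.int32)
cu_seqlens[1:] = torch.cumsum(seq_lens, dim=0)  # boundary offsets: [0, 5, 8, 16]

packed_seq_params = PackedSeqParams(
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_kv=cu_seqlens,
    max_seqlen_q=int(seq_lens.max()),
    max_seqlen_kv=int(seq_lens.max()),
    qkv_format="thd",  # token-major packed layout
)
# The params are then threaded through the forward call, e.g. (illustrative):
# output = model(tokens, position_ids, attention_mask, packed_seq_params=packed_seq_params)
```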

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre-checks:

  • Make sure you have read and followed the Contributor guidelines.
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

The github-actions bot added the NLP label on Dec 7, 2024.
@ananthsub force-pushed the fix-packed-attention-2201 branch from 5aa617c to 3346acb on December 7, 2024 03:03.
@ananthsub requested a review from @ericharper on December 9, 2024 04:45.
@ericharper (Collaborator) previously approved these changes on Dec 9, 2024:

LGTM. Thanks!

@ananthsub enabled auto-merge (squash) on December 9, 2024 05:07.
Comment on lines +59 to +60:

# Pipeline dtype is coupled with the bf16 mixed precision plugin
pipeline_dtype=torch.bfloat16,

@ananthsub (Collaborator, Author) commented on Dec 9, 2024:
FYI @hemildesai @BoxiangW (for the parallelism settings refactor): this requirement comes from #10954, which now validates that the pipeline dtype is set here.

There are two paths to set this:

  • either on the megatron strategy directly
  • via the precision plugin

Setting it in multiple places feels wrong, especially since users have to make two hops in the codebase to figure this out (see the sketch after these links):

  1. https://github.com/NVIDIA/NeMo/blob/bde672e75f1ac45ead08e2b977920a28eb81448e/nemo/lightning/pytorch/strategies/megatron_strategy.py#L288-L290
  2. https://github.com/NVIDIA/NeMo/blob/bde672e75f1ac45ead08e2b977920a28eb81448e/nemo/lightning/pytorch/plugins/mixed_precision.py#L107-L113C46
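For illustration, here is a hedged sketch of the two configuration paths using the NeMo 2.0 lightning API; the class and argument names (`MegatronStrategy`, `MegatronMixedPrecision`, `pipeline_dtype`) are assumed from `nemo.lightning` and this is not code taken from the PR.

```python
# Hedged sketch: the two places pipeline_dtype can come from, per the comment above.
import torch
from nemo import lightning as nl

# Path 1: set the dtype directly on the Megatron strategy.
strategy = nl.MegatronStrategy(
    pipeline_model_parallel_size=2,
    pipeline_dtype=torch.bfloat16,  # must agree with the precision plugin below
)

# Path 2: the bf16 mixed-precision plugin also implies bfloat16 for the pipeline.
precision = nl.MegatronMixedPrecision(precision="bf16-mixed")

trainer = nl.Trainer(devices=2, strategy=strategy, plugins=precision)
```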

A Collaborator replied:

I think a better way than hardcoding pipeline_dtype is to make it a function attribute and set its value only if it's used: https://github.com/NVIDIA/NeMo/pull/11504/files#diff-78f81f4094cfea056c177e87c0d527b9ce27cee11813138e5a2a69370b922c19R282
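A hedged sketch of the shape of that suggestion: resolve the dtype lazily instead of hardcoding it, falling back to the precision plugin's setting when unset. All names here are illustrative, not the actual change in #11504.

```python
import torch

def resolve_pipeline_dtype(explicit_dtype, precision: str):
    # Prefer an explicitly configured dtype; otherwise derive it from the
    # precision setting, so the value is only filled in when actually used.
    if explicit_dtype is not None:
        return explicit_dtype
    return {"bf16-mixed": torch.bfloat16, "16-mixed": torch.float16}.get(precision, torch.float32)

assert resolve_pipeline_dtype(None, "bf16-mixed") is torch.bfloat16
```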

@github-actions (bot) commented on Dec 9, 2024:

beep boop 🤖: 🙏 The following files have warnings. If you are familiar with these, please help us improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.collections.llm.api
nemo/collections/llm/api.py:604:0: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.collections.nlp.modules.common.megatron.adapters.mcore_mixins
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:140:0: C0301: Line too long (120/119) (line-too-long)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:142:0: C0301: Line too long (147/119) (line-too-long)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:60:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:69:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:76:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:108:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:226:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:333:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:359:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:443:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:450:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:474:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:481:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:499:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/nlp/modules/common/megatron/adapters/mcore_mixins.py:506:4: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 9.66/10

Thank you for improving NeMo's documentation!

@ananthsub merged commit 04b3a00 into NVIDIA-NeMo:main on Dec 9, 2024.
@ananthsub deleted the fix-packed-attention-2201 branch on December 9, 2024 16:35.
youngeunkwon0405 pushed a commit to youngeunkwon0405/NeMo that referenced this pull request on Feb 10, 2025:

Apply packed sequence params change for fused rope compatibility (NVIDIA-NeMo#11506)

* Apply packed sequence params change for fused rope compatibility

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

* fix lint

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

---------

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>