Conversation
[🤖]: Hi @athitten 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals. I'm just a bot, so I'll leave it to you what to do next. //cc @pablo-garay @ko3n1g
Uses bool generation_logits_available as inputs dict does not contain it Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: athitten <athitten@users.noreply.github.com>
Force-pushed from 2f506c4 to 6e76bc8 (compare)
```python
    output_generation_logits = True
else:
    # In case of multiple token prediction return the full context logits
    output_context_logits = True
```
output_generation_logits and output_context_logits are False by default for generate_until-type tasks, since they need only text and not logits. They are modified only if we are returning logits.
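The flag selection described above can be sketched as follows. This is a hypothetical helper for illustration, not the PR's exact code; the task-type strings mirror lm-eval-harness request types:

```python
def select_logit_flags(task_type: str, single_prediction_token: bool):
    # Both flags default to False: generate_until tasks need only text.
    output_generation_logits = False
    output_context_logits = False
    if task_type == "loglikelihood":
        if single_prediction_token:
            # Single-token prediction: generation logits suffice.
            output_generation_logits = True
        else:
            # Multi-token prediction: return the full context logits.
            output_context_logits = True
    return output_generation_logits, output_context_logits
```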
```python
if re.match(mmlu_regex_pattern, requests[0].task_name):
    # in case of mmlu the output token is one of 'a','b','c','d'
    single_prediction_token = True
```
Categorize only MMLU as a single-token prediction task: in the case of LAMBADA, although it is a single-token prediction task, the tokenized output can be multiple tokens depending on how the tokenizer splits it. MMLU is multiple choice and the output is always one of 'a', 'b', 'c', 'd'.
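The categorization above can be sketched as a regex check on the task name. The pattern below is an assumption for demonstration; the PR defines its own `mmlu_regex_pattern`:

```python
import re

# Illustrative pattern: matches mmlu and its subtasks (assumed form).
mmlu_regex_pattern = r"^mmlu"

def is_single_prediction_token(task_name: str) -> bool:
    # Only MMLU is treated as single-token: its answer is always one of
    # 'a'/'b'/'c'/'d', which tokenizes to exactly one token. LAMBADA's
    # single-word target may split into multiple tokens.
    return re.match(mmlu_regex_pattern, task_name) is not None
```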
```python
self.max_tokens_to_generate = 1
# Delete the last token from continuation before passing it to the ip prompt by replacing with empty string
prompt = context + continuation.replace(self.tokenizer.tokenizer.decode(continuation_enc[-1]), "")
# Create payload to query the model deployed on PyTriton server
```
batching logic below.
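As a rough illustration of the batching discussed in this thread, requests can be chunked into groups of `batch_size` before querying the server. This is a hypothetical helper, not the PR's code:

```python
def chunked(requests, batch_size):
    # Yield successive batch_size-sized slices of the request list;
    # the last batch may be smaller than batch_size.
    for i in range(0, len(requests), batch_size):
        yield requests[i : i + batch_size]
```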
Before:

```python
def _generate_tokens_logits(
    self, payload, single_prediction_token, return_text: bool = False, return_logits: bool = False
):
```

After:

```python
def _generate_tokens_logits(self, payload, single_prediction_token: bool = False, return_logits: bool = False):
```
Remove the return_text arg, as it's redundant and can be handled with just return_logits (False by default): it is made True for loglikelihood tasks and remains False for generate_until-type tasks.
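A minimal sketch of the simplified dispatch: one `return_logits` flag covers both cases (False returns text for generate_until; True returns logits for loglikelihood). This is a hypothetical stand-in with placeholder return values, not the PR's code:

```python
def generate(payload, return_logits: bool = False):
    if return_logits:
        # loglikelihood tasks: caller wants logits
        return {"logits": [[0.1, 0.9]]}  # placeholder
    # generate_until tasks: caller wants text only
    return {"text": "generated text"}  # placeholder
```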
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Force-pushed from c9309c4 to 6c75e46 (compare)
Signed-off-by: athitten <athitten@users.noreply.github.com>
beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base. Your code was analyzed with PyLint and the following annotations have been identified. Mitigation guide:
By applying these rules, we reduce the occurrence of this message in the future. Thank you for improving NeMo's documentation!
```python
nemo_checkpoint = Path(nemo_checkpoint)
if not isinstance(triton_model_repository, Path):
    triton_model_repository = Path(triton_model_repository)
```
Not required, as export to TRT-LLM expects the path to be a str; it doesn't need to be a Path object. Same for the triton_model_repository path.
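The point above can be sketched as the inverse conversion: if a caller hands in a Path, normalize it back to str before export rather than wrapping strings in Path. A hypothetical helper for illustration:

```python
from pathlib import Path

def as_str_path(p):
    # The export API takes plain str paths, so coerce Path objects
    # to str instead of the other way around.
    return str(p) if isinstance(p, Path) else p
```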
@athitten overall looks good! Did you manage to run any regression testing between the two versions? Would be great to see if the results stay the same :)
Yes, please see the accuracy results below, comparing the NeMo code with batching against running evaluations directly with lm-eval-harness on the llama3-8b model.
* Add server ready check before evaluation — uses bool generation_logits_available as inputs dict does not contain it
* Apply isort and black reformatting
* Add batching changes
* Discard 0 padding with batching and other minor edits
* Add func for padding and minor edits
* Remove commented code and Pylint fixes
* Apply isort and black reformatting

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: athitten <athitten@users.noreply.github.com>
Co-authored-by: athitten <athitten@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

What does this PR do?
Adds batching to the `loglikelihood` method of `nemo/collections/llm/evaluation/base.py` and queries the model with batched inputs. Introduces a `batch_size` arg in the `evaluate` method (in `nemo/collections/llm/api.py`) for batching. Also reintroduces `max_tokens_to_generate`, as it is required for `generate_until`-type tasks (e.g. gsm8k). In the `nemo/export/tensorrt_llm.py` file, adds a private method `_pad_logits` that performs padding for context logits when they are available, since with batched inputs the returned context logits can have varying seq_len and need padding before conversion to a numpy array.

Collection: [Note which collection this PR will affect]
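The padding step described above can be sketched as follows: with batched inputs, each sample's context logits may have a different sequence length, so they are zero-padded to the batch maximum before stacking into one array. The helper name mirrors the PR's `_pad_logits`, but this body is an illustrative assumption, not the actual implementation:

```python
import numpy as np

def pad_logits(logits_list):
    # Each element has shape (seq_len_i, vocab_size); seq_len varies.
    max_len = max(l.shape[0] for l in logits_list)
    padded = []
    for l in logits_list:
        pad_rows = max_len - l.shape[0]
        # Zero-pad along the sequence dimension only.
        padded.append(np.pad(l, ((0, pad_rows), (0, 0))))
    # Now every element is (max_len, vocab_size) and can be stacked.
    return np.stack(padded)
```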
Changelog
Usage
# Add a code snippet demonstrating how to use this

GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information