Conversation
[🤖]: Hi @athitten 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals. I'm just a bot, so I'll leave it to you what to do next. //cc @pablo-garay @ko3n1g
Uses bool generation_logits_available as inputs dict does not contain it Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: athitten <athitten@users.noreply.github.com>
Force-pushed from 2f506c4 to 6e76bc8 (compare)
```python
    output_generation_logits = True
else:
    # In case of multiple token prediction return the full context logits
    output_context_logits = True
```
output_generation_logits and output_context_logits are False by default for generate_until-type tasks, since they need only text and not logits. They are modified only if we are returning logits.
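The flag selection described above can be sketched as follows. This is a hypothetical helper for illustration, not the PR's exact code; the task-type strings mirror lm-eval-harness request types:

```python
def select_logit_flags(task_type: str, single_prediction_token: bool):
    # Both flags default to False: generate_until tasks need only text.
    output_generation_logits = False
    output_context_logits = False
    if task_type == "loglikelihood":
        if single_prediction_token:
            # Single-token prediction: generation logits suffice.
            output_generation_logits = True
        else:
            # Multi-token prediction: return the full context logits.
            output_context_logits = True
    return output_generation_logits, output_context_logits
```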
```python
if re.match(mmlu_regex_pattern, requests[0].task_name):
    # in case of mmlu the output token is one of 'a','b','c','d'
    single_prediction_token = True
```
Categorize only MMLU as a single-token prediction task: in the case of LAMBADA, although it is a single-token prediction task, the tokenized output can be multiple tokens depending on how the tokenizer splits it. MMLU is multiple choice and the output is always one of 'a', 'b', 'c', 'd'.
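The categorization above can be sketched as a regex check on the task name. The pattern below is an assumption for demonstration; the PR defines its own `mmlu_regex_pattern`:

```python
import re

# Illustrative pattern: matches mmlu and its subtasks (assumed form).
mmlu_regex_pattern = r"^mmlu"

def is_single_prediction_token(task_name: str) -> bool:
    # Only MMLU is treated as single-token: its answer is always one of
    # 'a'/'b'/'c'/'d', which tokenizes to exactly one token. LAMBADA's
    # single-word target may split into multiple tokens.
    return re.match(mmlu_regex_pattern, task_name) is not None
```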
```python
self.max_tokens_to_generate = 1
# Delete the last token from continuation before passing it to the ip prompt by replacing with empty string
prompt = context + continuation.replace(self.tokenizer.tokenizer.decode(continuation_enc[-1]), "")
# Create payload to query the model deployed on PyTriton server
```
batching logic below.
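As a rough illustration of the batching discussed in this thread, requests can be chunked into groups of `batch_size` before querying the server. This is a hypothetical helper, not the PR's code:

```python
def chunked(requests, batch_size):
    # Yield successive batch_size-sized slices of the request list;
    # the last batch may be smaller than batch_size.
    for i in range(0, len(requests), batch_size):
        yield requests[i : i + batch_size]
```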
Before:

```python
def _generate_tokens_logits(
    self, payload, single_prediction_token, return_text: bool = False, return_logits: bool = False
):
```

After:

```python
def _generate_tokens_logits(self, payload, single_prediction_token: bool = False, return_logits: bool = False):
```
Remove the return_text arg, as it's redundant and can be handled with just return_logits (False by default): it is made True for loglikelihood tasks and remains False for generate_until-type tasks.
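A minimal sketch of the simplified dispatch: one `return_logits` flag covers both cases (False returns text for generate_until; True returns logits for loglikelihood). This is a hypothetical stand-in with placeholder return values, not the PR's code:

```python
def generate(payload, return_logits: bool = False):
    if return_logits:
        # loglikelihood tasks: caller wants logits
        return {"logits": [[0.1, 0.9]]}  # placeholder
    # generate_until tasks: caller wants text only
    return {"text": "generated text"}  # placeholder
```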
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Force-pushed from c9309c4 to 6c75e46 (compare)
Signed-off-by: athitten <athitten@users.noreply.github.com>
beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base. Your code was analyzed with PyLint and the following annotations have been identified. Mitigation guide:
By applying these rules, we reduce the occurrence of this message in the future. Thank you for improving NeMo's documentation!
```python
nemo_checkpoint = Path(nemo_checkpoint)
if not isinstance(triton_model_repository, Path):
    triton_model_repository = Path(triton_model_repository)
```
Not required, as export to TRT-LLM expects the path to be a str; it doesn't need to be a Path object. Same for the triton_model_repository path.
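The point above can be sketched as the inverse conversion: if a caller hands in a Path, normalize it back to str before export rather than wrapping strings in Path. A hypothetical helper for illustration:

```python
from pathlib import Path

def as_str_path(p):
    # The export API takes plain str paths, so coerce Path objects
    # to str instead of the other way around.
    return str(p) if isinstance(p, Path) else p
```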
@athitten overall looks good! Did you manage to run any regression testing between the two versions? Would be great to see if the results stay the same :)
Yes, please see the accuracy results below, comparing the NeMo code with batching against running evaluations directly with lm-eval-harness on the llama3-8b model.
* Add server ready check before evaluation — uses bool generation_logits_available as inputs dict does not contain it
* Apply isort and black reformatting
* Add batching changes
* Discard 0 padding with batching and other minor edits
* Add func for padding and minor edits
* Remove commented code and Pylint fixes
* Apply isort and black reformatting

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: athitten <athitten@users.noreply.github.com>
Co-authored-by: athitten <athitten@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

What does this PR do?
Adds batching to the `loglikelihood` method of `nemo/collections/llm/evaluation/base.py` and queries the model with batched inputs. Introduces a `batch_size` arg in the `evaluate` method (in `nemo/collections/llm/api.py`) for batching. Also reintroduces `max_tokens_to_generate`, as it is required for `generate_until`-type tasks (e.g. gsm8k). In the `nemo/export/tensorrt_llm.py` file, adds a private method `_pad_logits` that performs padding for context logits when they are available, since with batched inputs the returned context logits can have varying seq_len and need padding before conversion to a numpy array.

Collection: [Note which collection this PR will affect]
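The padding step described above can be sketched as follows: with batched inputs, each sample's context logits may have a different sequence length, so they are zero-padded to the batch maximum before stacking into one array. The helper name mirrors the PR's `_pad_logits`, but this body is an illustrative assumption, not the actual implementation:

```python
import numpy as np

def pad_logits(logits_list):
    # Each element has shape (seq_len_i, vocab_size); seq_len varies.
    max_len = max(l.shape[0] for l in logits_list)
    padded = []
    for l in logits_list:
        pad_rows = max_len - l.shape[0]
        # Zero-pad along the sequence dimension only.
        padded.append(np.pad(l, ((0, pad_rows), (0, 0))))
    # Now every element is (max_len, vocab_size) and can be stacked.
    return np.stack(padded)
```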
Changelog
Usage
# Add a code snippet demonstrating how to use this

GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information