[Feature] Support pass@1 evaluation for multi predictions in MathEvaluator #2252

@DELEnomore

Description

Describe the feature

When using a Hugging Face model with num_return_sequences set to a value greater than 1, the output column "predictions" becomes a list instead of a string. As a result, the MathEvaluator always returns an accuracy of 0, regardless of whether any prediction is correct. It would be beneficial if the score function could handle list-type inputs and evaluate pass@1 over the multiple predictions, similar to the approach described in the DeepSeek-R1 technical report.
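As a rough illustration of the behavior requested above, the sketch below shows one way a score function could accept either a single string or a list of sampled predictions and compute pass@1 as the mean correctness over the k samples (the averaging used in the DeepSeek-R1 report). The `is_equiv` checker is a hypothetical stand-in for whatever math-answer equivalence logic the evaluator already uses; this is not the actual MathEvaluator implementation.

```python
def pass_at_1(predictions, reference, is_equiv):
    """pass@1 over k sampled predictions: fraction of samples judged correct.

    Accepts a single string (k = 1) or a list of strings, so existing
    single-prediction behavior is preserved.
    """
    if isinstance(predictions, str):
        predictions = [predictions]
    correct = sum(1 for pred in predictions if is_equiv(pred, reference))
    return correct / len(predictions)


def score(predictions, references, is_equiv):
    """Average pass@1 across the dataset, reported as a percentage."""
    per_sample = [
        pass_at_1(preds, ref, is_equiv)
        for preds, ref in zip(predictions, references)
    ]
    return {"accuracy": 100 * sum(per_sample) / len(per_sample)}
```

With this shape, a run using num_return_sequences=1 scores exactly as before, while a run returning k samples per question contributes a fractional score per question instead of always scoring 0.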

Will you implement it?

  • I would like to implement this feature and create a PR!

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests