Context
The required structure for benchmark predictions varies with the number of test sets and tasks in a given benchmark. When users supply their predictions for evaluation (via the `evaluate` method of the `BenchmarkSpecification` class), the structure of the predictions must be compatible with the structure of the target labels, which is primarily defined in the `BenchmarkSpecification` class by the `target_cols` and `split` attributes.
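As a rough illustration of how the expected nesting might change with the benchmark's shape (the test set and target column names below are invented, and the layout shown is an assumption, not the library's confirmed convention):

```python
# Hypothetical sketch of how the expected prediction structure varies with
# the number of test sets and target columns. All names are invented for
# illustration; the real layout is defined by the benchmark itself.
def expected_structure(test_set_names, target_cols):
    """Return a template of the nesting a benchmark would expect."""
    if len(test_set_names) == 1 and len(target_cols) == 1:
        # Single-task, single-test-set: a flat array of predictions suffices.
        return "array"
    # Otherwise predictions are nested: test set -> target column -> array.
    return {ts: {col: "array" for col in target_cols} for ts in test_set_names}

print(expected_structure(["test"], ["LogP"]))
print(expected_structure(["test", "test_ood"], ["LogP", "Solubility"]))
```

The validation problem described in this issue is exactly that nothing currently checks a user's input against this expected template before evaluation runs.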
Description
We should add validation so that when users attempt to evaluate their results, we can confirm the predictions have the structure required by the benchmark. Specifically, we should:
- Create an intermediate `BenchmarkPredictions` class that users supply their predictions to prior to evaluation. It can contain validators that compare the expected structure of predictions against the supplied ones and, on a mismatch, return a helpful error message explaining why. This route requires a small documentation update explaining that users must pass this new class as input to prediction evaluation methods.
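A minimal sketch of what such an intermediate class could look like, assuming a canonical `{test_set: {target_col: values}}` layout. The class name matches the proposal above, but every other name and the exact error wording are assumptions for illustration:

```python
class BenchmarkPredictions:
    """Sketch of an intermediate container that normalizes user predictions
    into a canonical {test_set: {target_col: list}} structure and validates
    it against the benchmark's expected shape. Names are illustrative."""

    def __init__(self, predictions, target_cols, test_set_names):
        self.target_cols = list(target_cols)
        self.test_set_names = list(test_set_names)
        self.predictions = self._validate(predictions)

    def _validate(self, predictions):
        # Accept a bare array only for single-task, single-test-set benchmarks.
        if not isinstance(predictions, dict):
            if len(self.target_cols) == 1 and len(self.test_set_names) == 1:
                return {self.test_set_names[0]: {self.target_cols[0]: list(predictions)}}
            raise ValueError(
                "This benchmark has multiple test sets or target columns; "
                f"predictions must be a dict keyed first by test set "
                f"{sorted(self.test_set_names)} and then by target column "
                f"{sorted(self.target_cols)}."
            )
        if set(predictions) != set(self.test_set_names):
            raise ValueError(
                f"Expected one entry per test set {sorted(self.test_set_names)}, "
                f"got keys {sorted(predictions)}."
            )
        validated = {}
        for test_set, by_target in predictions.items():
            if not isinstance(by_target, dict) or set(by_target) != set(self.target_cols):
                raise ValueError(
                    f"Predictions for test set '{test_set}' must be a dict with "
                    f"one entry per target column {sorted(self.target_cols)}."
                )
            validated[test_set] = {col: list(vals) for col, vals in by_target.items()}
        return validated
```

The error messages name the expected keys explicitly, which is the "helpful error message" behavior the proposal calls for: a user who passes a flat array to a multi-task benchmark is told exactly which nesting is required.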
Acceptance Criteria
- Prior to evaluation, the supplied structure of predictions is validated against the expected structure as defined by the `BenchmarkSpecification` object.
- There is helpful logging which guides users to the correct structure in the event their prediction structure is incorrect.