Add validators for the structure of predictions supplied to prediction evaluation methods #169

Closed
Andrewq11 opened this issue Aug 7, 2024 · 1 comment · Fixed by #187
Assignees: kirahowe
Labels: enhancement (New feature or request)

Comments

@Andrewq11 (Contributor) commented Aug 7, 2024

Context

The required structure for benchmark predictions varies with the number of test sets and tasks in a given benchmark. When users supply predictions for evaluation (via the evaluate method of the BenchmarkSpecification class), the structure of those predictions must be compatible with the structure of the target labels, which is primarily defined by the target_cols and split attributes of the BenchmarkSpecification class.
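
For concreteness, the expected shape ranges from a flat array to a nested dictionary depending on the benchmark. The examples below are illustrative only; the target column and test set names are made up:

```python
# Illustrative prediction structures; all names here are hypothetical.

# Single task, single test set: a flat array is unambiguous.
predictions = [0.1, 0.7, 0.3]

# Multiple tasks (target columns), single test set: keyed by target column.
predictions = {
    "LOG_SOLUBILITY": [0.1, 0.7, 0.3],
    "LOG_HLM_CLint": [1.2, 0.8, 2.4],
}

# Multiple test sets: keyed by test set name, then by target column.
predictions = {
    "test": {"LOG_SOLUBILITY": [0.1, 0.7, 0.3]},
    "test_scaffold": {"LOG_SOLUBILITY": [0.5, 0.2]},
}
```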

Description

We should add validation such that when users attempt to evaluate their results, we can confirm the predictions have the structure required by the benchmark. We should:

  • Create an intermediate BenchmarkPredictions class that users supply their predictions to prior to evaluation. It can contain validators that compare the supplied predictions against the expected structure and, if they do not match, return a helpful error message explaining why (a rough sketch follows below). This route would require a small lift in updating the documentation to explain that users must use this new class as input to prediction evaluation methods.
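
One possible shape for this, sketched with Pydantic validators that both normalize the incoming value and check it against the benchmark's structure. All field names, helper logic, and error messages here are assumptions for illustration, not the final API:

```python
# Rough sketch only; not the actual Polaris implementation.
from typing import Any

import numpy as np
from pydantic import BaseModel, model_validator


class BenchmarkPredictions(BaseModel):
    """Normalizes predictions to {test_set_name: {target_col: ndarray}}."""

    model_config = {"arbitrary_types_allowed": True}

    predictions: dict[str, dict[str, np.ndarray]]
    target_cols: list[str]
    test_set_names: list[str]

    @model_validator(mode="before")
    @classmethod
    def _normalize(cls, data: dict[str, Any]) -> dict[str, Any]:
        preds = data.get("predictions")
        targets = list(data.get("target_cols", []))
        test_sets = list(data.get("test_set_names", []))

        if isinstance(preds, dict):
            # Coerce inner values to arrays so downstream code sees one type.
            # (A fuller implementation would also accept {target_col: values}
            # for single-test-set benchmarks.)
            data["predictions"] = {
                ts: {col: np.asarray(vals) for col, vals in cols.items()}
                for ts, cols in preds.items()
            }
        # A bare array is only unambiguous for a single test set and target.
        elif len(test_sets) == 1 and len(targets) == 1:
            data["predictions"] = {test_sets[0]: {targets[0]: np.asarray(preds)}}
        else:
            raise ValueError(
                f"This benchmark expects predictions for test sets {test_sets} "
                f"and target columns {targets}. Supply a nested dict keyed by "
                "test set name, then target column."
            )
        return data

    @model_validator(mode="after")
    def _check_structure(self) -> "BenchmarkPredictions":
        # Fail with an actionable message when keys don't match the benchmark.
        if set(self.predictions) != set(self.test_set_names):
            raise ValueError(
                f"Expected test sets {self.test_set_names}, "
                f"got {sorted(self.predictions)}."
            )
        for name, by_target in self.predictions.items():
            if set(by_target) != set(self.target_cols):
                raise ValueError(
                    f"Test set {name!r}: expected target columns "
                    f"{self.target_cols}, got {sorted(by_target)}."
                )
        return self
```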

Acceptance Criteria

  • Prior to evaluation, the structure of the supplied predictions is validated against the expected structure as defined by the BenchmarkSpecification object.
  • There is helpful logging that guides users to the correct structure when their predictions are not structured correctly (see the usage example below).
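
Hypothetical usage of the sketch above, showing both the normalization and the kind of error message this enables (exact wording illustrative):

```python
import numpy as np

# A bare array is accepted and normalized for a single-test-set,
# single-target benchmark.
preds = BenchmarkPredictions(
    predictions=np.array([0.1, 0.7, 0.3]),
    target_cols=["LOG_SOLUBILITY"],
    test_set_names=["test"],
)
assert set(preds.predictions) == {"test"}

# The same array is ambiguous for a multi-test-set benchmark, so
# validation fails with a message describing the expected shape.
BenchmarkPredictions(
    predictions=np.array([0.1, 0.7, 0.3]),
    target_cols=["LOG_SOLUBILITY"],
    test_set_names=["test", "test_scaffold"],
)
# pydantic.ValidationError: ... This benchmark expects predictions for
# test sets ['test', 'test_scaffold'] ...
```
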
Andrewq11 added the enhancement (New feature or request) label on Aug 7, 2024
@cwognum (Collaborator) commented Aug 13, 2024

With the introduction of an intermediate BenchmarkPredictions class, we can not only enforce that the user specifies a specific type, but can also standardize the type. That seems most promising to me!

See for example #121 (comment)
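
To illustrate the idea (hypothetical method body; attribute and method names are assumptions, not the actual Polaris API):

```python
# Hypothetical sketch: evaluate() coerces loose user input into the
# standardized type once, so all downstream metric code handles a single
# canonical shape. `self.test_set_names` is an assumed attribute.
def evaluate(self, y_pred):
    if not isinstance(y_pred, BenchmarkPredictions):
        y_pred = BenchmarkPredictions(
            predictions=y_pred,
            target_cols=list(self.target_cols),
            test_set_names=list(self.test_set_names),
        )
    # ... compute metrics against the benchmark's target labels ...
```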

kirahowe self-assigned this on Aug 20, 2024
cwognum added a commit that referenced this issue Nov 19, 2024
* add base benchmark predictions class, move tests

* wip, validations working

* trying to make types work with arbitrary incoming values

* fix equality checking issue in test. closes #169.

* add serializer to predictions

* remove separate competition predictions

* update evaluation usage to work with benchmark predictions instance

* update test set generation for evaluation

* run ruff autoformatting

* Update polaris/utils/types.py

Nicer union syntax

Co-authored-by: Cas Wognum <[email protected]>

* Update polaris/utils/types.py

Co-authored-by: Cas Wognum <[email protected]>

* wip

* add small docstring, allow string predictions

* safely get predictions in evaluation if available

* pass test set names to predictions and check for validity

* simplify safe_mask

* fix bad docstring path

* Reintroduce the CompetitionPredictions class because it includes additional metadata

* Add back the CompetitionPredictions to the docs

* Reordered docs

* Improved documentation and changed logic to disallow some edge cases

* Fixed docs

* Remove print statement

* Reorganize code

* Simplified evaluation logic

* Address all PR feedback

* Add extra test case

* Addressed PR feedback

* Fix type hint and fix model validator definition

* Fix import

---------

Co-authored-by: Kira McLean <[email protected]>
Co-authored-by: Cas Wognum <[email protected]>