Context
The required structure for benchmark predictions varies with the number of test sets and tasks in a given benchmark. When users supply their predictions for evaluation (via the evaluate method of the BenchmarkSpecification class), the structure of the predictions must be compatible with the structure of the target labels (primarily defined in the BenchmarkSpecification class by the target_cols and split attributes).
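For illustration, here is a sketch of what compatible prediction structures might look like for different benchmark shapes; the exact layouts, column names, and test set names below are assumptions for the sake of example, not a guaranteed Polaris format:

```python
import numpy as np

# Illustrative only -- the accepted layouts are ultimately defined by the
# benchmark's target_cols and split attributes.

# Single task, single test set: a flat array of predictions can suffice.
preds_single = np.array([0.12, 0.87, 0.43])

# Multiple tasks and/or multiple test sets: predictions keyed by test set
# name, then by target column (the names here are hypothetical).
preds_multi = {
    "test": {"LOG_SOLUBILITY": np.array([0.12, 0.87, 0.43])},
    "test_scaffold": {"LOG_SOLUBILITY": np.array([0.31, 0.05, 0.77])},
}
```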
Description
We should add validation so that when users attempt to evaluate their results, we can confirm their predictions have the structure required by the benchmark. Specifically, we should:
* Create an intermediate BenchmarkPredictions class that users supply their predictions to prior to evaluation. It can contain validators which compare the supplied structure of predictions against the expected one. If they do not match, a helpful error message can be returned to the user explaining why (see the sketch after the acceptance criteria below). This route would require a small lift in updating the documentation to explain that users must use this new class as input to the prediction evaluation methods.
Acceptance Criteria
* Prior to evaluation, the supplied structure of predictions is validated against the expected structure as defined by the BenchmarkSpecification object.
* There is helpful logging which guides users to the correct structure in the event their predictions are not structured correctly.
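A minimal sketch of what such an intermediate class could look like, using Pydantic model validators to compare the supplied structure against the expected one and raise a descriptive error on mismatch. The field names and the nested dict layout here are assumptions for illustration, not the final Polaris API:

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, model_validator


class BenchmarkPredictions(BaseModel):
    """Sketch: validate that predictions match the structure implied by
    a benchmark's target columns and test set names."""

    model_config = ConfigDict(arbitrary_types_allowed=True)

    predictions: dict[str, dict[str, Any]]  # {test_set: {target_col: values}}
    target_cols: list[str]
    test_set_names: list[str]

    @model_validator(mode="after")
    def _check_structure(self) -> "BenchmarkPredictions":
        expected_sets = set(self.test_set_names)
        supplied_sets = set(self.predictions)
        if supplied_sets != expected_sets:
            raise ValueError(
                f"Predictions must be keyed by test set name. Expected "
                f"{sorted(expected_sets)}, got {sorted(supplied_sets)}."
            )
        for test_set, by_target in self.predictions.items():
            missing = set(self.target_cols) - set(by_target)
            if missing:
                raise ValueError(
                    f"Test set '{test_set}' is missing predictions for "
                    f"target column(s) {sorted(missing)}."
                )
        return self
```

An evaluate method could then accept this class, or coerce raw user input into it, before computing any metrics.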
With the introduction of an intermediate BenchmarkPredictions class, we can not only enforce that the user specifies a specific type, but can also standardize the type. That seems most promising to me!
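A rough sketch of how standardizing the type could work: coerce whatever the user passes (a flat array, a dict keyed by target column, or an already-nested dict) into one canonical nested mapping before validation. The helper name and the fallback rules below are assumptions, not the merged implementation:

```python
import numpy as np


def normalize_predictions(raw, target_cols, test_set_names):
    """Coerce a flat array, a {target: values} dict, or a nested
    {test_set: {target: values}} dict into the canonical nested form."""
    # Flat array-like input: only unambiguous for a single-task,
    # single-test-set benchmark.
    if not isinstance(raw, dict):
        if len(target_cols) != 1 or len(test_set_names) != 1:
            raise ValueError(
                "A flat array is only accepted for single-task, "
                "single-test-set benchmarks."
            )
        return {test_set_names[0]: {target_cols[0]: np.asarray(raw)}}

    # Dict keyed by target column: wrap it under the single test set.
    if set(raw) <= set(target_cols):
        if len(test_set_names) != 1:
            raise ValueError(
                "Ambiguous predictions: key them by test set name."
            )
        return {test_set_names[0]: {k: np.asarray(v) for k, v in raw.items()}}

    # Otherwise, assume the dict is already keyed by test set name.
    return {
        ts: {col: np.asarray(v) for col, v in by_target.items()}
        for ts, by_target in raw.items()
    }
```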
* add base benchmark predictions class, move tests
* wip, validations working
* trying to make types work with arbitrary incoming values
* fix equality checking issue in test. closes #169.
* add serializer to predictions
* remove separate competition predictions
* update evaluation usage to work with benchmark predictions instance
* update test set generation for evaluation
* run ruff autoformatting
* Update polaris/utils/types.py
Nicer union syntax
Co-authored-by: Cas Wognum <[email protected]>
* Update polaris/utils/types.py
Co-authored-by: Cas Wognum <[email protected]>
* wip
* add small docstring, allow string predictions
* safely get predictions in evaluation if available
* pass test set names to predictions and check for validity
* simplify safe_mask
* fix bad docstring path
* Reintroduce the CompetitionPredictions class because it includes additional metadata
* Add back the CompetitionPredictions to the docs
* Reordered docs
* Improved documentation and changed logic to disallow some edge cases
* Fixed docs
* Remove print statement
* Reorganize code
* Simplified evaluation logic
* Address all PR feedback
* Add extra test case
* Addressed PR feedback
* Fix type hint and fix model validator definition
* Fix import
---------
Co-authored-by: Kira McLean <[email protected]>
Co-authored-by: Cas Wognum <[email protected]>