Use evaluation logic directly in hub, no need for wrapper #109
Conversation
@@ -11,17 +10,3 @@ def test_competition_from_json(test_competition, tmpdir):
    path = test_competition.to_json(str(tmpdir))
    new_competition = CompetitionSpecification.from_json(path)
    assert new_competition == test_competition


def test_competition_evaluation(test_competition):
This is redundant now since the evaluation logic is the exact same for regular benchmarks.
"MetricInfo", | ||
"BenchmarkResults", | ||
"ResultsType", | ||
"evaluate_benchmark"] |
Adding this `evaluate_benchmark` function so it can be used from the hub.
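For reference, a quick sketch of what this export enables, assuming the `__all__` above belongs to `polaris.evaluate` (the module path is inferred from the diff, not confirmed here):

```python
# With evaluate_benchmark exported, the hub (or any downstream code) can reuse
# the exact same evaluation path as regular benchmarks, instead of going
# through a competition-specific wrapper.
from polaris.evaluate import BenchmarkResults, MetricInfo, evaluate_benchmark

# Sanity check that the helper is importable from the package surface.
print(evaluate_benchmark.__module__)
```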
url=f"/v2/competition/evaluate", | ||
method="PUT", | ||
json={ | ||
"competition": competition.artifact_id, | ||
"predictions": y_pred | ||
}) | ||
return BenchmarkResults(results=pd.read_json(response["scores"]), |
I realized the hub can just return the scores, so we don't need to faff with constructing a valid competition object there. This still returns a `BenchmarkResults`, but now the hub just does the evaluation and nothing more.
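To make that flow concrete, here is a minimal sketch of the client-side round trip implied by the snippet above. The request helper name and the exact `BenchmarkResults` construction are assumptions based on the diff, not the verbatim implementation:

```python
# Hypothetical simplification of client-side competition evaluation:
# send predictions to the hub, let it run the shared evaluation logic, and
# rebuild a BenchmarkResults object from the returned scores.
import pandas as pd

from polaris.evaluate import BenchmarkResults


def evaluate_competition(client, competition, y_pred) -> BenchmarkResults:
    # `client` is assumed to expose a generic request helper; the real
    # PolarisHubClient method may be named differently.
    response = client._base_request_to_hub(
        url="/v2/competition/evaluate",
        method="PUT",
        json={
            "competition": competition.artifact_id,
            "predictions": y_pred,
        },
    )
    # The hub returns only the computed scores (as JSON); everything else is
    # reconstructed locally by the client.
    return BenchmarkResults(results=pd.read_json(response["scores"]))
```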
Hi @kiramclean, this change makes sense to me, but I think I lack the broader context of this PR to properly review it! 😄
I will do a thorough review once `feat/competitions` is ready for review!
* competition wip
* wip
* wip
* adding methods for interfacing w/ competitions
* Continuing to integrate polaris client with the Hub for comps
* comp wip
* updating date serializer
* Competition evaluation (#103)
* call hub evaluate endpoint from client evaluate_competitions method
* add super basic test for evaluating competitions
* be more specific in evaluate_benchmark signature
* Update polaris/hub/client.py
  Co-authored-by: Andrew Quirke <[email protected]>
* start refactoring object dependencies out of evaluation logic
* refactor test subset object out of evaluation logic
* clean up as much as possible for now
* updating date serializer
* call hub evaluate endpoint from client evaluate_competitions method
* Update polaris/competition/_competition.py
  Co-authored-by: Andrew Quirke <[email protected]>
* updating date serializer
* call hub evaluate endpoint from client evaluate_competitions method
* add super basic test for evaluating competitions
* comp wip
* updating date serializer
* call hub evaluate endpoint from client evaluate_competitions method
* fix bad merge resolution
* only send competition artifact ID to hub
  Co-authored-by: Andrew Quirke <[email protected]>
  Co-authored-by: Andrew Quirke <[email protected]>
* Use evaluation logic directly in hub, no need for wrapper (#109)
* use evaluation logic directly in hub, no need for wrapper
* include evaluate_benchmark in package
* remove unnecessary imports
* read incoming scores sent as json
* light formatting updates
* updating fallback version for dev build
* integrating results for comps (#111)
* integrating results for comps
* Update polaris/hub/client.py
  Co-authored-by: Cas Wognum <[email protected]>
* addressing comments & adding CompetitionResults class
* test competition evalution works for multi-column dataframes
* add single column test to competition evaluation
* fix multitask-single-test-set cases
* fix bug with multi-test-set benchmarks
* adding functions to serialize & deserialize pred objs for external eval
* updating return for evaluate_competition method in client
* updating evaluate_competition method to pass additional result info to hub
  Co-authored-by: Cas Wognum <[email protected]>
  Co-authored-by: Kira McLean <[email protected]>
* updates to enable fetching & interacting with comps
* updating requirement for eval name
* Feat/competition/eval (#114)
* integrating results for comps
* Update polaris/hub/client.py
  Co-authored-by: Cas Wognum <[email protected]>
* addressing comments & adding CompetitionResults class
* test competition evalution works for multi-column dataframes
* add single column test to competition evaluation
* fix multitask-single-test-set cases
* fix bug with multi-test-set benchmarks
* adding functions to serialize & deserialize pred objs for external eval
* updating return for evaluate_competition method in client
* updating evaluate_competition method to pass additional result info to hub
* refuse early to upload a competition with a zarr-based dataset
* removing merge conflicts
  Co-authored-by: Andrew Quirke <[email protected]>
  Co-authored-by: Andrew Quirke <[email protected]>
  Co-authored-by: Cas Wognum <[email protected]>
* test that all rows of a competition test set will have at least a value (#116)
* update competition evaluation to support y_prob
* run ruff on all files and fix issues
* fix wrong url printout after upload
* Clarifying typing for nested types
* removing if_exists arg from comps
* raising error for trying to make zarr comp
* updating name of ArtifactType to ArtifactSubtype
* updating comments & removing redundant class attributes
* moving split validator logic from comp spec to benchmark spec
* removing redundant checks from CompetitionDataset class
* creating pydantic model for comp predictions
* split validator logic, redundant pydantic checks, comp pred pydantic model
* changes for comps wrap up
* Adding CompetitionsPredictionsType
* adding conversion validator for comp prediction type
* setting predictions validator as class method
* Using self instead of cls for field validators
* removing model validation on fetch from hub
* Creating HubOwner object in comp result eval method
* Documentation & tutorials for competitions
* Removing create comp method, fixing failing tests, updating benchmark label struct
* Updating docs for create comp & benchmark pred structure
* tiny wording change in competition tutorial
* Addressing PR feedback
* fixing tests & removing dataset redefinition from CompetitionDataset class
* Commenting out line in tutorial to fix test
* fixing formatting
* small fixes & depending on tableContent for dataset storage info
  Co-authored-by: Andrew Quirke <[email protected]>
  Co-authored-by: Andrew Quirke <[email protected]>
  Co-authored-by: Cas Wognum <[email protected]>
Changelogs
This PR removes the wrapper around the evaluation logic used by the hub. Instead, the hub can call `evaluate_benchmark` directly. This still needs to be tested in the context of working competitions, which we will do next. It also requires the accompanying PR in the hub: https://github.com/polaris-hub/polaris-hub/pull/338
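For illustration, the hub-side evaluation under this design could look roughly like the sketch below. This is an assumption about the logic, not the actual polaris-hub implementation (see the linked PR); in particular, the handler name, the accessor for test labels, and the keyword arguments passed to `evaluate_benchmark` are guesses.

```python
# Hypothetical hub-side handler: run the shared evaluation logic and return
# only the scores, leaving result-object construction to the client.
from polaris.evaluate import evaluate_benchmark


def handle_competition_evaluation(competition, y_pred) -> dict:
    # The competition object is assumed to hold the held-out labels and the
    # metrics to compute; the argument names below are illustrative only.
    results = evaluate_benchmark(
        metrics=competition.metrics,
        y_true=competition.get_test_labels(),  # hypothetical accessor
        y_pred=y_pred,
    )
    # results.results is the scores DataFrame; serialize it so the client can
    # read it back with pd.read_json(response["scores"]).
    return {"scores": results.results.to_json()}
```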
Checklist:

* Add a `feature`, `fix` or `test` label (or ask a maintainer to do it for you).