Pipe through evalspecs w/ git revision, packages, and solver/model/task args for leaderboard #24
691f426 to 5444efc
5444efc to 33de052
Do you know if lb view still works if you mix files with and without these fields in the same dataset? Do we need to keep results from before and after the change separated?
@regan-huff My understanding is that if we update the schema by only adding new fields, as in this PR, then we don't need to keep files with and without those fields separate, and the files with missing fields should get None for them. We could test this by updating the schema to include the new fields and checking that nothing breaks.
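A minimal sketch of the behavior described above, assuming the schema is modeled as a record type whose newly added fields are optional. The class `EvalSpecRecord`, the helper `load_record`, and all field names here are hypothetical stand-ins, not the actual agenteval or HF schema:

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class EvalSpecRecord:
    # Hypothetical schema: `task` stands in for a field that existed before
    # the change; the remaining fields are new and therefore optional.
    task: str
    revision: Optional[str] = None
    packages: Optional[str] = None
    task_args: Optional[str] = None


def load_record(raw: dict[str, Any]) -> EvalSpecRecord:
    """Build a record from a raw dict; fields absent from `raw` stay None."""
    known = {f.name for f in fields(EvalSpecRecord)}
    return EvalSpecRecord(**{k: v for k, v in raw.items() if k in known})


# One file written before the change and one written after it.
old_file = {"task": "demo_task"}
new_file = {"task": "demo_task", "revision": "abc1234", "task_args": "{}"}

# Both load into the same list without being kept separate.
records = [load_record(old_file), load_record(new_file)]
```

If the real schema behaves this way, old and new result files can share one dataset, and consumers only need to handle None for the new fields.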
ok, I did a test and what I observed is that I can mix the data, but I had to delete the existing readme to get the schema to update with
987fe75 to 9d21b7b
9d21b7b to b74f079
I pushed a fix for that error (should have tested, thanks).
This PR seeks to serialize the |
View at: |
Prepare for https://github.com/allenai/astabench-issues/issues/277 and similar use
The agenteval.json file gets a new field that looks like the following (I haven't tested publishing or updating the HF schema):

I would be open to pruning back the piped solver/model and their args, but figured those might be useful as part of a pointer to the agent source / run command. (For now, in the leaderboard viewer I have only explicitly constructed a source URL pointing to the repo at the git revision, which doesn't include that information.)
UPDATE: I also piped through task_args and packages to get better visibility into whether submissions have been run in a standardized/correct way.
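As a rough illustration of the kind of provenance this PR pipes through, here is a sketch that collects a git revision, the installed packages, and task args into one record, and builds a source URL at that revision. The function names, the record layout, and the example repository URL are assumptions made for illustration; they are not the actual agenteval.json format or the leaderboard viewer code. The URL pattern assumes a GitHub-hosted repository.

```python
import json
import subprocess
from importlib import metadata
from typing import Any, Optional


def git_revision(repo_dir: str = ".") -> Optional[str]:
    """Return the current commit hash, or None if it cannot be determined."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip() or None


def installed_packages() -> dict[str, str]:
    """Map each installed distribution name to its version."""
    pkgs = {}
    for dist in metadata.distributions():
        name, version = dist.metadata.get("Name"), dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            pkgs[name] = version
    return pkgs


def source_url(repo_url: str, revision: str) -> str:
    """Point at the repo tree at a given revision (GitHub URL layout)."""
    return f"{repo_url.rstrip('/')}/tree/{revision}"


def build_provenance(task_args: dict[str, Any],
                     revision: Optional[str] = None) -> dict[str, Any]:
    """Assemble a hypothetical provenance record for one evaluation run."""
    return {
        "revision": revision if revision is not None else git_revision(),
        "packages": installed_packages(),
        "task_args": task_args,
    }


spec = build_provenance({"split": "dev"}, revision="abc1234")
serialized = json.dumps(spec, sort_keys=True)
url = source_url("https://github.com/example/repo/", spec["revision"])
```

Recording packages and task args alongside the revision is what would let a reviewer compare two submissions and spot a nonstandard run configuration.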