This expands on comments by @glatterf42 here and @gidden here. We will update this description if there are alternate proposals, or to link to PRs.
Summary
- The validity of outputs from MESSAGEix-GLOBIOM scenarios (produced with the current repo `message-ix-models` and various branches of `message_data`) depends on the particular GAMS implementation of MESSAGE (and/or MACRO) in the `message_ix` package.
- Periodically we have PRs to `message_ix` that modify the GAMS code, for instance "Correct MACRO GDP reporting & update docs" (message_ix#430), "Adjust calculation of PRICE_EMISSION" (message_ix#726), and others. Often these are to correct known bugs.
- Per testing:
  - `message_ix` contains:
    - Self-contained tests that are narrowly targeted to certain behaviours:
      - These tests use the suite of simplified models (Dantzig, Austria, Westeros) that are contained with the MESSAGE code.
      - These tests are run automatically.
      - PRs that touch the GAMS code expand or adjust these tests.
    - There are (still) some "nightly" tests that download and run MESSAGEix-GLOBIOM scenarios, and make certain checks against their outputs. These scenarios are by now quite old.
  - `message-ix-models` has a test suite with high coverage.
  - `message_data` branch `dev` has low test coverage, and other branches even lower.
- For `message_ix` GAMS PRs, there is often a concern that the following could happen:
  - The `message_ix` test suite, including expanded/added tests for the specific changes in a PR, passes, but `message-ix-models` or `message_data` tests are broken, or
  - Un-tested `message_data` or `message-ix-models` behaviour (the "particular outputs" mentioned in the first bullet) changes in a way that's not obvious.
Things to do
There are a variety of things to do about this.
Manual checks
This is more or less what we have done thus far.
- The `message_ix` PR author(s) or reviewer(s) say: "I think this PR may have consequences 'further up the stack' / 'downstream'", and describe what those impacts could be.
- Someone (manually) re-runs some code or (manual) workflows or steps and makes some (manual) checks to ensure there are no unexpected impacts. To be clear, this is done in an ad hoc way every time; there is no HOWTO for these: which branches/code to use, which command(s) to run, which checks to make.
- Comments and discussion on the PR either determine (a) there is no impact, and the PR is good to merge, or (b) there are impacts, and the PR is adjusted.
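One way to make the "which checks to make" part less ad hoc is to write a check down as code. The following is only a minimal sketch: it assumes the outputs of interest have already been extracted from a baseline run and a candidate run into plain mappings from a label to a number. The function name `compare_outputs`, the labels, the numbers, and the tolerance are all hypothetical and not part of `message_ix`, `message-ix-models`, or `message_data`.

```python
import math


def compare_outputs(baseline, candidate, rel_tol=1e-6):
    """Return the labels whose values differ between two runs.

    Both arguments map a (hypothetical) label such as "GDP|2050" to a number.
    Labels present in only one run are reported with None for the missing side.
    """
    differences = {}
    for label in sorted(set(baseline) | set(candidate)):
        old = baseline.get(label)
        new = candidate.get(label)
        if old is None or new is None:
            # The label exists in only one of the two runs
            differences[label] = (old, new)
        elif not math.isclose(old, new, rel_tol=rel_tol):
            # Both runs report the label, but the values disagree
            differences[label] = (old, new)
    return differences


# Made-up numbers, for illustration only
baseline = {"GDP|2050": 100.0, "PRICE_EMISSION|2050": 25.0}
candidate = {"GDP|2050": 100.0, "PRICE_EMISSION|2050": 27.5}
print(compare_outputs(baseline, candidate))
```

An empty result would correspond to outcome (a) above; a non-empty one gives the PR discussion something concrete to look at for outcome (b).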
Practices, e.g. pip freeze
- iiasa/message_data#546 points to another option: using `pip freeze`. This is currently documented under one particular "Known issue" in the `message_data` install instructions, here, but we could maybe move this to a more prominent location, like the “Reproducibility” page of the `message-ix-models` docs.
- As a general rule, we could express it like this:
  - If:
    - A particular branch/workflow of `message_data`/`message-ix-models` is known to work with specific versions of other packages in the stack (`message-ix-models`, `message_ix`, `ixmp`, `genno`, `pandas`, any others) and
    - either:
      - There are no tests of that `message_data`/`message-ix-models` branch/workflow; or
      - The versions of dependencies are on branches other than `main`, not yet merged or associated with a PR; or
      - More recent versions of the dependencies are known not to work;
  - Then:
    - One or more `requirements.txt` file(s) should be created for that workflow/branch, recording the version(s) of upstream packages that are known to work; and
    - The meaning of “known to work” (i.e. expected features of the model outputs) should be explicitly documented.
- This allows upstream improvements to go forward. Developers working on particular project or model variant code then have a few options:
  - Continue to use the ‘frozen’ versions recorded in the `requirements.txt` files they have created.
  - Check if their code works with newer versions of dependencies; update or remove the `requirements.txt`.
  - Add tests so that compatibility of their code and specific outputs can be automatically validated as dependencies update.
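As an illustration of recording "known to work" versions, here is a minimal sketch that pins only the packages named in the rule above, using the standard-library `importlib.metadata`. The helper name `pinned_requirements` and the `STACK` list are hypothetical. This is a narrower alternative to `pip freeze`, not a replacement for it.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return requirements.txt lines pinning the installed version of each package."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Record the gap as a comment instead of silently skipping it
            lines.append(f"# {name}: not installed in this environment")
    return lines


# Packages named in the rule above; extend as needed for a given workflow
STACK = ["message-ix-models", "message_ix", "ixmp", "genno", "pandas"]

print("\n".join(pinned_requirements(STACK)))
```

Redirecting the printed lines to a file gives a starting `requirements.txt`. For dependencies installed from a branch rather than a release, a plain `==` pin is not enough; `pip freeze` typically records the source URL and commit for such installs, which this sketch does not attempt.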
Semi-automated checks
- For workflows with tests and/or a defined CLI entry-point, a workflow like transport.yaml can be established.
- This workflow can be used in a semi-automated way within the “manual checks” process described above:
  - The `message_ix` PR author(s)/reviewer(s) must still identify: "This change may impact 1 or more downstream workflow(s), in [specific ways]."
  - They, or someone else, then follow some documented steps to trigger the CI workflow, including providing an input to force it to use `message_ix` (GAMS) code from the PR branch under review, rather than `main` or the released version.
  - The results of the workflow either directly include checks for validity, or there are further documented steps to inspect the outputs for signs of undesired impacts.
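The "input to force it to use `message_ix` code from the PR branch" could, for example, be a Git ref that the workflow turns into a pip requirement. The sketch below only builds that requirement string. The helper name `message_ix_requirement` is hypothetical, the repository URL is an assumption, and how the ref reaches the workflow (for instance as a manually entered input) is left open.

```python
def message_ix_requirement(ref="main", repo="https://github.com/iiasa/message_ix"):
    """Return a pip direct reference that installs message_ix from a Git ref.

    `ref` can be a branch name, a tag, a commit hash, or a pull request ref
    of the form "refs/pull/<number>/head".
    """
    return f"message_ix @ git+{repo}.git@{ref}"


# Example: the head of one of the PRs mentioned in the summary
print(message_ix_requirement("refs/pull/726/head"))
```

The resulting string can be passed to `pip install` in place of the usual `message_ix` requirement, so that the rest of the workflow runs unchanged against the GAMS code under review.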