Skip to content

Add workflow/docs to validate changes in MESSAGE/MACRO core #180

@khaeru

Description

@khaeru

This expands on comments by @glatterf42 here and @gidden here. We update the description if there are alternate proposals or to link to PRs.

Summary

  • The validity of outputs from MESSAGEix-GLOBIOM scenarios (produced with the current repo message-ix-models and various branches of message_data) depend on the particular GAMS implementation of MESSAGE (and/or MACRO) in the message_ix package.
  • Periodically we have PRs to message_ix that modify the GAMS code, for instance Correct MACRO GDP reporting & update docs message_ix#430, Adjust calculation of PRICE_EMISSION message_ix#726, and others. Often these are to correct known bugs.
  • Per testing:
    • message_ix contains:
      • Self-contained tests that are narrowly targeted to certain behaviours:
        • These tests use the suite of simplified models (Dantzig, Austria, Westeros) that are contained with the MESSAGE code.
        • These tests are run automatically.
        • PRs that touch the GAMS code expand or adjust these tests.
      • There are (still) some "nightly" tests that download and run MESSAGEix-GLOBIOM scenarios, and make certain checks against their outputs. These scenarios are by now quite old.
    • message-ix-models has a test suite with high coverage.
    • message_data branch dev has low test coverage, and other branches even lower.
  • For message_ix GAMS PRs, there is often a concern that the following could happen:
    • The message_ix test suite, including expanded/added tests for the specific changes in a PR, passes, but
    • message-ix-models or message_data tests are broken, or
    • Un-tested message_data or message-ix-model behaviour (the "particular outputs" mentioned in the first bullet) changes in a way that's not obvious.

Things to do

There are a variety of things to do about this.

Manual checks

This is more or less what we have done thus far.

  • The message_ix PR author(s) or reviewer(s) say: I think this PR may have consequences "further up the stack" / "downstream", and describe what those impacts could be.
  • Someone (manually) re-runs some code or (manual) workflows or steps and makes some (manual) checks to ensure there are no unexpected impacts. To be clear, this is done in an ad hoc way every time; there is no HOWTO for these: which branches/code to use, which command(s) to run, which checks to make.
  • Comments and discussion on the PR either determine (a) there is no impact, and the PR is good to merge, or (b) there are impacts, and the PR is adjusted.

Practices, e.g. pip freeze

  • iiasa/message_data#546 points to another option: using pip freeze. This is currently documented under one particular "Known issue" in the message_data Install instructions, here, but we could maybe move this to a more prominent location, like the “Reproducibility” page of the message-ix-models docs.
  • As a general rule, we could express it like this:
  • If:
    • A particular branch/workflow of message_data/message-ix-models is known to work with specific versions of other packages in the stack (message-ix-models, message_ix, ixmp, genno, pandas, any others) and
    • either:
      • There are no tests of that message_data/message-ix-models branch/workflow; or
      • The versions of dependencies are on branches other than main, not yet merged or associated with a PR; or
      • More recent versions of the dependencies are known not to work;
  • Then:
    • One or more requirements.txt file(s) should be created for that workflow/branch, recording the version(s) of upstream packages that are known to work; and
    • The meaning of “known to work”—i.e. expected features of the model outputs—should be explicitly documented.
  • This allows upstream improvements to go forward. Developers working on particular project or model variant code then have a few options:
    • Continue to use the ‘frozen’ versions recorded in the requirements.txt files they have created.
    • Check if their code works with newer versions of dependencies; update or remove the requirements.txt.
    • Add tests so that compatibility of their code and specific outputs can be automatically validated as dependencies update.

Semi-automated checks

  • For workflows with tests and/or a defined CLI entry-point, a workflow like transport.yaml can be established.
  • This workflow can be used in a semi-automated way within the “manual checks” process described above:
    • The message_ix PR author(s)/reviewer(s) must still identify: This change may impact 1 or more downstream workflow(s), in [specific ways].
    • They, or someone else, then follows some documented steps to trigger the CI workflow, including providing an input to force it to use message_ix (GAMS) code from the PR branch under review—rather than main or the released version.
    • The results of the workflow either directly include checks for validity, or there are further documented steps to inspect the outputs for signs of undesired impacts.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions