Add workflow/docs to validate changes in MESSAGE/MACRO core

This expands on comments by @glatterf42 [here](https://github.com/iiasa/message_ix/pull/430#issuecomment-2066192033) and @gidden [here](https://github.com/iiasa/message_ix/pull/430#issuecomment-2066435920). We update the description if there are alternate proposals or to link to PRs.

## Summary

- The validity of outputs from MESSAGEix-GLOBIOM scenarios (produced with the current repo `message-ix-models` and various branches of `message_data`) depend on the particular GAMS implementation of MESSAGE (and/or MACRO) in the `message_ix` package.
- Periodically we have PRs to `message_ix` that modify the GAMS code, for instance iiasa/message_ix#430, iiasa/message_ix#726, and others. Often these are to correct known bugs.
- Per testing:
  - `message_ix` contains:
    - Self-contained tests that are narrowly targeted to certain behaviours:
      - These tests use the suite of simplified models (Dantzig, Austria, Westeros) that are contained with the MESSAGE code.
      - These tests are run automatically.
      - PRs that touch the GAMS code expand or adjust these tests.
    - There are (still) some "nightly" tests that download and run MESSAGEix-GLOBIOM scenarios, and make certain checks against their outputs. These scenarios are by now quite old.
  - `message-ix-models` has a test suite with high coverage.
  - `message_data` branch `dev` has low test coverage, and other branches even lower.
- For `message_ix` GAMS PRs, there is often a concern that the following could happen:
  - The `message_ix` test suite, including expanded/added tests for the specific changes in a PR, passes, but
  - `message-ix-models` or `message_data` tests are broken, or
  - Un-tested `message_data` or `message-ix-model` behaviour (the "particular outputs" mentioned in the first bullet) changes in a way that's not obvious.

## Things to do
There are a variety of things to do about this.

### Manual checks
This is more or less what we have done thus far.

- The `message_ix` PR author(s) or reviewer(s) say: I think this PR may have consequences "further up the stack" / "downstream", and describe what those impacts could be.
- Someone (manually) re-runs some code or (manual) workflows or steps and makes some (manual) checks to ensure there are no unexpected impacts. To be clear, this is done in an **ad hoc** way every time; there is no HOWTO for these: which branches/code to use, which command(s) to run, which checks to make.
- Comments and discussion on the PR either determine (a) there is no impact, and the PR is good to merge, or (b) there are impacts, and the PR is adjusted.

### Practices, e.g. `pip freeze`
- iiasa/message_data#546 points to another option: using `pip freeze`. This is currently documented under one particular "Known issue" in the `message_data` Install instructions, [here](https://docs.messageix.org/projects/models-internal/en/latest/install.html#error-failed-building-wheels-for-cx-oracle), but we could maybe move this to a more prominent location, like the “Reproducibility” page of the `message-ix-models` docs.
- As a general rule, we could express it like this:
- If:
  - A particular branch/workflow of `message_data`/`message-ix-models` is known to work with specific versions of other packages in the stack (`message-ix-models`, `message_ix`, `ixmp`, `genno`, `pandas`, any others) **and**
  - either:
    - There are no tests of that `message_data`/`message-ix-models` branch/workflow; **or**
    - The versions of dependencies are on branches other than `main`, not yet merged or associated with a PR; **or**
    - More recent versions of the dependencies are known *not* to work;
- Then:
  - One or more `requirements.txt` file(s) should be created for that workflow/branch, recording the version(s) of upstream packages that are known to work; **and**
  - The **meaning of “known to work”**—i.e. expected features of the model outputs—should be explicitly documented.
- This allows upstream improvements to go forward. Developers working on particular project or model variant code then have a few options:
  - Continue to use the ‘frozen’ versions recorded in the requirements.txt files they have created.
  - Check if their code works with newer versions of dependencies; update or remove the requirements.txt.
  - Add tests so that compatibility of their code and specific outputs can be automatically validated as dependencies update.

### Semi-automated checks
- For workflows with tests and/or a defined CLI entry-point, a workflow like [transport.yaml](https://github.com/iiasa/message_data/blob/651a7cb/.github/workflows/transport.yaml) can be established.
- This workflow can be used in a **semi-automated** way within the “manual checks” process described above:
  - The `message_ix` PR author(s)/reviewer(s) must still identify: This change may impact 1 or more downstream workflow(s), in *[specific ways]*.
  - They, or someone else, then follows some **documented steps** to **trigger the CI workflow**, including providing an input to force it to use `message_ix` (GAMS) code from the PR branch under review—rather than `main` or the released version.
  - The results of the workflow either **directly include** checks for validity, or there are further **documented steps** to inspect the outputs for signs of undesired impacts.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add workflow/docs to validate changes in MESSAGE/MACRO core #180

Summary

Things to do

Manual checks

Practices, e.g. `pip freeze`

Semi-automated checks

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add workflow/docs to validate changes in MESSAGE/MACRO core #180

Description

Summary

Things to do

Manual checks

Practices, e.g. pip freeze

Semi-automated checks

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Practices, e.g. `pip freeze`