Releases: NVIDIA-NeMo/Evaluator

NVIDIA NeMo Evaluator 0.1.77

05 Feb 01:37

nemo-evaluator-v0.1.77

Merge branch 'deploy-release/bef4b952-c0f3-40fa-b5fd-320d86b86e37'

NVIDIA NeMo Evaluator Launcher 0.1.78

05 Feb 01:37

nemo-evaluator-launcher-v0.1.78

Merge branch 'deploy-release/bef4b952-c0f3-40fa-b5fd-320d86b86e37'

NVIDIA NeMo Evaluator 0.1.76

04 Feb 01:37
b4261b2

nemo-evaluator-v0.1.76

feat(slurm): add launcher_install_cmd option for custom auto-export i…

NVIDIA NeMo Evaluator Launcher 0.1.77

04 Feb 01:37
b4261b2

nemo-evaluator-launcher-v0.1.77

feat(slurm): add launcher_install_cmd option for custom auto-export i…

NVIDIA NeMo Evaluator 0.1.75

03 Feb 01:37
089cc9f

chore: Fix max_walltime docs (#685)

Signed-off-by: Wojciech Prazuch

NVIDIA NeMo Evaluator Launcher 0.1.76

03 Feb 01:38
089cc9f

chore: Fix max_walltime docs (#685)

Signed-off-by: Wojciech Prazuch

NVIDIA NeMo Evaluator 0.1.74

02 Feb 01:38
406923b

nemo-evaluator-v0.1.74

ci: Fix integration test by avoiding writing to read-only test directory…

NVIDIA NeMo Evaluator Launcher 0.1.75

02 Feb 01:38
406923b

nemo-evaluator-launcher-v0.1.75

ci: Fix integration test by avoiding writing to read-only test directory…

NVIDIA NeMo Evaluator 0.1.73

29 Jan 01:36
6a9803a

fix(slurm): nodes_array undefined (#671)

## Summary

When running the launcher on Slurm with `deployment.type: none`, the
generated sbatch script could fail at runtime with:

- `line N: nodes_array[0]: unbound variable`

This was triggered by `set -u` (nounset) combined with an unconditional
`--nodelist ${nodes_array[0]}` in the evaluation client's `srun` command.

## Impact

- **Configs affected**: any Slurm run with `deployment.type=none` (e.g.,
“target-only” evaluation).
- **Failure mode**: sbatch script exits before launching the evaluation
client.
- **Where observed**: Slurm job log (`slurm_script` / `slurm-%A.log`).

## Direct cause

- The sbatch script enables:
  - `set -u` (treat unset variables as an error)
- The evaluation client `srun` was emitted as:
  - `srun ... --nodelist ${nodes_array[0]} ...`
- `nodes_array` was only defined inside the deployment block (`if
cfg.deployment.type != "none": ...`).
- Therefore, for `deployment.type=none`, `nodes_array` was undefined and
`${nodes_array[0]}` crashed under nounset.
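
A minimal reproduction of this direct cause, independent of the launcher: a child `bash` with `set -u` references an array that was never assigned, which is what happens when the deployment block is skipped. Only the variable name mirrors the generated script; the rest is illustrative.

```shell
#!/usr/bin/env bash
# Illustrative reproduction only; this is not the launcher's generated script.
# The child bash aborts on the unbound reference, so run it in a subprocess
# and record its exit status instead of killing this shell.
bash -c '
  set -u
  # nodes_array is never assigned, as with deployment.type=none
  echo "srun --nodelist ${nodes_array[0]}"
'
repro_status=$?
echo "child exit status: ${repro_status}"   # non-zero; the srun line is never printed
```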

## Secondary risks (also addressed)

Even when deployment is enabled, `${nodes_array[0]}` can still fail if:

- `$SLURM_JOB_NODELIST` is unset/empty (non-standard environment) or
only `$SLURM_NODELIST` is present.
- `scontrol` is unavailable on the node or not in `PATH`.
- `scontrol show hostnames ...` returns an empty list.

Any of these can leave `nodes_array` empty or unset, so `${nodes_array[0]}`
fails under `set -u`.
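
The empty-`scontrol`-output case can be sketched the same way. Here `true` stands in for a `scontrol show hostnames` call that prints nothing (an assumption for illustration): the array is assigned but has no element 0, so nounset still fires, while a default-value expansion such as `${nodes_array[0]:-...}` does not.

```shell
#!/usr/bin/env bash
# Illustrative only. `true` prints nothing, standing in for an empty
# `scontrol show hostnames` result.
bash -c '
  set -u
  nodes_array=( $(true) )          # assigned, but with zero elements
  echo "first: ${nodes_array[0]}"  # element 0 is unset, so this aborts
'
empty_status=$?

bash -c '
  set -u
  nodes_array=( $(true) )
  echo "first: ${nodes_array[0]:-<none>}"  # default expansion never trips nounset
'
guarded_status=$?
echo "empty list: ${empty_status}, guarded: ${guarded_status}"  # non-zero, then 0
```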

## Solution

### Approach

Introduce a **single, always-defined** “node pinning” variable for
single-node `srun` invocations:

- `PRIMARY_NODE`

This is resolved at runtime in the sbatch script with safe fallbacks:

1. Prefer `SLURM_JOB_NODELIST`
2. Fallback to `SLURM_NODELIST`
3. Fallback to local `hostname`
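
A minimal sketch of how this fallback chain could look in an sbatch script. The helper name `resolve_primary_node`, the use of `scontrol show hostnames` plus `head -n 1` to expand a compressed nodelist such as `node[01-04]` and keep its first host, and the choice to go straight to `hostname` when `scontrol` is missing are assumptions here, not necessarily what the launcher template does.

```shell
#!/usr/bin/env bash
# Sketch of the PRIMARY_NODE fallback chain. resolve_primary_node is a
# hypothetical helper, not the launcher's actual template code.
resolve_primary_node() {
  local nodelist first=""
  # 1. Prefer SLURM_JOB_NODELIST; 2. fall back to SLURM_NODELIST.
  # The :- forms keep both lookups safe under `set -u`.
  nodelist="${SLURM_JOB_NODELIST:-${SLURM_NODELIST:-}}"
  # Expand a compressed list (e.g. node[01-04]) and keep the first host,
  # but only when a scontrol command is available.
  if [ -n "$nodelist" ] && command -v scontrol >/dev/null 2>&1; then
    first="$(scontrol show hostnames "$nodelist" 2>/dev/null | head -n 1)" || first=""
  fi
  # 3. Fall back to the local hostname if nothing usable came back.
  if [ -z "$first" ]; then
    first="$(hostname 2>/dev/null || printf '%s' "${HOSTNAME:-localhost}")"
  fi
  printf '%s\n' "$first"
}

set -u
PRIMARY_NODE="$(resolve_primary_node)"
echo "single-node srun steps would use: --nodelist ${PRIMARY_NODE}"
```

Because every branch yields a non-empty value, `PRIMARY_NODE` can be referenced under `set -u` whether or not a deployment block ran.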

---------

Signed-off-by: Alex Gronskiy

NVIDIA NeMo Evaluator 0.1.72

28 Jan 12:31
193483d

fix: restore support for running tasks not listed in FDF (#667)

We have improved our validation in the spirit of failing early. However,
this led to an unwanted side effect: we lost support for running tasks
not listed in FDF with the `harness.task` syntax. Calling an evaluation
with this syntax resulted in
`nemo_evaluator.core.utils.MisconfigurationError: Unknown evaluation xxx`

It stopped working because:
* we run validation (everything passes here)
* then we prepare the config, extracting `task` from `harness.task` and
using it as the evaluation `type`
* we run a second validation, which fails because we no longer use the
`harness.task` syntax and there is no evaluation called
`task` in FDF

This PR uses `harness.task` as the `type` so that it is always valid, and
adds a test verifying custom task support. It also removes one redundant
validation.
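
To make the failure sequence concrete, here is a toy model of the two validation passes. Every name in it (`FDF_EVALS`, `validate`, the `prepare_type_*` helpers, `my_harness.custom_task`) is hypothetical, and the real validation in `nemo_evaluator` differs in detail; the sketch only mirrors the shape of the bug and of the fix.

```shell
#!/usr/bin/env bash
# Toy model only; none of these names exist in nemo_evaluator.
FDF_EVALS="eval_a eval_b"   # evaluations assumed to be listed in the FDF

# Accept any harness.task reference, or a bare name only if the FDF lists it.
validate() {
  case "$1" in
    *.*) return 0 ;;
  esac
  for name in $FDF_EVALS; do
    [ "$name" = "$1" ] && return 0
  done
  echo "MisconfigurationError: Unknown evaluation $1" >&2
  return 1
}

# Before the fix: keep only the part after the dot as the evaluation type.
prepare_type_before() { printf '%s\n' "${1#*.}"; }
# After the fix: keep the full harness.task string as the type.
prepare_type_after() { printf '%s\n' "$1"; }

requested="my_harness.custom_task"   # a task not listed in the FDF
validate "$requested" && echo "first pass: ok"

before="$(prepare_type_before "$requested")"
if validate "$before"; then before_ok=1; else before_ok=0; fi
after="$(prepare_type_after "$requested")"
if validate "$after"; then after_ok=1; else after_ok=0; fi
echo "second pass before fix: ${before_ok}, after fix: ${after_ok}"
```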

---------

Signed-off-by: Marta Stepniewska-Dziubinska