When to validate variable values? #32

Closed
benbovy opened this issue Mar 6, 2018 · 2 comments · Fixed by #74

benbovy commented Mar 6, 2018

Validation should be done on each value given as a model input, either when creating the input xarray.Dataset or when we add the value to the key-value store (see #31). The second option might be better if we eventually want to use the modelling framework with an interface other than xarray.Dataset.

For performance reasons, it may be wise not to systematically validate the value of a variable when it is set/updated by a process during a simulation. Validation may be needed in some cases, though (e.g., to ensure that values computed in a process do not fall outside of an acceptable range). This would require a convenient way to call the validators of a (possibly foreign) variable from within a process.

benbovy added this to the 0.2 milestone Mar 6, 2018
benbovy removed this from the 0.2 milestone Apr 16, 2018
benbovy added this to the 0.3 milestone Aug 29, 2018
benbovy modified the milestones: 0.3, 0.4 Sep 26, 2019

benbovy commented Dec 12, 2019

Ideally, the earlier input values are validated, the better. However, validating input values when creating the input dataset (using create_setup) is a bad idea, because input datasets may also be created by other means (e.g., by loading a netCDF file or after some pre-processing). It would also complicate the validation itself, which would have to account for additional dimension(s) of the input variables, such as the master clock for time-varying values or a batch dimension for running batches of simulations.

So the best place, common to all cases, is to validate just before setting inputs into the simulation data store.

All kinds of validation (even for model inputs) should be optional, as validation may impact performance. It could be controlled by adding a parameter to Dataset.xsimlab.run(), e.g.:

  • Dataset.xsimlab.run(validate='nothing'): no validation is performed.
  • Dataset.xsimlab.run(validate='inputs'): validate only input values.
  • Dataset.xsimlab.run(validate='all'): validate both input values and values set by foreign variables in process classes.

The last option may be costly, but it is useful for debugging.
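
To make the three options concrete, here is a minimal sketch of how such a switch could gate validation just before values go into the store. Everything in it is an assumption for illustration: the `run` function, the `Validate` enum, the `validators` mapping, and the `erode` process are hypothetical stand-ins, not xarray-simlab's actual implementation.

```python
from enum import Enum


class Validate(Enum):
    NOTHING = "nothing"
    INPUTS = "inputs"
    ALL = "all"


def run(inputs, processes, validators, validate="inputs"):
    """Toy simulation loop (hypothetical, not the real Dataset.xsimlab.run)."""
    mode = Validate(validate)  # raises ValueError on an unknown option
    store = {}

    def put(key, value, check):
        # Validate just before the value is written to the store.
        validator = validators.get(key)
        if check and validator is not None:
            validator(key, value)
        store[key] = value

    # Model inputs: checked unless validation is switched off entirely.
    for key, value in inputs.items():
        put(key, value, check=mode is not Validate.NOTHING)

    # Values set by processes at runtime: checked only with validate='all'.
    for process in processes:
        for key, value in process(store).items():
            put(key, value, check=mode is Validate.ALL)

    return store


def non_negative(key, value):
    if value < 0:
        raise ValueError(f"{key} must be >= 0, got {value}")


def erode(store):
    # Hypothetical process that pushes the value out of its valid range.
    return {"elevation": store["elevation"] - 10.0}
```

With `validators = {"elevation": non_negative}` and an input elevation of 5.0, this sketch lets `erode` write a negative value under `validate="inputs"`, while `validate="all"` raises a `ValueError` at the moment the process output is stored.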

> This would require a convenient way to call the validators of a (possibly foreign) variable from within a process.

This is not possible, as validation must be performed at the level of a process class (e.g., when validation involves checking the values of multiple variables declared in the process class).
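
As an illustration of why a single variable's validator cannot always be called in isolation, here is a small sketch of a check that only makes sense at the level of the whole process class. The `Grid` class and its `length` and `spacing` variables are hypothetical, and a plain dataclass stands in for a real process class.

```python
from dataclasses import dataclass


@dataclass
class Grid:
    # Hypothetical process with two interdependent variables.
    length: float
    spacing: float

    def validate(self):
        # Single-variable check: could in principle run on its own.
        if self.spacing <= 0:
            raise ValueError("spacing must be positive")
        # Cross-variable check: needs the state of the whole process.
        if self.spacing > self.length:
            raise ValueError("spacing cannot exceed length")
```

Here a caller holding only `spacing` (for example through a foreign variable) has no way to run the second check, since it also depends on `length`.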


benbovy commented Dec 12, 2019

> So the best place, common to all cases, is to validate just before setting inputs into the simulation data store.

Actually, using attr.validate, validation would rather be performed just after updating the simulation store. I don't think it really matters.
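
For reference, a minimal sketch of the "validate after update" pattern with attrs. The `Process` class and its `rate` variable are made up for illustration and assume a class declared with plain `@attr.s`, which does not run validators on attribute assignment.

```python
import attr


@attr.s
class Process:
    # Stand-in for a process class; `rate` is a hypothetical variable.
    rate = attr.ib(default=0.5)

    @rate.validator
    def _check_rate(self, attribute, value):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{attribute.name} must be in [0, 1], got {value}")


proc = Process()
proc.rate = 2.0  # the update itself goes through unchecked

try:
    attr.validate(proc)  # re-runs the validators on the current values
    error = None
except ValueError as exc:
    error = str(exc)
```

The invalid value is already in place when `attr.validate` raises, which is why validation ends up happening after the update rather than before it.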
