Machine-readable schema & validator for xarray.Dataset
#211
Comments
Hmm, I'm no longer confident that But it looks like Pydantic might be able to. See this Pydantic issue and PR. Also, take a look at how SQLModel combines Pydantic and SQLAlchemy (thanks to Benoît Bovy for suggesting this on Twitter!)
Let me flesh out what I hope is possible (but I haven't tested yet!). This is adapted from pydantic/pydantic#667, and inspired by @peterdudfield's work in PR #195. (Also see Pydantic's docs on custom data types.)

    from typing import Any

    import pydantic
    import xarray as xr


    class PydanticXArrayDataset(xr.Dataset):
        """Abstract base class for validating xr.Dataset objects."""

        # From https://github.com/samuelcolvin/pydantic/issues/667
        @classmethod
        def __get_validators__(cls):
            # Pydantic calls each yielded validator in turn.
            yield cls.validate

        @classmethod
        def validate(cls, v: Any) -> xr.Dataset:
            """Validate data. Must be overridden by child classes."""
            raise NotImplementedError()


    class Satellite(PydanticXArrayDataset):
        @classmethod
        def validate(cls, v: Any) -> xr.Dataset:
            # validate Satellite data...
            return v

The above code is all that's required (I think) when pre-preparing on-disk batches, because, after #202 is implemented, the individual modalities wouldn't be squished together into a single batch object. Instead, each modality would pass through nowcasting_dataset independently, and be written to disk independently. When we load the batches of each modality from disk, we could squish them into a Pydantic model like the code below, but I'd be a little worried about hurting performance, especially when all the pre-prepared batches should have been validated when they were created!

    class Example(pydantic.BaseModel):
        satellite: Satellite
OK, here's a functional, but very rough, example of using xarray with pydantic. This code validates a few things, but isn't ideal as a human-readable specification of the structure of
To quote @cosmicBboy from this comment:
I think this is implemented in PR #229 (thanks @peterdudfield!)
Detailed Description
If we can find an off-the-shelf schema & validator for xarray.Dataset then we can, hopefully, combine the best of pydantic.BaseModel and xarray.Dataset. The ultimate aims are:
nowcasting_dataset. This can be used for:
Context
@peterdudfield has done excellent work in pull request #195 using Pydantic to define schemas for our data. Inspired by, and building on @peterdudfield's great work, it's possible that we can get the same advantages by using something like pandera, but with less effort on our part :) (I'm lazy!)
This is also related to #209
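To make the idea of a machine-readable schema plus validator concrete, here is a minimal, dependency-free sketch. It is not pandera's or Pydantic's API, and it is not the approach taken in PR #195. The names `VariableSchema`, `validate_dataset` and `SATELLITE_SCHEMA`, and the dimension names and dtype in the schema, are all hypothetical. The validator is duck-typed: it only assumes that `name in ds` works and that `ds[name]` exposes `.dims` and `.dtype`, which is true of an xarray.Dataset. The demo uses plain stand-in objects so it runs without xarray installed.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class VariableSchema:
    """Expected dimension names (in order) and dtype name for one variable."""

    dims: Tuple[str, ...]
    dtype: str


def validate_dataset(ds: Any, schema: Dict[str, VariableSchema]) -> List[str]:
    """Return a list of human-readable problems. An empty list means valid."""
    problems: List[str] = []
    for name, expected in schema.items():
        if name not in ds:
            problems.append(f"missing variable '{name}'")
            continue
        var = ds[name]
        if tuple(var.dims) != expected.dims:
            problems.append(
                f"'{name}': dims {tuple(var.dims)} != expected {expected.dims}"
            )
        # str() turns a NumPy dtype such as float32 into the name "float32".
        if str(var.dtype) != expected.dtype:
            problems.append(
                f"'{name}': dtype {var.dtype} != expected {expected.dtype}"
            )
    return problems


# Hypothetical schema: the variable name, dims and dtype are illustrative only.
SATELLITE_SCHEMA = {
    "data": VariableSchema(dims=("time", "x", "y", "channels"), dtype="float32"),
}

# Stand-ins for a dataset. In real use, pass an xarray.Dataset instead.
good = {"data": SimpleNamespace(dims=("time", "x", "y", "channels"), dtype="float32")}
bad = {"data": SimpleNamespace(dims=("time", "x"), dtype="int16")}

print(validate_dataset(good, SATELLITE_SCHEMA))
for problem in validate_dataset(bad, SATELLITE_SCHEMA):
    print(problem)
```

Because the schema is plain data, the same object could in principle be rendered as documentation as well as used for validation, which is the dual role this issue is after.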
Related
I'll look into this this morning :)