'Parallelized' apply_ufunc for scipy.interpolate.griddata #5281
Comments
@mathause Thank you for your response! However, if there are other (xarray) functions which I could use to interpolate my results to a regular grid, I'd be very interested.
### UPDATE 1: ###
Now I "only" get an error concerning the size of my output (which is nolonger chunked). Is there perhaps a way to have apply_ufunc chunk your output along any of the other dimensions? ### UPDATE 2: ### Alternatively, I have tried to chunk over the time dimension (50 time steps) and I have removed all input/output core dimensions. And if I then define interp_to_grid as follows (to get the right input dimensions):
I do get the right dimensions for my
@LJaksic are you aware that passing
Please reopen with an MVCE if you still need help. This would be a good example for https://tutorial.xarray.dev/advanced/apply_ufunc/apply_ufunc.html
Hi,

I'm working with large files from an ocean model with an unstructured grid. For instance, the flow velocity variable ux has dimensions (194988, 1009, 20) for, respectively, 'nFlowElem' (the unstructured grid elements), 'time' and 'laydim' (the depth dimension). I'd like to interpolate these results to a structured grid with dimensions (600, 560, 1009, 20) for, respectively, latitude, longitude, time and laydim. For this I am using scipy.interpolate.griddata. As these dataarrays are too large to load into working memory at once, I am trying to work with 'chunks' (dask). Unfortunately, I bump into problems when trying to use apply_ufunc with dask = 'parallelized'.

For smaller computational domains (smaller nFlowElem dimension) I am still able to load the dataarray into my working memory. Then, the following code gives me the wanted result:
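In outline, the code looks like this (a minimal sketch; the variable names xc, yc, xint, yint, the use of griddata's default 'linear' method, and xint/yint being DataArrays with dimensions ('dim_0', 'dim_1') are all assumptions):

```python
import xarray as xr
from scipy.interpolate import griddata

def interp_to_grid(u, xc, yc, xint, yint):
    # u: (nFlowElem, time, laydim), xc/yc: (nFlowElem,), xint/yint: (dim_0, dim_1).
    # griddata interpolates over the leading point axis of u and carries
    # the trailing (time, laydim) axes through unchanged.
    return griddata((xc, yc), u, (xint, yint))

uxg = xr.apply_ufunc(
    interp_to_grid,
    ux, xc, yc, xint, yint,
    # all dimensions are core dimensions, listed in the order
    # the function expects them
    input_core_dims=[['nFlowElem', 'time', 'laydim'],
                     ['nFlowElem'], ['nFlowElem'],
                     ['dim_0', 'dim_1'], ['dim_0', 'dim_1']],
    # the result comes back as (dim_0, dim_1, time, laydim)
    output_core_dims=[['dim_0', 'dim_1', 'time', 'laydim']],
)
```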
Notice that in the function interp_to_grid the input variables have the following dimensions:

- u (i.e. ux, the original flow velocity output): (194988, 1009, 20) for (nFlowElem, time, laydim)
- xc, yc (the latitude and longitude coordinates associated with these 194988 elements): both (194988,)
- xint, yint (the structured grid coordinates to which I would like to interpolate the data): both (600, 560) for (dim_0, dim_1)

Notice that scipy.interpolate.griddata does not require me to loop over the time and laydim dimensions (as formulated in the code above). For this it is critical to feed griddata the dimensions in the right order ('time' and 'laydim' last). The interpolated result, uxg, has dimensions (600, 560, 1009, 20), as wanted and expected.

However, for much larger spatial domains it is required to work with dask = 'parallelized', because these input dataarrays can no longer be loaded into my working memory. I have tried to apply chunks over the time dimension, but also over the nFlowElem dimension. I am aware that it is not possible to chunk over core dimensions.
This is one of my "parallel" attempts (with chunks along the time dim):
Input ux:
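The repr of the chunked array was not preserved; the chunking step was presumably along these lines (the chunk size is an assumption, borrowed from UPDATE 2 above):

```python
ux = ux.chunk({'time': 50})  # chunk size of 50 time steps is assumed
```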
apply_ufunc call:
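A sketch of the parallelized call, under the same assumptions as the in-memory example above. Note that 'time' is here both a chunked dimension and an input core dimension, which apply_ufunc with dask='parallelized' rejects:

```python
uxg = xr.apply_ufunc(
    interp_to_grid,
    ux, xc, yc, xint, yint,
    input_core_dims=[['nFlowElem', 'time', 'laydim'],
                     ['nFlowElem'], ['nFlowElem'],
                     ['dim_0', 'dim_1'], ['dim_0', 'dim_1']],
    output_core_dims=[['dim_0', 'dim_1', 'time', 'laydim']],
    dask='parallelized',
    output_dtypes=[ux.dtype],  # required for dask='parallelized'
)
```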
Gives error:
I have played around a lot with changing the core dimensions in apply_ufunc and the dimension along which to chunk. I have also tried to manually change the order of dimensions of dataarray u, which is 'fed to' griddata (in interp_to_grid).

Any advice is very welcome!
Best Wishes,
Luka