NetCDF: Not a valid ID when trying to retrieve values from Dask array #2305
@edougherty32 - I'm not exactly sure what your problem is but I have some ideas for you:
Yes. My reading of your comment above made me think that each variable corresponded to a specific event/time. If that is the case, you could populate your coordinate values with the corresponding time stamps.
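A minimal sketch of the suggestion above, using hypothetical per-event arrays and made-up time stamps (the grid shape, event count, and dates are placeholders, not values from this thread):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical per-event DataArrays on the same grid (placeholders).
events = [xr.DataArray(np.random.rand(4, 5), dims=("y", "x")) for _ in range(3)]

# Assumed time stamps, one per event (placeholders, not from the thread).
times = pd.to_datetime(["2015-06-01", "2015-06-15", "2015-07-02"])

# Concatenate along a new "time" dimension and attach the stamps as a coordinate.
stacked = xr.concat(events, dim="time").assign_coords(time=times)

print(dict(stacked.sizes))  # {'time': 3, 'y': 4, 'x': 5}
```

Selecting by time stamp (e.g. `stacked.sel(time="2015-06-15")`) then works in place of selecting by variable name.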
Thanks, @jhamman! I attempted using open_mfdataset, but that will not work for what I need to do with my data, even with filtering filenames ahead of time. How would I go about defining a new coordinate and populating it with the corresponding time stamps? I am still worried that will not work, since I am still receiving the same error message as before, which will not load data from dask arrays into numpy arrays, as shown by this test:
Which results in the following error message:
I only get this error message when opening the files in a loop with chunking (which I need to do for efficiency and methodological purposes):
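The thread's actual snippet is not preserved here; a generic, self-contained sketch of that pattern (the file names, grid shapes, and chunk sizes below are assumptions, with small synthetic files standing in for the real flood files) might look like:

```python
import glob
import os
import tempfile

import numpy as np
import xarray as xr

# Small synthetic files standing in for the real flood files (placeholders).
tmpdir = tempfile.mkdtemp()
for i in range(2):
    xr.Dataset({"precip": (("lat", "lon"), np.ones((6, 6)) * i)}).to_netcdf(
        os.path.join(tmpdir, f"flood_{i}.nc")
    )

# The pattern described: open each file lazily with chunking in a loop,
# then concatenate. The values stay as dask arrays until .compute() runs,
# so the underlying file handles must still be usable at that point;
# otherwise older xarray versions could raise
# "RuntimeError: NetCDF: Not a valid ID".
datasets = [
    xr.open_dataset(p, chunks={"lat": 3})
    for p in sorted(glob.glob(os.path.join(tmpdir, "flood_*.nc")))
]
concat_floods_all = xr.concat(datasets, dim="variable")
total = concat_floods_all["precip"].sum(dim="variable").compute()
print(total.shape)  # (6, 6)
```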
Again, I'm not sure how to get around these issues in my current framework, so please let me know if you have any more suggestions!
If you're interested in testing out development versions of xarray, there's a decent chance that this pull request will fix this issue: I would be curious to know if this works.
@shoyer - Thanks, I'll try this out and let you know how it works.
@shoyer - Sorry for my own ignorance, but how do I implement xarray.backends.file_manager within the framework of my code? Do I need to download the .py files to my directory and call them in my own script? The functionality looks promising, but I admit that the use of these tools is somewhat new to me and I would appreciate any additional guidance. Thanks!
@edougherty32 - from inside your environment (conda or virtual env), you'll want to run something like:
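The command itself was not preserved in this scrape; a typical form for installing from a pull-request branch (the fork URL and branch name below are placeholders, not the thread's actual command) would be:

```shell
# Hypothetical placeholder: install xarray directly from the PR's branch;
# substitute the actual fork URL and branch name.
pip install git+https://github.com/shoyer/xarray.git@<branch-name>
```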
@edougherty32 - I think using @shoyer's branch (as installed using the pip command above), you should just try rerunning your failing example. The pip command above should update your version of xarray to #2261.
Great, thanks @jhamman!
Hi @jhamman and @shoyer - updating my version of xarray to #2261 mostly solved the issue I mentioned above! However, I am now having a new issue when trying to access values from a DataArray:
Where each variable is accumulated precipitation over the U.S. for a particular flood case. When I access the first variable, I receive a numpy array, as expected (thus solving the issue from above).
Yet, when I try to access other variables, I receive the following error message:
This only happens for variables for which I previously utilized the approach above. Is there a work-around for this new issue? Thanks.
Thanks for testing it out! That pull request still needs a bit of work with dask.distributed -- its own tests are still failing. When we get that working, it will probably be ready for another test.
@shoyer - No problem! OK, thanks for letting me know! Do you know when that pull request would be ready for another test? No rush, but I'm just curious. Thanks again for all the help!
@edougherty32 This took a while, but I think #2261 is ready for another test now.
Hi, I am attempting to pull values from an xarray dataset to accumulate rainfall at specific times over a large number of dimensions. The dataset, concat_floods_all, is as follows:
With 658 variables (all accumulated rainfall at different times over the same domain):
The issue is when I sum all the variables up, using the following:
sum_floods = concat_floods_all.sum(skipna=True, dim='variable').compute()
I get the following error message:
RuntimeError: NetCDF: Not a valid ID
Based on #1001, I believe this error is due to opening the numerous files I search for and then appending them to a list in a for loop (I chose this method over open_mfdataset because I combine some files and delete redundant ones).
I am wondering how to get the actual 1015x1359 array of values for sum_floods and work around this issue.
Thanks!
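One possible work-around for the pattern described above (a sketch under assumed file names and shapes, with small synthetic files standing in for the real ones) is to force each file's values into memory with `.load()` before its handle is closed, so the later sum no longer needs the original NetCDF IDs:

```python
import glob
import os
import tempfile

import numpy as np
import xarray as xr

# Synthetic stand-ins for the flood files; names and shapes are assumptions.
tmpdir = tempfile.mkdtemp()
for i in range(3):
    xr.Dataset({"precip": (("lat", "lon"), np.full((4, 5), float(i)))}).to_netcdf(
        os.path.join(tmpdir, f"flood_{i}.nc")
    )

# Load each file eagerly: after .load() the values are plain numpy arrays,
# so closing the file cannot invalidate them.
datasets = []
for path in sorted(glob.glob(os.path.join(tmpdir, "flood_*.nc"))):
    with xr.open_dataset(path) as ds:
        datasets.append(ds.load())

combined = xr.concat(datasets, dim="variable")
sum_floods = combined["precip"].sum(skipna=True, dim="variable")
print(float(sum_floods[0, 0]))  # 0 + 1 + 2 = 3.0
```

The trade-off is memory: everything is read up front instead of lazily, which may not be practical for very large collections of files.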