Skip to content

Bug in multiplying datasets? Removes coordinates for lats #877

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jormelton opened this issue Jun 8, 2016 · 4 comments
Closed

Bug in multiplying datasets? Removes coordinates for lats #877

jormelton opened this issue Jun 8, 2016 · 4 comments

Comments

@jormelton
Copy link

jormelton commented Jun 8, 2016

I wasn't sure if this should go on the googlegroup or here, but the xarray docs says bugs go here so here it is.

I am doing a very simple operation. I have two netcdf files that I load in and then multiply. First:

import xarray as xr
ncscd='NCSCDv2_Circumpolar_WGS84_SOCC30_05deg.nc'
ncscddata30 = xr.open_dataset(ncscd)
ncscd_d30cm = ncscddata30['NCSCDv2'].load()
print ncscd_d30cm


> xarray.DataArray 'NCSCDv2' (lat: 111, lon: 720)
array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       ..., 
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan]])
Coordinates:
  * lon      (lon) float64 -179.7 -179.2 -178.7 -178.2 -177.7 -177.2 -176.7 ...
  * lat      (lat) float64 89.75 89.25 88.75 88.25 87.75 87.25 86.75 86.25 ...
Attributes:
    long_name: NCSCDv2
    esri_pe_string: GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]
    units: Degree

And then the second:

gridfile ='NCSCDv2_gridarea.nc'
gridarea = xr.open_dataset(gridfile).load()
print gridarea

> xarray.Dataset
Dimensions:    (lat: 111, lon: 720)
Coordinates:
  * lon        (lon) float64 -179.7 -179.2 -178.7 -178.2 -177.7 -177.2 ...
  * lat        (lat) float64 89.75 89.25 88.75 88.25 87.75 87.25 86.75 86.25 ...
Data variables:
    cell_area  (lat, lon) float64 1.337e+07 1.337e+07 1.337e+07 1.337e+07 ...
Attributes:
    CDI: Climate Data Interface version 1.6.8 (http://mpimet.mpg.de/cdi)
    Conventions: CF-1.4
    history: Mon May 16 18:13:43 2016: cdo gridarea NCSCDv2_Circumpolar_WGS84_SOCC100_05deg.nc NCSCDv2_gridarea.nc
    CDO: Climate Data Operators version 1.6.8rc2 (http://mpimet.mpg.de/cdo)

Both arrays are the same size. Now I simply multiply them together but note that lats in the output are now 0.

ncscd_total30cm = gridarea.cell_area * ncscd_d30cm
print ncscd_total30cm 

> xarray.Dataset
Dimensions:    (lat: 0, lon: 720)
Coordinates:
  * lat        (lat) float64 
  * lon        (lon) float64 -179.7 -179.2 -178.7 -178.2 -177.7 -177.2 ...
Data variables:
    cell_area  (lat, lon) float64 
Attributes:
    CDI: Climate Data Interface version 1.6.8 (http://mpimet.mpg.de/cdi)
    Conventions: CF-1.4
    history: Mon May 16 18:13:43 2016: cdo gridarea NCSCDv2_Circumpolar_WGS84_SOCC100_05deg.nc NCSCDv2_gridarea.nc
    CDO: Climate Data Operators version 1.6.8rc2 (http://mpimet.mpg.de/cdo)

This doesn't seem like it should be happening. Removing the .load() doesn't help. Bug?

@jhamman
Copy link
Member

jhamman commented Jun 8, 2016

Something looks wrong in your example (I updated you markdown syntax btw).

ncscd_total30cm = gridarea.cell_area * ncscd_d30cm

should return a DataArray, not a Dataset as you have shown. Can you check to make sure you are multiplying two DataArrays?

@jormelton
Copy link
Author

Sorry, I tried a few different ways so that output must have been from an earlier one. I just reconfirmed, If make them all DataArray, I have the same problem.

@shoyer
Copy link
Member

shoyer commented Jun 8, 2016

There's likely a slight difference in the latitude values due to numerical precision and rounding.

Try using reindex or reindex_like with method='nearest':

gridarea = gridarea.reindex_like(ncscd_d30cm, method='nearest', tolerance=0.01)

This is an tricky thing to fix automatically because we need some heuristic to pick a default tolerance. See pandas-dev/pandas#9817 for related discussion on the pandas side.

@jormelton
Copy link
Author

Yep, that fixed it. Perhaps it could raise a flag if it removes a coordinate as a result of small mismatches in the coordinates? The mismatch was very small (and surprising as the grid area file was actually made from the other file by using cdo cdo gridarea infile outfile). Thanks @jhamman and @shoyer for the help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants