TypeError: Expected label or tuple of labels since switching to 0.19.0 #5651

Closed
georgbuechner opened this issue Jul 30, 2021 · 8 comments

georgbuechner commented Jul 30, 2021

What happened:
Since upgrading to xarray==0.19.0 we are experiencing the following error:
TypeError: Expected label or tuple of labels, got (56.580000000002094, 0.38000000000000383) (output of MCVE using high_resolution.grib2.bz2)

What you expected to happen:
Using xarray==0.18.2 getting values from grib-files works as expected:
[[[186.25311279296875], [185.88555908203125], [185.44927978515625], [184.66273498535156]]] (output of MCVE using high_resolution.grib2.bz2)

Minimal Complete Verifiable Example:

import bz2
import tempfile
import xarray
from typing import List, Tuple

def extract_grib_values(grib_file, locations: List[Tuple[float, float]]) -> List[List[float]]:
    # Decompress and write to a temp file, since the dataset is opened by file name below.
    decompressed = bz2.decompress(grib_file)
    tmp_file = tempfile.NamedTemporaryFile()
    tmp_file.write(decompressed)
    tmp_file.flush()  # make sure all bytes are on disk before the file is reopened by name
    # Create backend_kwargs needed to correctly parse the grib-file and load dataset.
    backend_kwargs = {'indexpath': ''}
    data_set = xarray.open_dataset(tmp_file.name, engine='cfgrib', backend_kwargs=backend_kwargs)

    # Convert to data-frame, find value keys and extract values using coordinates and value_keys.
    data_frame = data_set.to_dataframe()
    value_keys = list(data_set.keys())
    values = [data_frame.loc[lat_long, value_keys].values.tolist() for lat_long in locations]
    return values


if __name__ == '__main__':
    with open('high_resolution.grib2.bz2', 'rb') as data:
        values = extract_grib_values(data.read(), [(56.580000000002094, 0.38000000000000383)])
        print(values)

Anything else we need to know?:
Use wget -O high_resolution.grib2.bz2 https://opendata.dwd.de/weather/nwp/icon-d2/grib/09/aswdifd_s/icon-d2_germany_regular-lat-lon_single-level_2021073009_019_2d_aswdifd_s.grib2.bz2
or wget -O low_resolution.grib2.bz2 https://opendata.dwd.de/weather/nwp/icon-d2/grib/09/ps/icon-d2_germany_regular-lat-lon_single-level_2021073009_030_2d_ps.grib2.bz2 to get two example files for testing the above code.

The low-resolution files work with both the older and the newer version. We only see the described error when using the high-resolution file(s).
Environment:

Output of xr.show_versions()

commit: None
python: 3.9.6 (default, Jun 29 2021, 06:20:32) [Clang 12.0.0 (clang-1200.0.32.29)]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 0.18.2
pandas: 1.3.1
numpy: 1.21.1
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.9.0
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 57.0.0
pip: 21.2.1
conda: None
pytest: 6.2.4
IPython: None
sphinx: None

@max-sixty
Collaborator

Please could you post the stack trace @georgbuechner ?

@dcherian
Contributor

cc @alexamici @aurghs

@georgbuechner
Author

Traceback (most recent call last):
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/generic.py", line 3767, in xs
    loc, new_index = index._get_loc_level(
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 3084, in _get_loc_level
    return partial_selection(key)
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 3071, in partial_selection
    indexer = self.get_loc(key)
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 2941, in get_loc
    self.slice_locs(lead_key, lead_key) if lead_key else (0, len(self))
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 2793, in slice_locs
    return super().slice_locs(start, end, step)
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5888, in slice_locs
    start_slice = self.get_slice_bound(start, "left")
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 2737, in get_slice_bound
    return self._partial_tup_index(label, side=side)
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 2810, in _partial_tup_index
    raise TypeError(f"Level type mismatch: {lab}")
TypeError: Level type mismatch: 56.580000000002094

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/example.py", line 25, in <module>
    values = extract_grib_values(data.read(), [(56.580000000002094, 0.38000000000000383)])
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/example.py", line 19, in extract_grib_values
    values = [data_frame.loc[lat_long, value_keys].values.tolist() for lat_long in locations]
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/example.py", line 19, in <listcomp>
    values = [data_frame.loc[lat_long, value_keys].values.tolist() for lat_long in locations]
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 925, in __getitem__
    return self._getitem_tuple(key)
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1100, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 822, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 906, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1164, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1113, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/Users/Jan.VanDick/Documents/programnming/mlservices/weather-data-extractor/.venv/lib/python3.9/site-packages/pandas/core/generic.py", line 3771, in xs
    raise TypeError(f"Expected label or tuple of labels, got {key}") from e
TypeError: Expected label or tuple of labels, got (56.580000000002094, 0.38000000000000383)

@max-sixty
Collaborator

Thanks @georgbuechner . It looks like that error is coming from the pandas call — is that correct? It's possible the xarray output changed between versions; understanding what changed would help here.

One suggestion for these sorts of issues — reducing the error down to a minimal example — and only including external data if unavoidable — makes diagnosing it an order of magnitude easier.

@georgbuechner
Author

georgbuechner commented Aug 2, 2021

Here is the xarray output for the different versions and resolution levels (0.18.2, 0.19.0, high, low).

# 0.18.2, high resolution
## dataframe
                                                  time  surface          valid_time  ASWDIFD_S
latitude longitude step
43.18    -3.94     0 days 19:00:00 2021-07-30 09:00:00      0.0 2021-07-31 04:00:00        NaN
                   0 days 19:15:00 2021-07-30 09:00:00      0.0 2021-07-31 04:15:00        NaN
                   0 days 19:30:00 2021-07-30 09:00:00      0.0 2021-07-31 04:30:00        NaN
                   0 days 19:45:00 2021-07-30 09:00:00      0.0 2021-07-31 04:45:00        NaN
         -3.92     0 days 19:00:00 2021-07-30 09:00:00      0.0 2021-07-31 04:00:00        NaN
...                                                ...      ...                 ...        ...
58.08     20.32    0 days 19:45:00 2021-07-30 09:00:00      0.0 2021-07-31 04:45:00        NaN
          20.34    0 days 19:00:00 2021-07-30 09:00:00      0.0 2021-07-31 04:00:00        NaN
                   0 days 19:15:00 2021-07-30 09:00:00      0.0 2021-07-31 04:15:00        NaN
                   0 days 19:30:00 2021-07-30 09:00:00      0.0 2021-07-31 04:30:00        NaN
                   0 days 19:45:00 2021-07-30 09:00:00      0.0 2021-07-31 04:45:00        NaN

[3625560 rows x 4 columns]

## value-keys
['ASWDIFD_S']
## For each coordinate
(56.580000000002094, 0.38000000000000383) [[111.47547149658203], [110.036865234375], [108.64585876464844], [107.34149932861328]]
## result
[[[111.47547149658203], [110.036865234375], [108.64585876464844], [107.34149932861328]]]

# 0.18.2, low resolution
## dataframe
                                  time            step  surface          valid_time  sp
latitude longitude
43.18    -3.94     2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
         -3.92     2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
         -3.90     2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
         -3.88     2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
         -3.86     2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
...                                ...             ...      ...                 ...  ..
58.08     20.26    2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
          20.28    2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
          20.30    2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
          20.32    2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
          20.34    2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN

[906390 rows x 5 columns]
## value-keys
['sp']
## For each coordinate
(56.580000000002094, 0.38000000000000383) [100547.734375]
## result
[[100547.734375]]

# 0.19.0, high resolution
## dataframe
                                                  time  surface          valid_time  ASWDIFD_S
step            latitude longitude
0 days 19:00:00 43.18    -3.94     2021-07-30 09:00:00      0.0 2021-07-31 04:00:00        NaN
                         -3.92     2021-07-30 09:00:00      0.0 2021-07-31 04:00:00        NaN
                         -3.90     2021-07-30 09:00:00      0.0 2021-07-31 04:00:00        NaN
                         -3.88     2021-07-30 09:00:00      0.0 2021-07-31 04:00:00        NaN
                         -3.86     2021-07-30 09:00:00      0.0 2021-07-31 04:00:00        NaN
...                                                ...      ...                 ...        ...
0 days 19:45:00 58.08     20.26    2021-07-30 09:00:00      0.0 2021-07-31 04:45:00        NaN
                          20.28    2021-07-30 09:00:00      0.0 2021-07-31 04:45:00        NaN
                          20.30    2021-07-30 09:00:00      0.0 2021-07-31 04:45:00        NaN
                          20.32    2021-07-30 09:00:00      0.0 2021-07-31 04:45:00        NaN
                          20.34    2021-07-30 09:00:00      0.0 2021-07-31 04:45:00        NaN
## for each
!! Fail !!
TypeError: Expected label or tuple of labels, got (56.580000000002094, 0.38000000000000383)

# 0.19.0, low resolution
## dataframe
                                  time            step  surface          valid_time  sp
latitude longitude
43.18    -3.94     2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
         -3.92     2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
         -3.90     2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
         -3.88     2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
         -3.86     2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
...                                ...             ...      ...                 ...  ..
58.08     20.26    2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
          20.28    2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
          20.30    2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
          20.32    2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN
          20.34    2021-07-30 09:00:00 1 days 06:00:00      0.0 2021-07-31 15:00:00 NaN

[906390 rows x 5 columns]
## value-keys
['sp']
## For each coordinate
(56.580000000002094, 0.38000000000000383) [100547.734375]
## result
[[100547.734375]]

As we can see, for v0.19.0 and the high-resolution file the xarray output has changed: the index level "step" has moved to the front. This makes it necessary to include 'step' when accessing the values:
values = [data_frame.loc[lat_long, value_keys].values.tolist() for lat_long in locations]
becomes
values = [[data_frame.loc[(step, *lat_long), value_keys].values.tolist() for step in data_frame.index.levels[0]] for lat_long in locations]

However, the data-set is still the same, independent of the xarray version, with the only change being the order of the dimensions (0.18.2: [latitude: 746, longitude: 1215, step: 4], 0.19.0: [step: 4, latitude: 746, longitude: 1215]):

Dimensions:     (latitude: 746, longitude: 1215, step: 4)
Coordinates:
    time        datetime64[ns] ...
  * step        (step) timedelta64[ns] 19:00:00 19:15:00 19:30:00 19:45:00
    surface     float64 ...
  * latitude    (latitude) float64 43.18 43.2 43.22 43.24 ... 58.04 58.06 58.08
  * longitude   (longitude) float64 -3.94 -3.92 -3.9 -3.88 ... 20.3 20.32 20.34
    valid_time  (step) datetime64[ns] ...
Data variables:
    ASWDIFD_S   (step, latitude, longitude) float32 ...
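A lookup that does not depend on where "step" sits in the index can be sketched by selecting on level names rather than positions. This is only an illustration on a small synthetic frame: the values are made up, and `values_at` is a hypothetical helper, not something from the MCVE or from xarray.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the frame returned by to_dataframe(); the numbers are made up.
steps = pd.to_timedelta(["19:00:00", "19:15:00"])
lats = [43.18, 43.20]
lons = [-3.94, -3.92]
index = pd.MultiIndex.from_product(
    [steps, lats, lons], names=["step", "latitude", "longitude"]
)
data_frame = pd.DataFrame({"ASWDIFD_S": np.arange(8, dtype="float32")}, index=index)


def values_at(frame, latitude, longitude, value_keys):
    # Hypothetical helper: select by level *name*, so the position of
    # "step" in the MultiIndex does not matter.
    subset = frame.xs((latitude, longitude), level=["latitude", "longitude"])
    return subset[value_keys].values.tolist()


result = values_at(data_frame, 43.20, -3.94, ["ASWDIFD_S"])
print(result)
```

The same call should return the same values for a frame whose levels are ordered (latitude, longitude, step), which is the point of going through the level names.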

@max-sixty
Collaborator

Thanks for tracking that down @georgbuechner . We'd need to find the source of the different ordering — is it the file or is there a problem with xarray?

@keewis
Collaborator

keewis commented Aug 2, 2021

FYI the important line is

    ASWDIFD_S   (step, latitude, longitude) float32 ...

Unlike the DataArray repr, the Dimensions overview in the Dataset repr does not tell you the dimension order of the individual variables.
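To check the order without reading the repr, the dims of the variable itself can be inspected. A minimal sketch on a synthetic dataset (shape and values are made up; only the names mirror the file above):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic dataset with the same dimension names as the GRIB file.
ds = xr.Dataset(
    {
        "ASWDIFD_S": (
            ("step", "latitude", "longitude"),
            np.zeros((2, 3, 4), dtype="float32"),
        )
    },
    coords={
        "step": pd.to_timedelta(["19:00:00", "19:15:00"]),
        "latitude": [43.18, 43.20, 43.22],
        "longitude": [-3.94, -3.92, -3.90, -3.88],
    },
)

# The per-variable dims tuple is the ordered information.
variable_dims = ds["ASWDIFD_S"].dims
print(variable_dims)
```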

That said, I checked your file (or rather, a new file from that server) and can reproduce this issue, which appears to have been introduced in #4753. Not sure if that's a bug, though: ASWDIFD_S has a dimension order of (step, latitude, longitude), so I'd argue that the conversion was wrong before.

If you want a specific order, I'd recommend passing that to the dim_order parameter of .to_dataframe or to transpose before calling .to_dataframe.
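Both options might look roughly like this. It is a sketch on a synthetic dataset with made-up values, assuming an xarray version whose `to_dataframe` accepts `dim_order`. The transpose variant is shown on the DataArray, where, as far as I can tell, `to_dataframe` follows the array's own dimension order.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic dataset; only the dimension and variable names mirror the GRIB file.
ds = xr.Dataset(
    {
        "ASWDIFD_S": (
            ("step", "latitude", "longitude"),
            np.arange(24, dtype="float32").reshape(2, 3, 4),
        )
    },
    coords={
        "step": pd.to_timedelta(["19:00:00", "19:15:00"]),
        "latitude": [43.18, 43.20, 43.22],
        "longitude": [-3.94, -3.92, -3.90, -3.88],
    },
)

order = ["latitude", "longitude", "step"]

# Option 1: ask to_dataframe for an explicit index level order.
frame_a = ds.to_dataframe(dim_order=order)

# Option 2: transpose the variable first, then convert it.
frame_b = ds["ASWDIFD_S"].transpose(*order).to_dataframe()

print(list(frame_a.index.names))
print(list(frame_b.index.names))
```

With the index pinned to (latitude, longitude, step), a lookup keyed on a (latitude, longitude) tuple no longer depends on the xarray default.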

@max-sixty
Collaborator

Sounds like this is largely working as intended, or at least we don't have an MCVE to show that it's not, so will close. Folks should feel free to open with an MCVE if there are outstanding issues.

(I haven't been loving the defaults of to_dataframe in some recent work, so that person might be me...)
