Wrong hue assignment in scatter plot #4641

astoeriko · 2020-12-02T09:34:40Z

What happened:
When using the hue keyword in a scatter plot to color the points based on a string variable, the color assignment in the plot is wrong (whereas the legend is correct).

What you expected to happen:
In the example, data of category "A" ranges between 0 and 2 in u-direction and 0 and 0.5 in v-direction. Points in that square should be orange (the color for "A") but currently are blue.

Minimal Complete Verifiable Example:

import xarray as xr
import numpy as np

u = np.random.rand(50, 2) * np.array([1, 2])
v = np.random.rand(50, 2) * np.array([1, 0.5])

ds = xr.Dataset(
    {
        "u": (("x", "category"), u),
        "v": (("x", "category"), v),
    },
    coords={"category": ["B", "A"],}
)

g = ds.plot.scatter(
    y="u",
    x="v",
    hue="category",
);

Anything else we need to know?:
I think that this might be related to sorting at some point. If the variable by which I color is sorted alphabetically (["A", "B"] instead of ["B", "A"]), the color assignment is correct.

Not sure if this issue is related to #4126, but it looks different to me (the problem is not the legend, but the colors in the plot itself).

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-122-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.0
pandas: 1.1.2
numpy: 1.17.5
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.26.0
distributed: 2.26.0
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 49.6.0.post20200814
pip: 20.2.3
conda: 4.8.3
pytest: 6.0.1
IPython: 7.18.1
sphinx: None


astoeriko · 2020-12-02T10:05:21Z

After updating xarray to 0.16.2, the colors in the plot agree with the colors in the legend, so the error indicated above does not persist. We can probably close this issue.
However, this seems to be achieved not by changing the colors in the plot but by sorting the legend as well. That is, the order of the category variable in the legend is ["A", "B"], although I specified it to be ["B", "A"] in the dataset. I am not sure if this is an intended behaviour?

astoeriko · 2020-12-02T12:09:48Z

As my original plot was still wrong after updating, I investigated a bit further: the problem persists when faceting as well.
Here is my new example where, again, data of category "A" get colored as "B" and vice versa.

import xarray as xr
import numpy as np

u = np.random.rand(50, 2, 2) * np.array([1, 2])
v = np.random.rand(50, 2) * np.array([1, 0.5])

ds = xr.Dataset(
    {
        "u": (("x", "foo", "category"), u),
        "v": (("x", "category"), v),
    },
    coords={"category": ["B", "A"], "foo": [1, 2]}
)

g = ds.plot.scatter(
    y="u",
    x="v",
    hue="category",
    col="foo"
);

I am sorry for the confusion.

Output of `xr.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.7.8 | packaged by conda-forge | (default, Nov 27 2020, 19:24:58)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-122-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.2
pandas: 1.1.2
numpy: 1.17.5
scipy: 1.5.3
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.26.0
distributed: 2.30.1
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3
conda: 4.8.3
pytest: 6.0.1
IPython: 7.19.0
sphinx: None

shoyer · 2020-12-03T02:18:20Z

could you share an image showing what the incorrect plot(s) looks like? you should be able to "paste" into the comment field in GitHub

astoeriko · 2020-12-03T10:31:33Z

Here are the plots demonstrating what I mean.

The “upright” rectangle (in the intervals [0, 0.5] and [0, 2]) of points represents the data corresponding to category "A". However, it is colored in blue, which corresponds to category "B". The order of labels in the legend is correct in the sense that it conserves the order in the Dataset.

In the second image, the color assignment in the plot is correct: data corresponding to category "A" is still colored in blue, but blue now corresponds to category "A". The legend is now alphabetically ordered instead of conserving the order of the category coordinate in the Dataset.

shoyer · 2020-12-16T20:33:08Z

Ugh, this is unfortunate! Thanks for the clear example code. Coincidentally, one of my collaborators ran into this same bug this morning. This sort of "corrupted data" bug is one of the nastiest types, so we should definitely try to prioritize a fix.

keewis · 2020-12-21T23:22:11Z

this is caused by the use of np.unique here:

xarray/xarray/plot/dataset_plot.py

Line 425 in de3f275

for label in np.unique(data["hue"].values):

to fix that, I think we either need to find a way to preserve the order of data["hue"] (the output of np.unique is sorted), or we have to use sorted/np.unique here:

xarray/xarray/plot/facetgrid.py

Line 384 in de3f275

labels=list(self._hue_var.values),
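
To illustrate the mismatch described above, here is a minimal sketch. It is not the actual xarray code: the variable names are made up for illustration, and it only assumes that one side iterates over np.unique(...) while the other side uses the coordinate values as stored.

```python
import numpy as np

# Coordinate order as given in the example dataset (hypothetical stand-in
# for data["hue"].values / self._hue_var.values).
hue_values = np.array(["B", "A"])

# np.unique returns the unique values sorted, so the groups are drawn in this order.
plot_order = list(np.unique(hue_values))

# The legend labels keep the order stored in the coordinate.
legend_labels = list(hue_values)

# Pairing the two position by position shows which label ends up on which group.
mismatched = [
    (drawn, labelled)
    for drawn, labelled in zip(plot_order, legend_labels)
    if drawn != labelled
]
print(plot_order, legend_labels, mismatched)
```

With a coordinate that is already sorted (["A", "B"]), both lists agree and mismatched is empty, which matches the observation in the original report.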

ahuang11 · 2020-12-22T03:55:33Z

Maybe a simple fix would be to replace np.unique with pd.unique since it's ordered?

Hash table-based unique. Uniques are returned in order of appearance. This does NOT sort.

Significantly faster than numpy.unique. Includes NA values.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.unique.html
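
A small sketch of the difference, assuming the hue values arrive as a NumPy array of strings (the array below is made up for illustration and is not taken from the xarray code):

```python
import numpy as np
import pandas as pd

# Hypothetical hue values with repeats, in the order they appear in the data.
hue_values = np.array(["B", "A", "B", "A"], dtype=object)

# np.unique sorts its output.
sorted_labels = list(np.unique(hue_values))

# pd.unique keeps the order of first appearance.
appearance_labels = list(pd.unique(hue_values))

print(sorted_labels)
print(appearance_labels)
```

Whether this is the right fix depends on whether the legend should follow the coordinate order or the sorted order; the sketch only shows that pd.unique would keep both sides in coordinate order.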

shoyer added bug topic-plotting labels Dec 16, 2020

keewis mentioned this issue Dec 22, 2020

scatter plot by order of the first appearance of hue #4723

Merged


keewis closed this as completed in #4723 Jan 13, 2021
