Unique function output doesn't match Pandas with single unique value #2566

Closed
@richardlin047

Description

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS 10.15.7
  • Modin version (modin.__version__): 0.8.2
  • Python version: 3.8.5
  • Code we can use to reproduce:
import pandas as pd
print("===PANDAS===")
s = pd.Series(['green'])
print(s)
print(type(s))
su = s.unique()
# Leads to same error as modin's unique()
# su = s.unique().squeeze()
print(su)
print(type(su))
print(len(su))

import modin.pandas as md
print("\n===MODIN===")
s = md.Series(['green'])
print(s)
print(type(s))
su = s.unique()
print(su)
print(type(su))
print(len(su))

Describe the problem

Whenever unique is called on a Series that has only one unique value, Modin returns a zero-dimensional NumPy array (effectively a scalar), whereas Pandas returns a NumPy array of length 1. As a result, calling len on Modin's unique result raises a TypeError because a zero-dimensional array has no length, while the same call on the Pandas result succeeds. This is likely because Modin's implementation calls squeeze, and squeezing an array of length 1 collapses it to zero dimensions.

This error does not occur when there are two or more unique values. The solution could be to remove squeeze from Modin's unique implementation. I will do more testing and try to follow up with a PR.
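
The squeeze behavior described above can be reproduced with NumPy alone, without Modin. This is a minimal sketch; the array contents are illustrative and the variable names are not taken from Modin's code.

```python
import numpy as np

# A length-1 array, like the one Pandas returns from unique().
one_value = np.array(["green"])
squeezed = one_value.squeeze()

print(one_value.shape)  # (1,)
print(squeezed.shape)   # ()
print(type(squeezed))   # <class 'numpy.ndarray'>

# A zero-dimensional array is still an ndarray, but it has no length.
try:
    len(squeezed)
except TypeError as exc:
    print(exc)  # len() of unsized object

# With two or more values there is no length-1 axis to drop,
# so squeeze leaves the shape unchanged.
two_values = np.array(["green", "red"])
print(two_values.squeeze().shape)  # (2,)
```

This matches the log: the type is still numpy.ndarray, yet len fails.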

Source code / logs

Log from above code to reproduce:

===PANDAS===
0    green
dtype: object
<class 'pandas.core.series.Series'>
['green']
<class 'numpy.ndarray'>
1

===MODIN===
0    green
dtype: object
<class 'modin.pandas.series.Series'>
green
<class 'numpy.ndarray'>
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-c4f0aa247643> in <module>
     19 print(su)
     20 print(type(su))
---> 21 print(len(su))

TypeError: len() of unsized object

Source code for Modin's unique (calls squeeze after to_numpy):

modin/modin/pandas/series.py

Lines 1347 to 1348 in c86422a

def unique(self):
    return self._query_compiler.unique().to_numpy().squeeze()
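
One possible way to keep the result at least one-dimensional is sketched below using NumPy only. It assumes that to_numpy() yields a single-column 2-D array of shape (n, 1), which this issue does not confirm, and squeeze_keep_1d is a hypothetical helper, not part of Modin.

```python
import numpy as np


def squeeze_keep_1d(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: drop length-1 axes but never go below 1-D."""
    return np.atleast_1d(values.squeeze())


# Assumed shapes of the intermediate array: (1, 1) and (2, 1).
single = np.array([["green"]], dtype=object)
several = np.array([["green"], ["red"]], dtype=object)

print(squeeze_keep_1d(single).shape)   # (1,)
print(len(squeeze_keep_1d(single)))    # 1
print(len(squeeze_keep_1d(several)))   # 2
```

Simply removing squeeze, as suggested above, may also be enough if the intermediate array is already one-dimensional; the sketch only illustrates guarding against the zero-dimensional case.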
