ExtensionArray being checked if is instance of collections.abc.Sequence #28424

fpim · 2019-09-13T02:09:27Z

from pandas.api.extensions import ExtensionDtype,ExtensionArray, ExtensionScalarOpsMixin
class FP:

    def __add__(self, other):
        return self + other
    pass
@pd.api.extensions.register_extension_dtype
class FPType(ExtensionDtype):
    name = 'fp'
    type = FP
    @classmethod
    def construct_from_string(cls, string):
        if string == cls.name:
            return cls()
        else:
            raise TypeError("Cannot construct a '{}' from "
                            "'{}'".format(cls, string))

# collections.abc.Sequence -> data is sequence, _reduce is possible
class FPArray(ExtensionArray,ExtensionScalarOpsMixin,collections.abc.Sequence):
    _dtype = FPType()
    @property
    def dtype(self):
        return self._dtype

    def __init__(self, values):
        values = np.array(values)  # TODO: avoid potential copy
        self.data = values

    def __len__(self):
        return len(self.data)
    def __getitem__(self, *args):
        result = operator.getitem(self.data, *args)
        return result


    def take(self,indexer):
        return self.data.take(indexer)

    def __contains__(self,item):
        return item in self.data

    def __iter__(self):
        for i in self.data:
            yield i

    def __reversed__(self):
        return reversed(self.data)

    def _reduce(self, name, skipna=True, **kwargs):
        return sum(self.data+100)

The ExtensionArray array itself is just for illustration and is not the problem, the problematic part is

arr = FPArray([FP(),FP(),FP(),FP()])
df = pd.DataFrame(arr )

when it reached line 444 of pandas/core/frame.py (https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L444)

            if not isinstance(data, abc.Sequence):
                data = list(data)

where data is referring to my FPArray, if data is not abc.Sequence, my ExtensionArray will be casted to a list.

Having my ExtensionArray casted to a list mean my ExtensionArray will eventually casted back to a normal Pandas series.

The Solution should be having ExtensionArray to be a subclass of abc.Sequence.

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

The text was updated successfully, but these errors were encountered:

TomAugspurger · 2019-09-13T02:42:59Z

I'm confused. You're passing an ExtensionArray to the DataFrame constructor? Does it work if you pass a dict DataFrame({"A": arr})?

fpim · 2019-09-13T03:36:07Z

I'm confused. You're passing an ExtensionArray to the DataFrame constructor? Does it work if you pass a dict DataFrame({"A": arr})?

That would behave the same.

I understand the confusion, the most observable side effect of the problem would be:
If I do

class FPArray(ExtensionArray,ExtensionScalarOpsMixin,collections.abc.Sequence):
    def _reduce(self, name, skipna=True, **kwargs):
        return 'xxxxx'
arr = FPArray([FP(),FP(),FP(),FP()])
df = pd.DataFrame(arr )
df[0].sum()

def _reduce can be reached

while if i do :

class FPArray(ExtensionArray,ExtensionScalarOpsMixin):
    def _reduce(self, name, skipna=True, **kwargs):
        return 'xxxxx'
arr = FPArray([FP(),FP(),FP(),FP()])
df = pd.DataFrame(arr )
df[0].sum()

def _reduce cannot be reached

jorisvandenbossche · 2019-09-13T09:40:10Z

@TomAugspurger I alsways forget whether it creates a row or column, but passing a single list/array-like to DataFrame constructor creates a DataFrame with one column:

In [2]: pd.DataFrame([1, 2, 3])                                                                                                                                                                                    
Out[2]: 
   0
0  1
1  2
2  3

Since that is a way to pass data to DataFrame, you can indeed expect that passing an ExtensionArray also preserves the type, which it currently doesn't:

In [3]: pd.DataFrame(pd.array([1, 2, None], dtype='Int64'))                                                                                                                                                        
Out[3]: 
     0
0  1.0
1  2.0
2  NaN

So I think we can consider this a bug.
Patches welcome!

simonjayhawkins · 2019-09-13T11:11:15Z

The Solution should be having ExtensionArray to be a subclass of abc.Sequence.

that does not seem to work since that code block doesn't preserve dtypes. However, the following seems to work.

diff --git a/pandas/core/frame.py b/pandas/core/frame.py
index f1ed3a125..dc72caa5a 100644
--- a/pandas/core/frame.py
+++ b/pandas/core/frame.py
@@ -427,7 +427,7 @@ class DataFrame(NDFrame):
                     data = data.copy()
                 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)

-        elif isinstance(data, (np.ndarray, Series, Index)):
+        elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):
             if data.dtype.names:
                 data_columns = list(data.dtype.names)
                 data = {k: data[k] for k in data_columns}

>>> import pandas as pd
>>> pd.DataFrame(pd.array([1, 2, None], dtype='Int64')).dtypes
0    Int64
dtype: object
>>>

TomAugspurger · 2019-09-13T11:33:19Z

I alsways forget whether it creates a row or column, but passing a single list/array-like to DataFrame constructor creates a DataFrame with one column:

Huh, I would not have guessed that.

Anyway, looks like Simon has a fix.

We wouldn't want to inherit from abc.Sequence, since isinstance checks on ABCs are so much slower.

louisng5 · 2019-09-13T13:26:09Z

I alsways forget whether it creates a row or column, but passing a single list/array-like to DataFrame constructor creates a DataFrame with one column:

Huh, I would not have guessed that.

Anyway, looks like Simon has a fix.

We wouldn't want to inherit from abc.Sequence, since isinstance checks on ABCs are so much slower.

I totally understand the fix. This will also change the behavior of all existing ExtensionArrays by having them fall into the elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)): block.

arw2019 · 2020-11-14T03:21:43Z

This works on 1.2 master

jorisvandenbossche added Bug Constructors Series/DataFrame/Index/pd.array Constructors ExtensionArray Extending pandas with custom dtypes or arrays. labels Sep 13, 2019

jorisvandenbossche added this to the Contributions Welcome milestone Sep 13, 2019

arw2019 added Needs Tests Unit test(s) needed to prevent regressions and removed Bug labels Nov 14, 2020

arw2019 mentioned this issue Nov 20, 2020

TST: add nullable array frame constructor dtype tests #37972

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.2 Nov 20, 2020

jreback closed this as completed in #37972 Nov 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ExtensionArray being checked if is instance of collections.abc.Sequence #28424

ExtensionArray being checked if is instance of collections.abc.Sequence #28424

fpim commented Sep 13, 2019 •

edited

Loading

INSTALLED VERSIONS

TomAugspurger commented Sep 13, 2019

fpim commented Sep 13, 2019 •

edited

Loading

jorisvandenbossche commented Sep 13, 2019

simonjayhawkins commented Sep 13, 2019

TomAugspurger commented Sep 13, 2019

louisng5 commented Sep 13, 2019 •

edited

Loading

arw2019 commented Nov 14, 2020

ExtensionArray being checked if is instance of collections.abc.Sequence #28424

ExtensionArray being checked if is instance of collections.abc.Sequence #28424

Comments

fpim commented Sep 13, 2019 • edited Loading

INSTALLED VERSIONS

TomAugspurger commented Sep 13, 2019

fpim commented Sep 13, 2019 • edited Loading

jorisvandenbossche commented Sep 13, 2019

simonjayhawkins commented Sep 13, 2019

TomAugspurger commented Sep 13, 2019

louisng5 commented Sep 13, 2019 • edited Loading

arw2019 commented Nov 14, 2020

fpim commented Sep 13, 2019 •

edited

Loading

fpim commented Sep 13, 2019 •

edited

Loading

louisng5 commented Sep 13, 2019 •

edited

Loading