assignment with datetime64[ns, UTC] raises TypeError #32395

Closed
Lmmejia11 opened this issue Mar 2, 2020 · 4 comments · Fixed by #32479
Labels: Bug; Datetime (Datetime data dtype); Regression (Functionality that used to work in a prior pandas version); Timezones (Timezone data dtype)
Milestone

Comments

@Lmmejia11

Code Sample

import datetime

import pandas as pd

data = [[datetime.datetime(2020, 2, 28, 13, 51, 27, tzinfo=datetime.timezone.utc)],
 [datetime.datetime(2020, 2, 28, 13, 51, 27, tzinfo=datetime.timezone.utc)],
 [datetime.datetime(2020, 2, 28, 13, 51, 27, tzinfo=datetime.timezone.utc)],
 [datetime.datetime(2020, 2, 28, 13, 51, 27, tzinfo=datetime.timezone.utc)]]
df = pd.DataFrame(data, columns=['cd'])     # dtype: datetime64[ns, UTC]
df2 = pd.DataFrame(index=df.index)          # same index, no columns yet

df2.loc[:, 'cd2'] = df['cd']         # no error: new column, ':' row indexer
df2.loc[df.index, 'cd3'] = df['cd']  # TypeError: new column, explicit row index
df2.loc[df.index, 'cd2'] = df['cd']  # no error: 'cd2' already has the tz-aware dtype

Problem description

Assigning timezone-aware datetime values to a new column through .loc with an explicit row index raises TypeError: data type not understood.
No error is raised when the row indexer is ':' or when the column already has the correct dtype. This only happens when the datetimes carry timezone information; with tzinfo=None, no error occurs.
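
A minimal sketch of why the timezone matters, assuming only public pandas API: tz-naive datetimes are stored with a plain NumPy dtype, while tz-aware datetimes get the pandas-specific `DatetimeTZDtype`. The variable names `stamp`, `naive`, and `aware` are illustrative and not part of the report.

```python
import datetime

import pandas as pd

stamp = datetime.datetime(2020, 2, 28, 13, 51, 27)

# Same wall-clock values, once without and once with timezone information.
naive = pd.Series([stamp] * 4)
aware = pd.Series([stamp.replace(tzinfo=datetime.timezone.utc)] * 4)

# The tz-naive series uses a NumPy datetime64 dtype; the tz-aware one uses
# pandas' DatetimeTZDtype, which plain NumPy does not know about.
print("naive:", naive.dtype, isinstance(naive.dtype, pd.DatetimeTZDtype))
print("aware:", aware.dtype, isinstance(aware.dtype, pd.DatetimeTZDtype))
```

The tz-aware dtype not being a NumPy dtype is at least consistent with the "data type not understood" message, although the report itself does not pin down where that message originates.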

Output of pd.show_versions()

I noticed this bug in version 1.0.1
No errors occur in 0.25.3

INSTALLED VERSIONS

commit : None
pandas : 1.0.1
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 20.0.1
setuptools : 45.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : 2.4.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.2 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : None
gcsfs : 0.6.0
lxml.etree : None
matplotlib : 3.1.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.2.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

@TomAugspurger
Contributor

Thanks for the report. Things look weird around this code:

    # current dtype cannot store value, coerce to common dtype
    find_dtype = False
    if hasattr(value, "dtype"):
        dtype = value.dtype
        find_dtype = True
    elif lib.is_scalar(value) and not isna(value):
        dtype, _ = infer_dtype_from_scalar(value, pandas_dtype=True)
        find_dtype = True
    if find_dtype:
        dtype = find_common_type([values.dtype, dtype])
        if not is_dtype_equal(self.dtype, dtype):
            b = self.astype(dtype)
            return b.setitem(indexer, value)

(Pdb) pp self
DatetimeBlock: 4 dtype: datetime64[ns]
(Pdb) pp value
<DatetimeArray>
['2020-02-28 13:51:27+00:00', '2020-02-28 13:51:27+00:00',
 '2020-02-28 13:51:27+00:00', '2020-02-28 13:51:27+00:00']
Length: 4, dtype: datetime64[ns, UTC]
(Pdb) pp dtype
<class 'object'>
(Pdb) pp self.values
array(['NaT', 'NaT', 'NaT', 'NaT'], dtype='datetime64[ns]')
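
The `object` dtype in the debugger output can be seen from the outside as well. The sketch below is an analogy using public API only, not the internal `find_common_type` call from the snippet: it assumes that concatenating tz-naive and tz-aware data runs into the same lack of a common datetime dtype.

```python
import datetime

import pandas as pd

stamp = datetime.datetime(2020, 2, 28, 13, 51, 27)

naive = pd.Series([stamp] * 4)                                        # tz-naive
aware = pd.Series([stamp.replace(tzinfo=datetime.timezone.utc)] * 4)  # tz-aware

# No single datetime dtype can hold both naive and aware values,
# so pandas falls back to the generic object dtype.
combined = pd.concat([naive, aware], ignore_index=True)
print(combined.dtype)
```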

I see a couple of options, both of which are likely valuable.

  1. Ensure we create a datetime64[ns, <tz>] array of all-NA values, where the tz matches the target dtype
  2. Ensure that Block.setitem correctly handles datetime64[ns] -> datetime64[ns, tz].
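
Option 1 can be approximated from user code, which may also serve as a workaround on affected versions. This is a sketch under assumptions, not the fix that went into pandas: it creates the target column up front as all-NaT with the tz-aware dtype of the source, so the indexed assignment no longer has to change dtype.

```python
import datetime

import pandas as pd

utc = datetime.timezone.utc
df = pd.DataFrame({"cd": [datetime.datetime(2020, 2, 28, 13, 51, 27, tzinfo=utc)] * 4})
df2 = pd.DataFrame(index=df.index)

# Create the column as all-NaT, already carrying the source's tz-aware dtype.
df2["cd3"] = pd.Series(pd.NaT, index=df2.index, dtype=df["cd"].dtype)

# The indexed assignment now targets a column of the matching dtype,
# which the report says works ("the column already has the correct type").
df2.loc[df.index, "cd3"] = df["cd"]
print(df2.dtypes)
```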

Are you interested in working on this?

TomAugspurger added the Datetime and Timezones labels on Mar 2, 2020
TomAugspurger added this to the Contributions Welcome milestone on Mar 2, 2020
jorisvandenbossche added the Regression label on Mar 2, 2020
jorisvandenbossche modified the milestones: Contributions Welcome, 1.0.2 on Mar 2, 2020
@jorisvandenbossche
Member

(tagged this as a regression)

@h-vishal
Contributor

h-vishal commented Mar 5, 2020

In 0.25 the DatetimeArray value to be assigned was coerced to an array of Timestamp values when it was assigned to an object dtype. I think that part of the code can be reintroduced.

    def _try_coerce_args(self, other):
        """ provide coercion to our input arguments """

        if isinstance(other, ABCDatetimeIndex):
            # May get a DatetimeIndex here. Unbox it.
            other = other.array

        if isinstance(other, DatetimeArray):
            # hit in pandas/tests/indexing/test_coercion.py
            # ::TestWhereCoercion::test_where_series_datetime64[datetime64tz]
            # when falling back to ObjectBlock.where
            other = other.astype(object)

        return other

I can work on this.
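
For reference, the coercion that `_try_coerce_args` performed can be observed through public API. This sketch only shows what `DatetimeArray.astype(object)` produces; the names `values` and `boxed` are illustrative, and it does not reproduce the internal block code.

```python
import pandas as pd

# .array on a tz-aware DatetimeIndex gives the underlying DatetimeArray.
values = pd.date_range("2020-02-28 13:51:27", periods=4, freq="D", tz="UTC").array

# astype(object) boxes each element into a pandas Timestamp, keeping the
# timezone on every element, so an object-dtype block can store the result.
boxed = values.astype(object)

print(type(values).__name__, values.dtype)
print(type(boxed).__name__, boxed.dtype, type(boxed[0]).__name__)
```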

@simonjayhawkins
Member

simonjayhawkins commented Mar 26, 2020

> In 0.25 the DatetimeArray value to be assigned was coerced to an array of Timestamp values when it was assigned to an object dtype. I think that part of the code can be reintroduced.
>
>     def _try_coerce_args(self, other):
>         """ provide coercion to our input arguments """

Can confirm this is the cause of the regression; xref #29139 (regression in 1.0.0).

225cc92 is the first bad commit
commit 225cc92
Author: jbrockmendel [email protected]
Date: Thu Oct 24 05:10:07 2019 -0700

CLN: remove Block._try_coerce_arg (#29139)

@jbrockmendel
