Commit e600237

whatsnew
1 parent b1913b7 commit e600237

1 file changed (+77, -0 lines changed)

doc/source/whatsnew/v1.1.0.rst

+77
@@ -351,6 +351,83 @@ Notable bug fixes

These are bug fixes that might have notable behavior changes.

Assigning with ``DataFrame.__setitem__`` consistently creates a new array
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Assigning values with ``DataFrame.__setitem__`` now consistently assigns a new array, rather than mutating inplace (:issue:`33457`, :issue:`35271`, :issue:`35266`).

Previously, ``DataFrame.__setitem__`` would sometimes operate inplace on the
underlying array, and sometimes assign a new array. Fixing this inconsistency
can change behavior for workloads that relied on inplace
mutation. The two most common cases are creating a ``DataFrame`` from an array
and slicing a ``DataFrame``.

*Previous Behavior*

The array would be mutated inplace for some dtypes, like NumPy's ``int64`` dtype.

.. code-block:: python

   >>> import pandas as pd
   >>> import numpy as np
   >>> a = np.array([1, 2, 3])
   >>> df = pd.DataFrame(a, columns=['a'])
   >>> df['a'] = 0
   >>> a  # mutated inplace
   array([0, 0, 0])

But not others, like :class:`Int64Dtype`.

.. code-block:: python

   >>> import pandas as pd
   >>> import numpy as np
   >>> a = pd.array([1, 2, 3], dtype="Int64")
   >>> df = pd.DataFrame(a, columns=['a'])
   >>> df['a'] = 0
   >>> a  # not mutated
   <IntegerArray>
   [1, 2, 3]
   Length: 3, dtype: Int64

*New Behavior*

In pandas 1.1.0, ``DataFrame.__setitem__`` consistently sets on a new array rather than
mutating the existing array inplace.

For NumPy's ``int64`` dtype:

.. ipython:: python

   import pandas as pd
   import numpy as np
   a = np.array([1, 2, 3])
   df = pd.DataFrame(a, columns=['a'])
   df['a'] = 0
   a  # not mutated

For :class:`Int64Dtype`:

.. ipython:: python

   import pandas as pd
   import numpy as np
   a = pd.array([1, 2, 3], dtype="Int64")
   df = pd.DataFrame(a, columns=['a'])
   df['a'] = 0
   a  # not mutated

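
As a quick check of the behavior described above, the following sketch runs the same assignment for both array types and confirms that neither source array changes. It assumes pandas 1.1.0 or later; the helper name ``source_after_assignment`` is made up for this illustration and is not part of pandas.

```python
import numpy as np
import pandas as pd


def source_after_assignment(source):
    """Hypothetical helper: build a one-column frame from ``source``,
    overwrite the column, and return the source values as a plain list."""
    df = pd.DataFrame(source, columns=["a"])
    df["a"] = 0  # sets a new array on the frame (pandas >= 1.1.0 assumed)
    assert df["a"].tolist() == [0, 0, 0]  # the frame sees the new values
    return [int(x) for x in source]  # the source array is read back unchanged


numpy_result = source_after_assignment(np.array([1, 2, 3]))
nullable_result = source_after_assignment(pd.array([1, 2, 3], dtype="Int64"))
print(numpy_result, nullable_result)
```

Code that previously read updated values from the source array ``a`` would need to read them from the ``DataFrame`` instead, for example with ``df['a'].to_numpy()``.
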

This also affects cases where a second ``Series`` or ``DataFrame`` is a view on a first ``DataFrame``.

.. code-block:: python

   df = pd.DataFrame({"A": [1, 2, 3]})
   df2 = df[['A']]
   df['A'] = np.array([0, 0, 0])

Previously, whether ``df2`` was mutated depended on the dtype of the array being assigned. Now, a
new array is consistently assigned, so ``df2`` is not mutated.

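
The second-object case can be checked the same way. This sketch assumes pandas 1.1.0 or later and only inspects the values held by ``df`` and ``df2`` after the assignment; the names ``df_values`` and ``df2_values`` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})
df2 = df[["A"]]  # second object derived from df before the assignment
df["A"] = np.array([0, 0, 0])  # assigns a new array to df only

df_values = df["A"].tolist()
df2_values = df2["A"].tolist()
print(df_values)   # values now held by df
print(df2_values)  # values still held by df2
```

Under the stated assumption, ``df`` holds the newly assigned zeros while ``df2`` keeps the original values.
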
``MultiIndex.get_indexer`` interprets ``method`` argument correctly
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
