Fix pd.merge to preserve ExtensionArrays dtypes #20745
pandas/core/internals.py
Outdated
@@ -5541,7 +5541,7 @@ def concatenate_join_units(join_units, concat_axis, copy):
     if len(to_concat) == 1:
         # Only one block, nothing to concatenate.
         concat_values = to_concat[0]
-        if copy and concat_values.base is not None:
+        if copy and getattr(concat_values, 'base', 1) is not None:
This is kind of a hack; I would like to have a better solution.
(Related to the discussion earlier today in #20721 (comment) about deprecating Index.base.)
you should only check base if concat_values is an ndarray
@jreback updated
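A minimal sketch of what the suggestion above (only check base when concat_values is an ndarray) might look like. The helper name copy_if_needed is hypothetical, and this is not necessarily the code that ended up in the PR; it assumes that any non-ndarray values expose a copy() method.

```python
import numpy as np


def copy_if_needed(concat_values, copy=True):
    """Hypothetical helper illustrating the review suggestion."""
    if not copy:
        return concat_values
    if isinstance(concat_values, np.ndarray):
        # Only ndarrays are asked for .base: a non-None base means the
        # array is a view on other data, so it is copied.
        if concat_values.base is not None:
            concat_values = concat_values.copy()
    else:
        # Assumption: other array-likes (e.g. extension arrays) have
        # copy(), and are always copied when a copy is requested.
        concat_values = concat_values.copy()
    return concat_values


owner = np.arange(5)   # owns its data, so owner.base is None
view = owner[1:4]      # a view, so view.base is owner

same = copy_if_needed(owner)
copied = copy_if_needed(view)
```

Compared with getattr(concat_values, 'base', 1), the isinstance check makes the two cases explicit and avoids relying on a sentinel default.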
Note to self: 1 actual failing test (rest are network failures):
I am not seeing this locally, and it is also not failing for Decimal, so the column order seems to depend on the extension dtype? (Which is a bit strange.)
@@ -95,3 +95,24 @@ def test_set_frame_overwrite_object(self, data):
         df = pd.DataFrame({"A": [1] * len(data)}, dtype=object)
         df['A'] = data
         assert df.dtypes['A'] == data.dtype
+
+    def test_merge(self, data, na_value):
Please add the issue number here.
I can reproduce your failure locally. Looking into it now.
Ah, bug in So the test is broken for all types, not just JSON.
The Decimal base class failed to check the global order of columns. Fixed that as well.
Pushed a commit that fixes the Decimal tests base class and adjusts the expected order. Hopefully this will get things passing.
Codecov Report
@@ Coverage Diff @@
## master #20745 +/- ##
==========================================
+ Coverage 91.84% 91.85% +<.01%
==========================================
Files 153 153
Lines 49303 49307 +4
==========================================
+ Hits 45282 45289 +7
+ Misses 4021 4018 -3
Thanks for the fix! The order was indeed the culprit, but I changed it again so that the columns are in alphabetical order in the original test frame, so it behaves the same on Python < 3.6.
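To illustrate the column-order point: the key order of a plain dict is not guaranteed on Python < 3.6, so a frame built from a dict can end up with a different column order than the expected frame. The sketch below shows two ways to pin the order down. The column names and values are made up for illustration and are not the PR's actual test data.

```python
import pandas as pd

data = {"key": [0, 1, 2], "int1": [1, 2, 3], "ext": ["a", "b", "c"]}

# Option 1: pass the column order explicitly instead of relying on
# the iteration order of the dict.
explicit = pd.DataFrame(data, columns=["ext", "int1", "key"])

# Option 2: sort the columns after construction, which gives the
# alphabetical order regardless of how the frame was built.
unordered = pd.DataFrame(data)
alphabetical = unordered[sorted(unordered.columns)]

explicit_columns = list(explicit.columns)
alphabetical_columns = list(alphabetical.columns)
```

Either way, the frame under test and the expected frame list their columns in the same order on every Python version.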
For Travis, the remaining failure is an s3 one, so this seems to be passing now.
@@ -95,3 +95,24 @@ def test_set_frame_overwrite_object(self, data):
         df = pd.DataFrame({"A": [1] * len(data)}, dtype=object)
         df['A'] = data
         assert df.dtypes['A'] == data.dtype
+
+    def test_merge(self, data, na_value):
you prob should test with the how=join_type fixture.
They each need a different expected result, so I don't think it is really worth it in this case?
(The test is also not meant to fully cover the merge function, since we already have other tests for that; it just checks that the basic use cases of concatenating work with extension arrays.)
OK, it's worth doing way more tests than a single use case, but OK here I guess.
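For reference, a rough sketch of what checking several join types could look like outside the test suite. It is only an illustration under assumptions: it uses the nullable "Int64" dtype as a stand-in extension array (the PR's tests use their own data fixtures), a plain loop in place of the join_type fixture, and made-up frame contents.

```python
import pandas as pd

# Left frame with an extension-array column ("Int64" is a stand-in here).
left = pd.DataFrame({
    "ext": pd.array([1, 2, 3], dtype="Int64"),
    "int1": [1, 2, 3],
    "key": [0, 1, 2],
})
right = pd.DataFrame({"int2": [1, 2, 3, 4], "key": [0, 0, 1, 3]})

# Record the dtype of the extension column after each kind of merge.
result_dtypes = {}
for how in ["left", "right", "inner", "outer"]:
    merged = pd.merge(left, right, on="key", how=how)
    result_dtypes[how] = merged["ext"].dtype
```

As noted in the thread, each join type needs its own expected frame, so this sketch only inspects the resulting dtype of the extension column rather than comparing full results.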
Closes #20743