-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Test column addition using .at with different values #46731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FactorizeD
commented
Apr 10, 2022
- closes dtype does not get set to object when adding column with df.at[...] = ['some list'] #30649
@@ -16,16 +16,6 @@ | |||
import pandas._testing as tm | |||
|
|||
|
|||
def test_at_timezone(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test could be incorporated into test_at_setitem_expansion
, so I removed it, I hope that it is fine
@@ -47,6 +37,67 @@ def test_selection_methods_of_assigned_col(): | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment - #30649 issue talks exclusively about assigning a list of values using .at
. I went through this file though and haven't found a general test that makes sure that other values (e.g. tuples, strings, nans etc.) work as expected, so I thought that I will create such a general test for both df and series (including the list, of course). I was thinking of some sort of parametrization (so that a single test function would take either a df or a series) but concluded that making an explicit df / series test might be easier to read.
If you think that this approach is fine, then is such an iteration over possible values to assign fine, or is there maybe some fixture already defined elsewhere with different examples of each possible data type? I tried to search for a few keywords in the repo but couldn't find anything helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expansion of tests like this is great!
True, | ||
("a",), | ||
["a"], | ||
datetime(2000, 1, 2, tzinfo=timezone.utc), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you can see the list of value_to_assign
for a Series contains more items (in particular a dictionary, a numpy array and series). I tried to use them for a dataframe as well but was gettings errors. In particular:
- dictionary, series:
ValueError: Incompatible indexer with Series
- numpy array:
IndexError: only integers, slices (
:), ellipsis (
...), numpy.newaxis (
None) and integer or boolean arrays are valid indices
I haven't looked more deeply in the code, but is this expected or should a bug be raised for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is likely we should not trying to treat iterables like this, so could be a bug. if you can open a separate issue for this (and PR would be great too!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean exactly by not trying to treat iterables like this
? Do you mean that dictionaries, series, and numpy arrays shouldn't be assigned to dataframes / series?
4906755
to
52ad05a
Compare
52ad05a
to
bbd1ea9
Compare
@jreback I am tagging you since you added a few labels to the PR a couple of days ago. Failing check seems not related to this PR. Is there something more that can be done here from my side at the moment or should I simply wait for the review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay. Could you merge main one more time?
@@ -47,6 +37,67 @@ def test_selection_methods_of_assigned_col(): | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expansion of tests like this is great!
True, | ||
("a",), | ||
["a"], | ||
datetime(2000, 1, 2, tzinfo=timezone.utc), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is likely we should not trying to treat iterables like this, so could be a bug. if you can open a separate issue for this (and PR would be great too!)
Series([1]), | ||
], | ||
) | ||
@pytest.mark.filterwarnings("ignore::FutureWarning") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is showing the warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following warnings are present (FutureWarning was here earlier; UserWarning occurred just now, not sure if this is because of my setup or something else changed in the meantime?):
- FutureWarning:
Behavior when concatenating bool-dtype and numeric-dtype arrays is deprecated; in a future version these will cast to object dtype (instead of coercing bools to numeric values). To retain the old behavior, explicitly cast bool-dtype arrays to numeric dtype.
(related to the TODO comment) - UserWarning:
pyarrow requires pandas 0.23.0 or above, pandas 0+unknown is installed. Therefore, pandas-specific integration is not used.
# numeric-dtype is cast to object | ||
if isinstance(value_to_assign, bool): | ||
result = result.astype(object) | ||
expected_result = Series([1, 2, value_to_assign], index=[3, 2, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just called this expected
if isinstance(value_to_assign, bool): | ||
result = result.astype(object) | ||
expected_result = Series([1, 2, value_to_assign], index=[3, 2, 1]) | ||
if isinstance(value_to_assign, datetime): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm this is a bug if it needs casting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't remember exactly why I added it in the first place, but it is indeed working without the expected_result cast. The first one is needed though for the case where True
is being assigned to the series - it gets converted to 1 automatically when Series has int64
dtype in the first place. Is that actually expected?
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |