Skip to content

Test column addition using .at with different values #46731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

FactorizeD
Copy link
Contributor

@@ -16,16 +16,6 @@
import pandas._testing as tm


def test_at_timezone():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test could be incorporated into test_at_setitem_expansion, so I removed it, I hope that it is fine

@@ -47,6 +37,67 @@ def test_selection_methods_of_assigned_col():
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize(
Copy link
Contributor Author

@FactorizeD FactorizeD Apr 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment - #30649 issue talks exclusively about assigning a list of values using .at. I went through this file though and haven't found a general test that makes sure that other values (e.g. tuples, strings, nans etc.) work as expected, so I thought that I will create such a general test for both df and series (including the list, of course). I was thinking of some sort of parametrization (so that a single test function would take either a df or a series) but concluded that making an explicit df / series test might be easier to read.

If you think that this approach is fine, then is such an iteration over possible values to assign fine, or is there maybe some fixture already defined elsewhere with different examples of each possible data type? I tried to search for a few keywords in the repo but couldn't find anything helpful

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expansion of tests like this is great!

True,
("a",),
["a"],
datetime(2000, 1, 2, tzinfo=timezone.utc),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see the list of value_to_assign for a Series contains more items (in particular a dictionary, a numpy array and series). I tried to use them for a dataframe as well but was gettings errors. In particular:

  • dictionary, series: ValueError: Incompatible indexer with Series
  • numpy array: IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

I haven't looked more deeply in the code, but is this expected or should a bug be raised for it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is likely we should not trying to treat iterables like this, so could be a bug. if you can open a separate issue for this (and PR would be great too!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean exactly by not trying to treat iterables like this? Do you mean that dictionaries, series, and numpy arrays shouldn't be assigned to dataframes / series?

@jreback jreback added Testing pandas testing functions or related to the test suite Indexing Related to indexing on series/frames, not to indexes themselves Dtype Conversions Unexpected or buggy dtype conversions labels Apr 10, 2022
@jreback jreback added this to the 1.5 milestone Apr 10, 2022
@FactorizeD FactorizeD force-pushed the tst-indexing-at-setter branch from 4906755 to 52ad05a Compare April 12, 2022 21:07
@FactorizeD FactorizeD force-pushed the tst-indexing-at-setter branch from 52ad05a to bbd1ea9 Compare April 14, 2022 19:15
@FactorizeD
Copy link
Contributor Author

@jreback I am tagging you since you added a few labels to the PR a couple of days ago. Failing check seems not related to this PR. Is there something more that can be done here from my side at the moment or should I simply wait for the review?

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay. Could you merge main one more time?

@@ -47,6 +37,67 @@ def test_selection_methods_of_assigned_col():
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expansion of tests like this is great!

True,
("a",),
["a"],
datetime(2000, 1, 2, tzinfo=timezone.utc),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is likely we should not trying to treat iterables like this, so could be a bug. if you can open a separate issue for this (and PR would be great too!)

Series([1]),
],
)
@pytest.mark.filterwarnings("ignore::FutureWarning")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is showing the warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following warnings are present (FutureWarning was here earlier; UserWarning occurred just now, not sure if this is because of my setup or something else changed in the meantime?):

  • FutureWarning: Behavior when concatenating bool-dtype and numeric-dtype arrays is deprecated; in a future version these will cast to object dtype (instead of coercing bools to numeric values). To retain the old behavior, explicitly cast bool-dtype arrays to numeric dtype. (related to the TODO comment)
  • UserWarning: pyarrow requires pandas 0.23.0 or above, pandas 0+unknown is installed. Therefore, pandas-specific integration is not used.

# numeric-dtype is cast to object
if isinstance(value_to_assign, bool):
result = result.astype(object)
expected_result = Series([1, 2, value_to_assign], index=[3, 2, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just called this expected

if isinstance(value_to_assign, bool):
result = result.astype(object)
expected_result = Series([1, 2, value_to_assign], index=[3, 2, 1])
if isinstance(value_to_assign, datetime):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm this is a bug if it needs casting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't remember exactly why I added it in the first place, but it is indeed working without the expected_result cast. The first one is needed though for the case where True is being assigned to the series - it gets converted to 1 automatically when Series has int64 dtype in the first place. Is that actually expected?

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 10, 2022
@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Jun 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves Stale Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dtype does not get set to object when adding column with df.at[...] = ['some list']
3 participants