Skip to content

Test column addition using .at with different values #46731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 62 additions & 10 deletions pandas/tests/indexing/test_at.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,16 +16,6 @@
import pandas._testing as tm


def test_at_timezone():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test could be incorporated into test_at_setitem_expansion, so I removed it, I hope that it is fine

# https://github.com/pandas-dev/pandas/issues/33544
result = DataFrame({"foo": [datetime(2000, 1, 1)]})
result.at[0, "foo"] = datetime(2000, 1, 2, tzinfo=timezone.utc)
expected = DataFrame(
{"foo": [datetime(2000, 1, 2, tzinfo=timezone.utc)]}, dtype=object
)
tm.assert_frame_equal(result, expected)


def test_selection_methods_of_assigned_col():
# GH 29282
df = DataFrame(data={"a": [1, 2, 3], "b": [4, 5, 6]})
Expand All @@ -47,6 +37,68 @@ def test_selection_methods_of_assigned_col():
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize(
Copy link
Contributor Author

@FactorizeD FactorizeD Apr 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment - #30649 issue talks exclusively about assigning a list of values using .at. I went through this file though and haven't found a general test that makes sure that other values (e.g. tuples, strings, nans etc.) work as expected, so I thought that I will create such a general test for both df and series (including the list, of course). I was thinking of some sort of parametrization (so that a single test function would take either a df or a series) but concluded that making an explicit df / series test might be easier to read.

If you think that this approach is fine, then is such an iteration over possible values to assign fine, or is there maybe some fixture already defined elsewhere with different examples of each possible data type? I tried to search for a few keywords in the repo but couldn't find anything helpful

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expansion of tests like this is great!

"value_to_assign",
[
"a",
1,
1.0,
np.nan,
True,
("a",),
["a"],
datetime(2000, 1, 2, tzinfo=timezone.utc),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see the list of value_to_assign for a Series contains more items (in particular a dictionary, a numpy array and series). I tried to use them for a dataframe as well but was gettings errors. In particular:

  • dictionary, series: ValueError: Incompatible indexer with Series
  • numpy array: IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

I haven't looked more deeply in the code, but is this expected or should a bug be raised for it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is likely we should not trying to treat iterables like this, so could be a bug. if you can open a separate issue for this (and PR would be great too!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean exactly by not trying to treat iterables like this? Do you mean that dictionaries, series, and numpy arrays shouldn't be assigned to dataframes / series?

],
)
def test_at_setitem_expansion__df(value_to_assign):
# GH33544
# GH30649
result = DataFrame({0: [1, 2]}, index=[3, 2])

result.at[1, 1] = value_to_assign
expected_result = DataFrame(
{0: [1, 2, np.nan], 1: [np.nan, np.nan, value_to_assign]}, index=[3, 2, 1]
)
if isinstance(value_to_assign, datetime):
expected_result = expected_result.astype({1: object})

tm.assert_frame_equal(result, expected_result)


@pytest.mark.parametrize(
"value_to_assign",
[
"a",
1,
1.0,
np.nan,
True,
("a",),
["a"],
{"a": "b"},
datetime(2000, 1, 2, tzinfo=timezone.utc),
np.array([1]),
Series([1]),
],
)
@pytest.mark.filterwarnings("ignore::FutureWarning")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is showing the warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following warnings are present (FutureWarning was here earlier; UserWarning occurred just now, not sure if this is because of my setup or something else changed in the meantime?):

  • FutureWarning: Behavior when concatenating bool-dtype and numeric-dtype arrays is deprecated; in a future version these will cast to object dtype (instead of coercing bools to numeric values). To retain the old behavior, explicitly cast bool-dtype arrays to numeric dtype. (related to the TODO comment)
  • UserWarning: pyarrow requires pandas 0.23.0 or above, pandas 0+unknown is installed. Therefore, pandas-specific integration is not used.

def test_at_setitem_expansion__series(value_to_assign):
# GH33544
# GH30649
result = Series([1, 2], index=[3, 2])

result.at[1] = value_to_assign
# TODO remove if statement when concatenating bool-dtype and
# numeric-dtype is cast to object
if isinstance(value_to_assign, bool):
result = result.astype(object)
expected_result = Series([1, 2, value_to_assign], index=[3, 2, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just called this expected

if isinstance(value_to_assign, datetime):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm this is a bug if it needs casting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't remember exactly why I added it in the first place, but it is indeed working without the expected_result cast. The first one is needed though for the case where True is being assigned to the series - it gets converted to 1 automatically when Series has int64 dtype in the first place. Is that actually expected?

expected_result = expected_result.astype(object)

tm.assert_series_equal(result, expected_result)


class TestAtSetItem:
def test_at_setitem_item_cache_cleared(self):
# GH#22372 Note the multi-step construction is necessary to trigger
Expand Down