Skip to content
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -658,7 +658,6 @@ Other API changes
- :func: `pandas.api.dtypes.is_string_dtype` no longer incorrectly identifies categorical series as string.
- :func:`read_excel` no longer takes ``**kwds`` arguments. This means that passing in keyword ``chunksize`` now raises a ``TypeError``
(previously raised a ``NotImplementedError``), while passing in keyword ``encoding`` now raises a ``TypeError`` (:issue:`34464`)
- :func: `merge` now checks ``suffixes`` parameter type to be ``tuple`` and raises ``TypeError``, whereas before a ``list`` or ``set`` were accepted and that the ``set`` could produce unexpected results (:issue:`33740`)
- :class:`Period` no longer accepts tuples for the ``freq`` argument (:issue:`34658`)
- :meth:`Series.interpolate` and :meth:`DataFrame.interpolate` now raises ValueError if ``limit_direction`` is 'forward' or 'both' and ``method`` is 'backfill' or 'bfill' or ``limit_direction`` is 'backward' or 'both' and ``method`` is 'pad' or 'ffill' (:issue:`34746`)

Expand Down Expand Up @@ -749,6 +748,7 @@ Deprecations
- :meth:`DataFrame.to_dict` has deprecated accepting short names for ``orient`` in future versions (:issue:`32515`)
- :meth:`Categorical.to_dense` is deprecated and will be removed in a future version, use ``np.asarray(cat)`` instead (:issue:`32639`)
- The ``fastpath`` keyword in the ``SingleBlockManager`` constructor is deprecated and will be removed in a future version (:issue:`33092`)
- Providing ``suffixes`` as a ``set`` in :func:`pandas.merge` is deprecated. Provide a tuple instead (:issue:`33740`, :issue:`34741`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are going to do this (I guess @jorisvandenbossche really wants this), then let's update the doc-string of pd.merge to be correct (needs list-like) and the docs in reshape.rst as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pd.merge docs already state that we want a sequence.

    suffixes : Sequence, default is ("_x", "_y")
        A length-2 sequence where

Updated the DataFrame.merge docs to be consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm we don't use sequence elsewhere do we, don't we use list-like everywhere?

- :meth:`Index.is_mixed` is deprecated and will be removed in a future version, check ``index.inferred_type`` directly instead (:issue:`32922`)

- Passing any arguments but the first one to :func:`read_html` as
Expand Down
11 changes: 7 additions & 4 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -226,10 +226,13 @@
sort : bool, default False
Sort the join keys lexicographically in the result DataFrame. If False,
the order of the join keys depends on the join type (how keyword).
suffixes : tuple of (str, str), default ('_x', '_y')
Suffix to apply to overlapping column names in the left and right
side, respectively. To raise an exception on overlapping columns use
(False, False).
suffixes : Sequence, default is ("_x", "_y")
A length-2 sequence where each element is optionally a string
indicating the suffix to add to overlapping column names in
`left` and `right` respectively. Pass a value of `None` instead
of a string to indicate that the column name from `left` or
`right` should be left as-is, with no suffix. At least one of the
values must not be None.
copy : bool, default True
If False, avoid copy if possible.
indicator : bool or str, default False
Expand Down
10 changes: 7 additions & 3 deletions pandas/core/reshape/merge.py
Original file line number Diff line number Diff line change
Expand Up @@ -2072,9 +2072,13 @@ def _items_overlap_with_suffix(left: Index, right: Index, suffixes: Tuple[str, s
If corresponding suffix is empty, the entry is simply converted to string.

"""
if not isinstance(suffixes, tuple):
raise TypeError(
f"suffixes should be tuple of (str, str). But got {type(suffixes).__name__}"
if isinstance(suffixes, set):
warnings.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just raise? also what about a dict or non-list-like for example?

do we check is_list_like anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need a warning according to our backwards compatibility poilcy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easy to do it with a warning, so I would also just do that.

@TomAugspurger if we don't want to have it set-specific, could also do if not is_list_like(suffixes, allow_sets=False), although in practice that might give the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that should use what @jorisvandenbossche suggests here , e.g. dict would pass this test but is not acceptable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dicts are ordered though, so they shouldn't be a problem. I thought the original issue was non-deterministic outputs because sets aren't ordered?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually sets are not ordered, but I think we need an actual sequence here (sure .keys() satisfies), but i'd rather be explicit. we do not treat dicts as list-like anywhere else.

"Passing 'suffixes' as a set, which is unordered, may result in "
"unexpected results. Provide 'suffixes' as a tuple instead. In the "
"future a 'TypeError' will be raised.",
FutureWarning,
stacklevel=4,
)

to_rename = left.intersection(right)
Expand Down
16 changes: 5 additions & 11 deletions pandas/tests/reshape/merge/test_merge.py
Original file line number Diff line number Diff line change
Expand Up @@ -2069,18 +2069,12 @@ def test_merge_suffix_error(col1, col2, suffixes):
pd.merge(a, b, left_index=True, right_index=True, suffixes=suffixes)


@pytest.mark.parametrize(
"col1, col2, suffixes", [("a", "a", {"a", "b"}), ("a", "a", None), (0, 0, None)],
)
def test_merge_suffix_type_error(col1, col2, suffixes):
a = pd.DataFrame({col1: [1, 2, 3]})
b = pd.DataFrame({col2: [3, 4, 5]})
def test_merge_suffix_set():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this non_list_like parmaterize on dict as well

a = pd.DataFrame({"a": [1, 2, 3]})
b = pd.DataFrame({"b": [3, 4, 5]})

msg = (
f"suffixes should be tuple of \\(str, str\\). But got {type(suffixes).__name__}"
)
with pytest.raises(TypeError, match=msg):
pd.merge(a, b, left_index=True, right_index=True, suffixes=suffixes)
with tm.assert_produces_warning(FutureWarning):
pd.merge(a, b, left_index=True, right_index=True, suffixes={"left", "right"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not actually passing the parameterized suffixes here, so I think that the dict case is not tested, and will also not raise a warning (as dicts are considered list-like)



@pytest.mark.parametrize(
Expand Down