diff --git a/doc/source/development/policies.rst b/doc/source/development/policies.rst
index ced5b686b8246..f8e6bda2085d8 100644
--- a/doc/source/development/policies.rst
+++ b/doc/source/development/policies.rst
@@ -35,7 +35,7 @@ We will not introduce new deprecations in patch releases.
 Deprecations will only be enforced in **major** releases. For example, if a behavior is deprecated in pandas 1.2.0, it will continue to work, with a warning, for all releases in the 1.x series. The behavior will change and the
-deprecation removed in the next next major release (2.0.0).
+deprecation removed in the next major release (2.0.0).
 .. note::
diff --git a/doc/source/user_guide/dsintro.rst b/doc/source/user_guide/dsintro.rst
index 905877cca61db..f2bb99dd2ebc0 100644
--- a/doc/source/user_guide/dsintro.rst
+++ b/doc/source/user_guide/dsintro.rst
@@ -439,7 +439,7 @@ Data Classes as introduced in `PEP557
 can be passed into the DataFrame constructor. Passing a list of dataclasses is equivalent to passing a list of dictionaries.
-Please be aware, that that all values in the list should be dataclasses, mixing
+Please be aware, that all values in the list should be dataclasses, mixing
 types in the list would result in a TypeError.
 .. ipython:: python
diff --git a/doc/source/user_guide/integer_na.rst b/doc/source/user_guide/integer_na.rst
index be38736f493b5..2d5673fe53be3 100644
--- a/doc/source/user_guide/integer_na.rst
+++ b/doc/source/user_guide/integer_na.rst
@@ -117,7 +117,7 @@ dtype if needed.
 # coerce when needed
 s + 0.01
-These dtypes can operate as part of of ``DataFrame``.
+These dtypes can operate as part of ``DataFrame``.
 .. ipython:: python
diff --git a/doc/source/whatsnew/v0.12.0.rst b/doc/source/whatsnew/v0.12.0.rst
index 4de76510c6bc1..c12adb2f1334f 100644
--- a/doc/source/whatsnew/v0.12.0.rst
+++ b/doc/source/whatsnew/v0.12.0.rst
@@ -419,7 +419,7 @@ Bug fixes
 ~~~~~~~~~
 - Plotting functions now raise a ``TypeError`` before trying to plot anything
- if the associated objects have have a dtype of ``object`` (:issue:`1818`,
+ if the associated objects have a dtype of ``object`` (:issue:`1818`,
 :issue:`3572`, :issue:`3911`, :issue:`3912`), but they will try to convert object arrays to numeric arrays if possible so that you can still plot, for example, an object array with floats. This happens before any drawing takes place which
@@ -430,8 +430,8 @@ Bug fixes
 - ``Series.str`` now supports iteration (:issue:`3638`). You can iterate over the individual elements of each string in the ``Series``. Each iteration yields
- yields a ``Series`` with either a single character at each index of the
- original ``Series`` or ``NaN``. For example,
+ a ``Series`` with either a single character at each index of the original
+ ``Series`` or ``NaN``. For example,
 .. ipython:: python
    :okwarning:
diff --git a/doc/source/whatsnew/v0.14.0.rst b/doc/source/whatsnew/v0.14.0.rst
index 5b279a4973963..b59938a9b9c9b 100644
--- a/doc/source/whatsnew/v0.14.0.rst
+++ b/doc/source/whatsnew/v0.14.0.rst
@@ -923,7 +923,7 @@ Bug fixes
 ~~~~~~~~~
 - ``HDFStore.select_as_multiple`` handles start and stop the same way as ``select`` (:issue:`6177`)
 - ``HDFStore.select_as_coordinates`` and ``select_column`` works with a ``where`` clause that results in filters (:issue:`6177`)
 - Regression in join of non_unique_indexes (:issue:`6329`)
-- Issue with groupby ``agg`` with a single function and a a mixed-type frame (:issue:`6337`)
+- Issue with groupby ``agg`` with a single function and a mixed-type frame (:issue:`6337`)
 - Bug in ``DataFrame.replace()`` when passing a non- ``bool`` ``to_replace`` argument (:issue:`6332`)
 - Raise when trying to align on different levels of a MultiIndex assignment (:issue:`3738`)
diff --git a/doc/source/whatsnew/v0.15.2.rst b/doc/source/whatsnew/v0.15.2.rst
index 95ca925f18692..b5b25796fea73 100644
--- a/doc/source/whatsnew/v0.15.2.rst
+++ b/doc/source/whatsnew/v0.15.2.rst
@@ -136,7 +136,7 @@ Enhancements
 - Added ability to export Categorical data to Stata (:issue:`8633`). See :ref:`here ` for limitations of categorical variables exported to Stata data files.
 - Added flag ``order_categoricals`` to ``StataReader`` and ``read_stata`` to select whether to order imported categorical data (:issue:`8836`). See :ref:`here ` for more information on importing categorical variables from Stata data files.
-- Added ability to export Categorical data to to/from HDF5 (:issue:`7621`). Queries work the same as if it was an object array. However, the ``category`` dtyped data is stored in a more efficient manner. See :ref:`here ` for an example and caveats w.r.t. prior versions of pandas.
+- Added ability to export Categorical data to/from HDF5 (:issue:`7621`). Queries work the same as if it was an object array. However, the ``category`` dtyped data is stored in a more efficient manner. See :ref:`here ` for an example and caveats w.r.t. prior versions of pandas.
 - Added support for ``searchsorted()`` on ``Categorical`` class (:issue:`8420`).
 Other enhancements:
diff --git a/doc/source/whatsnew/v0.16.1.rst b/doc/source/whatsnew/v0.16.1.rst
index 39767684c01d0..269854111373f 100644
--- a/doc/source/whatsnew/v0.16.1.rst
+++ b/doc/source/whatsnew/v0.16.1.rst
@@ -6,7 +6,7 @@ Version 0.16.1 (May 11, 2015)
 {{ header }}
-This is a minor bug-fix release from 0.16.0 and includes a a large number of
+This is a minor bug-fix release from 0.16.0 and includes a large number of
 bug fixes along several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.
@@ -72,7 +72,7 @@ setting the index of a ``DataFrame/Series`` with a ``category`` dtype would conv
 Out[4]: Index(['c', 'a', 'b'], dtype='object')
-setting the index, will create create a ``CategoricalIndex``
+setting the index, will create a ``CategoricalIndex``
 .. code-block:: ipython
diff --git a/doc/source/whatsnew/v0.16.2.rst b/doc/source/whatsnew/v0.16.2.rst
index 194bb61f2c1c8..37e8c64ea9ced 100644
--- a/doc/source/whatsnew/v0.16.2.rst
+++ b/doc/source/whatsnew/v0.16.2.rst
@@ -6,7 +6,7 @@ Version 0.16.2 (June 12, 2015)
 {{ header }}
-This is a minor bug-fix release from 0.16.1 and includes a a large number of
+This is a minor bug-fix release from 0.16.1 and includes a large number of
 bug fixes along some new features (:meth:`~DataFrame.pipe` method), enhancements, and performance improvements. We recommend that all users upgrade to this version.
diff --git a/doc/source/whatsnew/v0.18.0.rst b/doc/source/whatsnew/v0.18.0.rst
index 636414cdab8d8..829c04dac9f2d 100644
--- a/doc/source/whatsnew/v0.18.0.rst
+++ b/doc/source/whatsnew/v0.18.0.rst
@@ -610,7 +610,7 @@ Subtraction by ``Timedelta`` in a ``Series`` by a ``Timestamp`` works (:issue:`1
 pd.Timestamp('2012-01-01') - ser
-``NaT.isoformat()`` now returns ``'NaT'``. This change allows allows
+``NaT.isoformat()`` now returns ``'NaT'``. This change allows
 ``pd.Timestamp`` to rehydrate any timestamp like object from its isoformat (:issue:`12300`).
diff --git a/doc/source/whatsnew/v0.20.0.rst b/doc/source/whatsnew/v0.20.0.rst
index 8ae5ea5726fe9..6239c37174534 100644
--- a/doc/source/whatsnew/v0.20.0.rst
+++ b/doc/source/whatsnew/v0.20.0.rst
@@ -1167,7 +1167,7 @@ Other API changes
 - ``.loc`` has compat with ``.ix`` for accepting iterators, and NamedTuples (:issue:`15120`)
 - ``interpolate()`` and ``fillna()`` will raise a ``ValueError`` if the ``limit`` keyword argument is not greater than 0. (:issue:`9217`)
 - ``pd.read_csv()`` will now issue a ``ParserWarning`` whenever there are conflicting values provided by the ``dialect`` parameter and the user (:issue:`14898`)
-- ``pd.read_csv()`` will now raise a ``ValueError`` for the C engine if the quote character is larger than than one byte (:issue:`11592`)
+- ``pd.read_csv()`` will now raise a ``ValueError`` for the C engine if the quote character is larger than one byte (:issue:`11592`)
 - ``inplace`` arguments now require a boolean value, else a ``ValueError`` is thrown (:issue:`14189`)
 - ``pandas.api.types.is_datetime64_ns_dtype`` will now report ``True`` on a tz-aware dtype, similar to ``pandas.api.types.is_datetime64_any_dtype``
 - ``DataFrame.asof()`` will return a null filled ``Series`` instead the scalar ``NaN`` if a match is not found (:issue:`15118`)
@@ -1663,11 +1663,11 @@ Indexing
 - Bug in ``.reset_index()`` when an all ``NaN`` level of a ``MultiIndex`` would fail (:issue:`6322`)
 - Bug in ``.reset_index()`` when raising error for index name already present in ``MultiIndex`` columns (:issue:`16120`)
 - Bug in creating a ``MultiIndex`` with tuples and not passing a list of names; this will now raise ``ValueError`` (:issue:`15110`)
-- Bug in the HTML display with with a ``MultiIndex`` and truncation (:issue:`14882`)
+- Bug in the HTML display with a ``MultiIndex`` and truncation (:issue:`14882`)
 - Bug in the display of ``.info()`` where a qualifier (+) would always be displayed with a ``MultiIndex`` that contains only non-strings (:issue:`15245`)
 - Bug in ``pd.concat()`` where the names of ``MultiIndex`` of resulting ``DataFrame`` are not handled correctly when ``None`` is presented in the names of ``MultiIndex`` of input ``DataFrame`` (:issue:`15787`)
 - Bug in ``DataFrame.sort_index()`` and ``Series.sort_index()`` where ``na_position`` doesn't work with a ``MultiIndex`` (:issue:`14784`, :issue:`16604`)
-- Bug in in ``pd.concat()`` when combining objects with a ``CategoricalIndex`` (:issue:`16111`)
+- Bug in ``pd.concat()`` when combining objects with a ``CategoricalIndex`` (:issue:`16111`)
 - Bug in indexing with a scalar and a ``CategoricalIndex`` (:issue:`16123`)
 IO
diff --git a/doc/source/whatsnew/v0.21.0.rst b/doc/source/whatsnew/v0.21.0.rst
index 6035b89aa8643..1bbbbdc7e5410 100644
--- a/doc/source/whatsnew/v0.21.0.rst
+++ b/doc/source/whatsnew/v0.21.0.rst
@@ -50,7 +50,7 @@ Parquet is designed to faithfully serialize and de-serialize ``DataFrame`` s, su
 dtypes, including extension dtypes such as datetime with timezones.
 This functionality depends on either the `pyarrow `__ or `fastparquet `__ library.
-For more details, see see :ref:`the IO docs on Parquet `.
+For more details, see :ref:`the IO docs on Parquet `.
 .. _whatsnew_0210.enhancements.infer_objects:
diff --git a/doc/source/whatsnew/v0.24.0.rst b/doc/source/whatsnew/v0.24.0.rst
index 9ef50045d5b5e..ce784231a47d2 100644
--- a/doc/source/whatsnew/v0.24.0.rst
+++ b/doc/source/whatsnew/v0.24.0.rst
@@ -1622,7 +1622,7 @@ Timedelta
 - Bug in :class:`DataFrame` with ``timedelta64[ns]`` dtype division by ``Timedelta``-like scalar incorrectly returning ``timedelta64[ns]`` dtype instead of ``float64`` dtype (:issue:`20088`, :issue:`22163`)
 - Bug in adding a :class:`Index` with object dtype to a :class:`Series` with ``timedelta64[ns]`` dtype incorrectly raising (:issue:`22390`)
 - Bug in multiplying a :class:`Series` with numeric dtype against a ``timedelta`` object (:issue:`22390`)
-- Bug in :class:`Series` with numeric dtype when adding or subtracting an an array or ``Series`` with ``timedelta64`` dtype (:issue:`22390`)
+- Bug in :class:`Series` with numeric dtype when adding or subtracting an array or ``Series`` with ``timedelta64`` dtype (:issue:`22390`)
 - Bug in :class:`Index` with numeric dtype when multiplying or dividing an array with dtype ``timedelta64`` (:issue:`22390`)
 - Bug in :class:`TimedeltaIndex` incorrectly allowing indexing with ``Timestamp`` object (:issue:`20464`)
 - Fixed bug where subtracting :class:`Timedelta` from an object-dtyped array would raise ``TypeError`` (:issue:`21980`)
@@ -1868,7 +1868,7 @@ Reshaping
 - :func:`pandas.core.groupby.GroupBy.rank` now raises a ``ValueError`` when an invalid value is passed for argument ``na_option`` (:issue:`22124`)
 - Bug in :func:`get_dummies` with Unicode attributes in Python 2 (:issue:`22084`)
 - Bug in :meth:`DataFrame.replace` raises ``RecursionError`` when replacing empty lists (:issue:`22083`)
-- Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` when dict is used as the ``to_replace`` value and one key in the dict is is another key's value, the results were inconsistent between using integer key and using string key (:issue:`20656`)
+- Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` when dict is used as the ``to_replace`` value and one key in the dict is another key's value, the results were inconsistent between using integer key and using string key (:issue:`20656`)
 - Bug in :meth:`DataFrame.drop_duplicates` for empty ``DataFrame`` which incorrectly raises an error (:issue:`20516`)
 - Bug in :func:`pandas.wide_to_long` when a string is passed to the stubnames argument and a column name is a substring of that stubname (:issue:`22468`)
 - Bug in :func:`merge` when merging ``datetime64[ns, tz]`` data that contained a DST transition (:issue:`18885`)
diff --git a/doc/source/whatsnew/v0.6.0.rst b/doc/source/whatsnew/v0.6.0.rst
index 8ff688eaa91e7..253ca4d4188e5 100644
--- a/doc/source/whatsnew/v0.6.0.rst
+++ b/doc/source/whatsnew/v0.6.0.rst
@@ -15,7 +15,7 @@ New features
 ~~~~~~~~~~~~
 - :ref:`Added ` ``melt`` function to ``pandas.core.reshape``
 - :ref:`Added ` ``level`` parameter to group by level in Series and DataFrame descriptive statistics (:issue:`313`)
-- :ref:`Added ` ``head`` and ``tail`` methods to Series, analogous to to DataFrame (:issue:`296`)
+- :ref:`Added ` ``head`` and ``tail`` methods to Series, analogous to DataFrame (:issue:`296`)
 - :ref:`Added ` ``Series.isin`` function which checks if each value is contained in a passed sequence (:issue:`289`)
 - :ref:`Added ` ``float_format`` option to ``Series.to_string``
 - :ref:`Added ` ``skip_footer`` (:issue:`291`) and ``converters`` (:issue:`343`) options to ``read_csv`` and ``read_table``
diff --git a/doc/source/whatsnew/v0.8.0.rst b/doc/source/whatsnew/v0.8.0.rst
index b34c2a5c6a07c..781054fc4de7c 100644
--- a/doc/source/whatsnew/v0.8.0.rst
+++ b/doc/source/whatsnew/v0.8.0.rst
@@ -81,7 +81,7 @@ Time Series changes and improvements
 timestamps are stored as UTC; Timestamps from DatetimeIndex objects with time zone set will be localized to local time. Time zone conversions are therefore essentially free. User needs to know very little about pytz library now; only
- time zone names as as strings are required. Time zone-aware timestamps are
+ time zone names as strings are required. Time zone-aware timestamps are
 equal if and only if their UTC timestamps match. Operations between time zone-aware time series with different time zones will result in a UTC-indexed time series.
diff --git a/pandas/_testing.py b/pandas/_testing.py
index da2963e167767..68371b782aac2 100644
--- a/pandas/_testing.py
+++ b/pandas/_testing.py
@@ -1768,7 +1768,7 @@ def box_expected(expected, box_cls, transpose=True):
 elif box_cls is pd.DataFrame:
 expected = pd.Series(expected).to_frame()
 if transpose:
- # for vector operations, we we need a DataFrame to be a single-row,
+ # for vector operations, we need a DataFrame to be a single-row,
 # not a single-column, in order to operate against non-DataFrame
 # vectors of the same length.
 expected = expected.T
diff --git a/pandas/core/algorithms.py b/pandas/core/algorithms.py
index fcd47b37268cc..ecc94dd58066e 100644
--- a/pandas/core/algorithms.py
+++ b/pandas/core/algorithms.py
@@ -446,7 +446,7 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> np.ndarray:
 # Albeit hashmap has O(1) look-up (vs. O(logn) in sorted array),
 # in1d is faster for small sizes
 if len(comps) > 1_000_000 and len(values) <= 26 and not is_object_dtype(comps):
- # If the the values include nan we need to check for nan explicitly
+ # If the values include nan we need to check for nan explicitly
 # since np.nan it not equal to np.nan
 if isna(values).any():
 f = lambda c, v: np.logical_or(np.in1d(c, v), np.isnan(c))
@@ -1551,7 +1551,7 @@ def take(arr, indices, axis: int = 0, allow_fill: bool = False, fill_value=None)
 * True: negative values in `indices` indicate missing values. These values are set to `fill_value`. Any other
- other negative values raise a ``ValueError``.
+ negative values raise a ``ValueError``.
 fill_value : any, optional
 Fill value to use for NA-indices when `allow_fill` is True.
diff --git a/pandas/core/arrays/categorical.py b/pandas/core/arrays/categorical.py
index 62e508c491740..5b230c175eaef 100644
--- a/pandas/core/arrays/categorical.py
+++ b/pandas/core/arrays/categorical.py
@@ -77,7 +77,7 @@ def func(self, other):
 "Unordered Categoricals can only compare equality or not"
 )
 if isinstance(other, Categorical):
- # Two Categoricals can only be be compared if the categories are
+ # Two Categoricals can only be compared if the categories are
 # the same (maybe up to ordering, depending on ordered)
 msg = "Categoricals can only be compared if 'categories' are the same."
diff --git a/pandas/core/arrays/floating.py b/pandas/core/arrays/floating.py
index a5ebdd8d963e2..4aed39d7edb92 100644
--- a/pandas/core/arrays/floating.py
+++ b/pandas/core/arrays/floating.py
@@ -120,7 +120,7 @@ def coerce_to_array(
 -------
 tuple of (values, mask)
 """
- # if values is floating numpy array, preserve it's dtype
+ # if values is floating numpy array, preserve its dtype
 if dtype is None and hasattr(values, "dtype"):
 if is_float_dtype(values.dtype):
 dtype = values.dtype
diff --git a/pandas/core/arrays/integer.py b/pandas/core/arrays/integer.py
index c9d7632e39228..2897c18acfb09 100644
--- a/pandas/core/arrays/integer.py
+++ b/pandas/core/arrays/integer.py
@@ -183,7 +183,7 @@ def coerce_to_array(
 -------
 tuple of (values, mask)
 """
- # if values is integer numpy array, preserve it's dtype
+ # if values is integer numpy array, preserve its dtype
 if dtype is None and hasattr(values, "dtype"):
 if is_integer_dtype(values.dtype):
 dtype = values.dtype
diff --git a/pandas/core/arrays/numpy_.py b/pandas/core/arrays/numpy_.py
index 0cdce1eabccc6..4eb67dcd12728 100644
--- a/pandas/core/arrays/numpy_.py
+++ b/pandas/core/arrays/numpy_.py
@@ -144,7 +144,7 @@ class PandasArray(
 # If you're wondering why pd.Series(cls) doesn't put the array in an
 # ExtensionBlock, search for `ABCPandasArray`. We check for
- # that _typ to ensure that that users don't unnecessarily use EAs inside
+ # that _typ to ensure that users don't unnecessarily use EAs inside
 # pandas internals, which turns off things like block consolidation.
 _typ = "npy_extension"
 __array_priority__ = 1000
diff --git a/pandas/core/dtypes/base.py b/pandas/core/dtypes/base.py
index 8630867c64f88..c2be81cd46b3b 100644
--- a/pandas/core/dtypes/base.py
+++ b/pandas/core/dtypes/base.py
@@ -99,9 +99,8 @@ def __eq__(self, other: Any) -> bool:
 By default, 'other' is considered equal if either
 * it's a string matching 'self.name'.
- * it's an instance of this type and all of the
- the attributes in ``self._metadata`` are equal between
- `self` and `other`.
+ * it's an instance of this type and all of the attributes
+ in ``self._metadata`` are equal between `self` and `other`.
 Parameters
 ----------
diff --git a/pandas/core/dtypes/cast.py b/pandas/core/dtypes/cast.py
index 465ec821400e7..0f0e82f4ad4e2 100644
--- a/pandas/core/dtypes/cast.py
+++ b/pandas/core/dtypes/cast.py
@@ -391,7 +391,7 @@ def maybe_cast_to_extension_array(
 assertion_msg = f"must pass a subclass of ExtensionArray: {cls}"
 assert issubclass(cls, ABCExtensionArray), assertion_msg
- # Everything can be be converted to StringArrays, but we may not want to convert
+ # Everything can be converted to StringArrays, but we may not want to convert
 if (
 issubclass(cls, (StringArray, ArrowStringArray))
 and lib.infer_dtype(obj) != "string"
@@ -1200,7 +1200,7 @@ def soft_convert_objects(
 elif conversion_count > 1 and coerce:
 raise ValueError(
 "Only one of 'datetime', 'numeric' or "
- "'timedelta' can be True when when coerce=True."
+ "'timedelta' can be True when coerce=True."
 )
 if not is_object_dtype(values.dtype):
diff --git a/pandas/core/dtypes/common.py b/pandas/core/dtypes/common.py
index 14184f044ae95..b4f6d587c6642 100644
--- a/pandas/core/dtypes/common.py
+++ b/pandas/core/dtypes/common.py
@@ -1727,7 +1727,7 @@ def _validate_date_like_dtype(dtype) -> None:
 ------
 TypeError : The dtype could not be casted to a date-like dtype.
 ValueError : The dtype is an illegal date-like dtype (e.g. the
- the frequency provided is too specific)
+ frequency provided is too specific)
 """
 try:
 typ = np.datetime_data(dtype)[0]
diff --git a/pandas/core/dtypes/dtypes.py b/pandas/core/dtypes/dtypes.py
index 01b34187997cb..07280702cf06f 100644
--- a/pandas/core/dtypes/dtypes.py
+++ b/pandas/core/dtypes/dtypes.py
@@ -47,7 +47,7 @@ class PandasExtensionDtype(ExtensionDtype):
 type: Any
 kind: Any
 # The Any type annotations above are here only because mypy seems to have a
- # problem dealing with with multiple inheritance from PandasExtensionDtype
+ # problem dealing with multiple inheritance from PandasExtensionDtype
 # and ExtensionDtype's @properties in the subclasses below. The kind and
 # type variables in those subclasses are explicitly typed below.
 subdtype = None
diff --git a/pandas/core/frame.py b/pandas/core/frame.py
index 6f6d94f0e9f8e..b14a80beb9f8c 100644
--- a/pandas/core/frame.py
+++ b/pandas/core/frame.py
@@ -6473,7 +6473,7 @@ def update(
 1 b e
 2 c f
- For Series, it's name attribute must be set.
+ For Series, its name attribute must be set.
 >>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
 ... 'B': ['x', 'y', 'z']})
diff --git a/pandas/core/generic.py b/pandas/core/generic.py
index 7e8012d76fe1b..8f057a98eed2d 100644
--- a/pandas/core/generic.py
+++ b/pandas/core/generic.py
@@ -1114,7 +1114,7 @@ def rename_axis(self, mapper=lib.no_default, **kwargs):
 In this case, the parameter ``copy`` is ignored.
 The second calling convention will modify the names of the
- the corresponding index if mapper is a list or a scalar.
+ corresponding index if mapper is a list or a scalar.
 However, if mapper is dict-like or a function, it will use the deprecated behavior of modifying the axis *labels*.
@@ -2717,7 +2717,7 @@ def to_sql(
 >>> engine.execute("SELECT * FROM users").fetchall()
 [(0, 'User 1'), (1, 'User 2'), (2, 'User 3')]
- An `sqlalchemy.engine.Connection` can also be passed to to `con`:
+ An `sqlalchemy.engine.Connection` can also be passed to `con`:
 >>> with engine.begin() as connection:
 ... df1 = pd.DataFrame({'name' : ['User 4', 'User 5']})
@@ -5483,7 +5483,7 @@ def __setattr__(self, name: str, value) -> None:
 def _dir_additions(self) -> Set[str]:
 """
 add the string-like attributes from the info_axis.
- If info_axis is a MultiIndex, it's first level values are used.
+ If info_axis is a MultiIndex, its first level values are used.
 """
 additions = super()._dir_additions()
 if self._info_axis._can_hold_strings:
diff --git a/pandas/core/groupby/generic.py b/pandas/core/groupby/generic.py
index 3395b9d36fd0c..244c47cd1f1ea 100644
--- a/pandas/core/groupby/generic.py
+++ b/pandas/core/groupby/generic.py
@@ -262,7 +262,7 @@ def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs)
 return self._python_agg_general(func, *args, **kwargs)
 except (ValueError, KeyError):
 # TODO: KeyError is raised in _python_agg_general,
- # see see test_groupby.test_basic
+ # see test_groupby.test_basic
 result = self._aggregate_named(func, *args, **kwargs)
 index = Index(sorted(result), name=self.grouper.names[0])
@@ -1390,8 +1390,7 @@ def _transform_fast(self, result: DataFrame) -> DataFrame:
 """
 obj = self._obj_with_exclusions
- # for each col, reshape to to size of original frame
- # by take operation
+ # for each col, reshape to size of original frame by take operation
 ids, _, ngroup = self.grouper.group_info
 result = result.reindex(self.grouper.result_index, copy=False)
 output = [
diff --git a/pandas/core/groupby/ops.py b/pandas/core/groupby/ops.py
index fc80852f00c95..e9aab79d810e6 100644
--- a/pandas/core/groupby/ops.py
+++ b/pandas/core/groupby/ops.py
@@ -148,7 +148,7 @@ def _get_splitter(self, data: FrameOrSeries, axis: int = 0) -> "DataSplitter":
 -------
 Generator yielding subsetted objects
- __finalize__ has not been called for the the subsetted objects returned.
+ __finalize__ has not been called for the subsetted objects returned.
 """
 comp_ids, _, ngroups = self.group_info
 return get_splitter(data, comp_ids, ngroups, axis=axis)
diff --git a/pandas/core/indexes/base.py b/pandas/core/indexes/base.py
index b5900ead246f3..c4f6dac1915ec 100644
--- a/pandas/core/indexes/base.py
+++ b/pandas/core/indexes/base.py
@@ -1486,7 +1486,7 @@ def _get_level_number(self, level) -> int:
 def sortlevel(self, level=None, ascending=True, sort_remaining=None):
 """
- For internal compatibility with with the Index API.
+ For internal compatibility with the Index API.
 Sort the Index. This is for compat with MultiIndex
@@ -4451,7 +4451,7 @@ def equals(self, other: object) -> bool:
 if not isinstance(other, Index):
 return False
- # If other is a subclass of self and defines it's own equals method, we
+ # If other is a subclass of self and defines its own equals method, we
 # dispatch to the subclass method. For instance for a MultiIndex,
 # a d-level MultiIndex can equal d-tuple Index.
 # Note: All EA-backed Index subclasses override equals
diff --git a/pandas/core/indexes/interval.py b/pandas/core/indexes/interval.py
index 98752a21e44a2..8e7d429ce426d 100644
--- a/pandas/core/indexes/interval.py
+++ b/pandas/core/indexes/interval.py
@@ -479,7 +479,7 @@ def _needs_i8_conversion(self, key) -> bool:
 """
 Check if a given key needs i8 conversion. Conversion is necessary for Timestamp, Timedelta, DatetimeIndex, and TimedeltaIndex keys. An
- Interval-like requires conversion if it's endpoints are one of the
+ Interval-like requires conversion if its endpoints are one of the
 aforementioned types. Assumes that any list-like data has already been cast to an Index.
@@ -501,7 +501,7 @@ def _needs_i8_conversion(self, key) -> bool:
 def _maybe_convert_i8(self, key):
 """
- Maybe convert a given key to it's equivalent i8 value(s). Used as a
+ Maybe convert a given key to its equivalent i8 value(s). Used as a
 preprocessing step prior to IntervalTree queries (self._engine), which expects numeric data.
@@ -540,7 +540,7 @@ def _maybe_convert_i8(self, key):
 # DatetimeIndex/TimedeltaIndex
 key_dtype, key_i8 = key.dtype, Index(key.asi8)
 if key.hasnans:
- # convert NaT from it's i8 value to np.nan so it's not viewed
+ # convert NaT from its i8 value to np.nan so it's not viewed
 # as a valid value, maybe causing errors (e.g. is_overlapping)
 key_i8 = key_i8.where(~key._isnan)
diff --git a/pandas/core/indexes/multi.py b/pandas/core/indexes/multi.py
index 11dd3598b4864..aef8855df6b03 100644
--- a/pandas/core/indexes/multi.py
+++ b/pandas/core/indexes/multi.py
@@ -2311,7 +2311,7 @@ def reorder_levels(self, order):
 def _get_codes_for_sorting(self):
 """
- we categorizing our codes by using the
+ we are categorizing our codes by using the
 available categories (all, not just observed) excluding any missing ones (-1); this is in preparation for sorting, where we need to disambiguate that -1 is not
diff --git a/pandas/core/indexing.py b/pandas/core/indexing.py
index f6d14a1c1503c..080c307ac895f 100644
--- a/pandas/core/indexing.py
+++ b/pandas/core/indexing.py
@@ -1021,7 +1021,7 @@ def _multi_take(self, tup: Tuple):
 def _getitem_iterable(self, key, axis: int):
 """
- Index current object with an an iterable collection of keys.
+ Index current object with an iterable collection of keys.
 Parameters
 ----------
diff --git a/pandas/core/nanops.py b/pandas/core/nanops.py
index d38974839394d..80c4cd5b44a92 100644
--- a/pandas/core/nanops.py
+++ b/pandas/core/nanops.py
@@ -1646,7 +1646,7 @@ def nanpercentile(
 interpolation=interpolation,
 )
- # Note: we have to do do `astype` and not view because in general we
+ # Note: we have to do `astype` and not view because in general we
 # have float result at this point, not i8
 return result.astype(values.dtype)
diff --git a/pandas/io/excel/_odswriter.py b/pandas/io/excel/_odswriter.py
index f9a08bf862644..0bea19bec2cdd 100644
--- a/pandas/io/excel/_odswriter.py
+++ b/pandas/io/excel/_odswriter.py
@@ -182,7 +182,7 @@ def _process_style(self, style: Dict[str, Any]) -> str:
 Returns
 -------
 style_key : str
- Unique style key for for later reference in sheet
+ Unique style key for later reference in sheet
 """
 from odf.style import (
 ParagraphProperties,
diff --git a/pandas/io/formats/console.py b/pandas/io/formats/console.py
index ab9c9fe995008..ea291bcbfa44c 100644
--- a/pandas/io/formats/console.py
+++ b/pandas/io/formats/console.py
@@ -78,7 +78,7 @@ def check_main():
 def in_ipython_frontend():
 """
- Check if we're inside an an IPython zmq frontend.
+ Check if we're inside an IPython zmq frontend.
 Returns
 -------
diff --git a/pandas/io/formats/csvs.py b/pandas/io/formats/csvs.py
index cbe2ed1ed838d..fbda78a1842ca 100644
--- a/pandas/io/formats/csvs.py
+++ b/pandas/io/formats/csvs.py
@@ -144,7 +144,7 @@ def _initialize_columns(self, cols: Optional[Sequence[Label]]) -> Sequence[Label
 self.obj = self.obj.loc[:, cols]
 # update columns to include possible multiplicity of dupes
- # and make sure sure cols is just a list of labels
+ # and make sure cols is just a list of labels
 new_cols = self.obj.columns
 if isinstance(new_cols, ABCIndexClass):
 return new_cols._format_native_types(**self._number_format)
diff --git a/pandas/io/formats/printing.py b/pandas/io/formats/printing.py
index 72b07000146b2..ac453839792f3 100644
--- a/pandas/io/formats/printing.py
+++ b/pandas/io/formats/printing.py
@@ -308,7 +308,7 @@ def format_object_summary(
 name : name, optional
 defaults to the class name of the obj
 indent_for_name : bool, default True
- Whether subsequent lines should be be indented to
+ Whether subsequent lines should be indented to
 align with the name.
 line_break_each_value : bool, default False
 If True, inserts a line break for each value of ``obj``.
diff --git a/pandas/io/formats/style.py b/pandas/io/formats/style.py
index f80c5317598e7..0eeff44d0f74c 100644
--- a/pandas/io/formats/style.py
+++ b/pandas/io/formats/style.py
@@ -903,7 +903,7 @@ def set_table_attributes(self, attributes: str) -> "Styler":
 Set the table attributes.
 These are the items that show up in the opening ```` tag
- in addition to to automatic (by default) id.
+ in addition to automatic (by default) id.
 Parameters
 ----------
diff --git a/pandas/io/sql.py b/pandas/io/sql.py
index 51888e5021d80..1fea50ecade3c 100644
--- a/pandas/io/sql.py
+++ b/pandas/io/sql.py
@@ -212,7 +212,7 @@ def read_sql_table(
 table_name : str
 Name of SQL table in database.
 con : SQLAlchemy connectable or str
- A database URI could be provided as as str.
+ A database URI could be provided as str.
 SQLite DBAPI connection mode not supported.
 schema : str, default None
 Name of SQL schema in database to query (if database flavor
diff --git a/pandas/tests/dtypes/test_inference.py b/pandas/tests/dtypes/test_inference.py
index 014094923185f..27fac95a16b7a 100644
--- a/pandas/tests/dtypes/test_inference.py
+++ b/pandas/tests/dtypes/test_inference.py
@@ -126,7 +126,7 @@ def test_is_list_like_disallow_sets(maybe_list_like):
 def test_is_list_like_recursion():
 # GH 33721
- # interpreter would crash with with SIGABRT
+ # interpreter would crash with SIGABRT
 def foo():
 inference.is_list_like([])
 foo()
diff --git a/pandas/tests/frame/methods/test_describe.py b/pandas/tests/frame/methods/test_describe.py
index f77b7cd4a6c3b..b7692eee16bf8 100644
--- a/pandas/tests/frame/methods/test_describe.py
+++ b/pandas/tests/frame/methods/test_describe.py
@@ -117,7 +117,7 @@ def test_describe_categorical(self):
 def test_describe_empty_categorical_column(self):
 # GH#26397
- # Ensure the index of an an empty categorical DataFrame column
+ # Ensure the index of an empty categorical DataFrame column
 # also contains (count, unique, top, freq)
 df = DataFrame({"empty_col": Categorical([])})
 result = df.describe()
diff --git a/pandas/tests/groupby/test_groupby.py b/pandas/tests/groupby/test_groupby.py
index da556523a3341..78c438fa11a0e 100644
--- a/pandas/tests/groupby/test_groupby.py
+++ b/pandas/tests/groupby/test_groupby.py
@@ -1615,7 +1615,7 @@ def test_groupby_multiindex_not_lexsorted():
 def test_index_label_overlaps_location():
 # checking we don't have any label/location confusion in the
- # the wake of GH5375
+ # wake of GH5375
 df = DataFrame(list("ABCDE"), index=[2, 0, 2, 1, 1])
 g = df.groupby(list("ababb"))
 actual = g.filter(lambda x: len(x) > 2)
diff --git a/pandas/tests/indexes/conftest.py b/pandas/tests/indexes/conftest.py
index fb17e1df6341b..ac4477e60d5dc 100644
--- a/pandas/tests/indexes/conftest.py
+++ b/pandas/tests/indexes/conftest.py
@@ -13,7 +13,7 @@ def sort(request):
 parameters [True, False].
 We can't combine them as sort=True is not permitted
- in in the Index setops methods.
+ in the Index setops methods.
 """
 return request.param
diff --git a/pandas/tests/indexing/test_partial.py b/pandas/tests/indexing/test_partial.py
index 3bf37f4cade8b..353dfcf37c28d 100644
--- a/pandas/tests/indexing/test_partial.py
+++ b/pandas/tests/indexing/test_partial.py
@@ -208,7 +208,7 @@ def test_series_partial_set(self):
 result = ser.reindex([2, 2, "x", 1])
 tm.assert_series_equal(result, expected, check_index_type=True)
- # raises as nothing in in the index
+ # raises as nothing is in the index
 msg = (
 r"\"None of \[Int64Index\(\[3, 3, 3\], dtype='int64'\)\] are "
 r"in the \[index\]\""
@@ -289,7 +289,7 @@ def test_series_partial_set_with_name(self):
 with pytest.raises(KeyError, match="with any missing labels"):
 ser.loc[[2, 2, "x", 1]]
- # raises as nothing in in the index
+ # raises as nothing is in the index
 msg = (
 r"\"None of \[Int64Index\(\[3, 3, 3\], dtype='int64', "
 r"name='idx'\)\] are in the \[index\]\""
diff --git a/versioneer.py b/versioneer.py
index 288464f1efa44..e7fed874ae20f 100644
--- a/versioneer.py
+++ b/versioneer.py
@@ -1541,7 +1541,7 @@ def get_cmdclass(cmdclass=None):
 # of Versioneer. A's setup.py imports A's Versioneer, leaving it in
 # sys.modules by the time B's setup.py is executed, causing B to run
 # with the wrong versioneer. Setuptools wraps the sub-dep builds in a
- # sandbox that restores sys.modules to it's pre-build state, so the
+ # sandbox that restores sys.modules to its pre-build state, so the
 # parent is protected against the child's "import versioneer". By
 # removing ourselves from sys.modules here, before the child build
 # happens, we protect the child from the parent's versioneer too.
diff --git a/web/pandas/community/ecosystem.md b/web/pandas/community/ecosystem.md
index 515d23afb93ec..7cf78958370ac 100644
--- a/web/pandas/community/ecosystem.md
+++ b/web/pandas/community/ecosystem.md
@@ -6,7 +6,7 @@
 encouraging because it means pandas is not only helping users to handle their data tasks but also that it provides a better starting point for developers to build powerful and more focused data tools. The creation of libraries that complement pandas' functionality also allows pandas
-development to remain focused around it's original requirements.
+development to remain focused around its original requirements.
 This is an inexhaustive list of projects that build on pandas in order to provide tools in the PyData space. For a list of projects that depend