API: deprecate setting of .ordered directly (GH9347, GH9190) #9622

Merged: 2 commits, Mar 11, 2015
2 changes: 2 additions & 0 deletions doc/source/api.rst
@@ -585,6 +585,8 @@ following usable methods and properties (all available as ``Series.cat.<method_o
Categorical.remove_categories
Categorical.remove_unused_categories
Categorical.set_categories
Categorical.as_ordered
Categorical.as_unordered
Categorical.codes

To create a Series of dtype ``category``, use ``cat = s.astype("category")``.
48 changes: 29 additions & 19 deletions doc/source/categorical.rst
@@ -90,8 +90,6 @@ By using some special functions:
See :ref:`documentation <reshaping.tile.cut>` for :func:`~pandas.cut`.

By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to a `DataFrame`.
This is the only possibility to specify differently ordered categories (or no order at all) at
creation time and the only reason to use :class:`pandas.Categorical` directly:

.. ipython:: python

@@ -103,6 +101,14 @@ creation time and the only reason to use :class:`pandas.Categorical` directly:
df["B"] = raw_cat
df

You can also specify differently ordered categories, or make the resulting data ordered, by passing these arguments to ``astype()``:

.. ipython:: python

s = Series(["a","b","c","a"])
s_cat = s.astype("category", categories=["b","c","d"], ordered=False)
s_cat
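A minimal sketch of what happens to values that are not listed in ``categories``. It uses the ``Categorical`` constructor directly rather than ``astype()``; that substitution is an assumption made here so the example does not depend on the ``astype()`` keyword signature, and the variable names are illustrative only.

```python
import pandas as pd

values = ["a", "b", "c", "a"]

# "a" is not among the requested categories, so it has nothing to map to.
cat = pd.Categorical(values, categories=["b", "c", "d"], ordered=False)

# Each element is stored as an integer code indexing into `categories`;
# a code of -1 marks a value that could not be matched (treated as missing).
codes = cat.codes.tolist()
categories = list(cat.categories)
```

Here both ``"a"`` entries end up with code ``-1``, while ``"d"`` stays available as an unused category.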

Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:

.. ipython:: python
@@ -176,10 +182,9 @@ It's also possible to pass in the categories in a specific order:
s.cat.ordered

.. note::
New categorical data is automatically ordered if the passed in values are sortable or a
`categories` argument is supplied. This is a difference to R's `factors`, which are unordered
unless explicitly told to be ordered (``ordered=TRUE``). You can of course overwrite that by
passing in an explicit ``ordered=False``.

New categorical data are NOT automatically ordered. You must explicitly pass ``ordered=True`` to
indicate an ordered ``Categorical``.


Renaming categories
@@ -270,29 +275,37 @@ Sorting and Order

.. _categorical.sort:

.. warning::

The default for construction changed in v0.16.0 to ``ordered=False``, from the prior implicit ``ordered=True``.

If categorical data is ordered (``s.cat.ordered == True``), then the order of the categories has a
meaning and certain operations are possible. If the categorical is unordered, a `TypeError` is
raised.
meaning and certain operations are possible. If the categorical is unordered, ``.min()/.max()`` will raise a `TypeError`.

.. ipython:: python

s = Series(Categorical(["a","b","c","a"], ordered=False))
try:
s.sort()
except TypeError as e:
print("TypeError: " + str(e))
s = Series(["a","b","c","a"], dtype="category") # ordered per default!
s.sort()
s = Series(["a","b","c","a"]).astype('category', ordered=True)
s.sort()
s
s.min(), s.max()

You can set categorical data to be ordered by using ``as_ordered()`` or unordered by using ``as_unordered()``. These will by
default return a *new* object.

.. ipython:: python

s.cat.as_ordered()
s.cat.as_unordered()
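As a rough illustration of the ``.min()/.max()`` rule described above, the sketch below builds the unordered data through the ``Categorical`` constructor with an explicit ``ordered=False``. The variable names are hypothetical.

```python
import pandas as pd

unordered = pd.Series(pd.Categorical(["a", "b", "c", "a"], ordered=False))

# min()/max() need an ordering, so they fail on unordered categorical data.
try:
    unordered.min()
    raised = False
except TypeError:
    raised = True

# as_ordered() returns a new Series; `unordered` itself is left untouched.
ordered = unordered.cat.as_ordered()
lowest, highest = ordered.min(), ordered.max()
```

After the conversion the categories compare as ``a < b < c``, so the smallest value is ``"a"`` and the largest is ``"c"``.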

Sorting will use the order defined by categories, not any lexical order present on the data type.
This is even true for strings and numeric data:

.. ipython:: python

s = Series([1,2,3,1], dtype="category")
s.cat.categories = [2,3,1]
s = s.cat.set_categories([2,3,1], ordered=True)
s
s.sort()
s
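One way to see why the sort follows the categories is to look at the integer codes behind the data. The sketch below is an illustration under that framing, not the library's sorting routine: it sorts by code with NumPy (an assumption chosen to avoid depending on a particular sort method name).

```python
import numpy as np
import pandas as pd

cat = pd.Categorical([1, 2, 3, 1], categories=[2, 3, 1], ordered=True)

# codes[i] is the position of element i within `categories`,
# not the numeric value of the element.
codes = cat.codes.tolist()

# A stable argsort of the codes arranges the values in category order (2 < 3 < 1).
order = np.argsort(cat.codes, kind="stable")
in_category_order = [int(v) for v in np.asarray(cat)[order]]

# min()/max() follow the category order as well.
smallest, largest = int(cat.min()), int(cat.max())
```

With categories ``[2, 3, 1]``, the value ``1`` sorts last and is also the maximum, even though it is numerically the smallest.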
@@ -310,7 +323,7 @@ necessarily make the sort order the same as the categories order.
.. ipython:: python

s = Series([1,2,3,1], dtype="category")
s = s.cat.reorder_categories([2,3,1])
s = s.cat.reorder_categories([2,3,1], ordered=True)
s
s.sort()
s
@@ -339,7 +352,7 @@ The ordering of the categorical is determined by the ``categories`` of that colu

.. ipython:: python

dfs = DataFrame({'A' : Categorical(list('bbeebbaa'),categories=['e','a','b']),
dfs = DataFrame({'A' : Categorical(list('bbeebbaa'),categories=['e','a','b'],ordered=True),
'B' : [1,2,1,2,2,1,2,1] })
dfs.sort(['A','B'])

@@ -664,9 +677,6 @@ The following differences to R's factor functions can be observed:

* R's `levels` are named `categories`
* R's `levels` are always of type string, while `categories` in pandas can be of any dtype.
* New categorical data is automatically ordered if the passed in values are sortable or a
`categories` argument is supplied. This is a difference to R's `factors`, which are unordered
unless explicitly told to be ordered (``ordered=TRUE``).
* It's not possible to specify labels at creation time. Use ``s.cat.rename_categories(new_labels)``
afterwards.
* In contrast to R's `factor` function, using categorical data as the sole input to create a
1 change: 1 addition & 0 deletions doc/source/release.rst
@@ -59,6 +59,7 @@ Highlights include:
- ``Series.to_coo/from_coo`` methods to interact with ``scipy.sparse``, see :ref:`here <whatsnew_0160.enhancements.sparse>`
- Backwards incompatible change to ``Timedelta`` to conform the ``.seconds`` attribute with ``datetime.timedelta``, see :ref:`here <whatsnew_0160.api_breaking.timedelta>`
- Changes to the ``.loc`` slicing API to conform with the behavior of ``.ix`` see :ref:`here <whatsnew_0160.api_breaking.indexing>`
- Changes to the default for ordering in the ``Categorical`` constructor, see :ref:`here <whatsnew_0160.api_breaking.categorical>`

See the :ref:`v0.16.0 Whatsnew <whatsnew_0160>` overview or the issue tracker on GitHub for an extensive list
of all API changes, enhancements and bugs that have been fixed in 0.16.0.
129 changes: 129 additions & 0 deletions doc/source/whatsnew/v0.16.0.txt
@@ -13,6 +13,7 @@ users upgrade to this version.
* ``Series.to_coo/from_coo`` methods to interact with ``scipy.sparse``, see :ref:`here <whatsnew_0160.enhancements.sparse>`
* Backwards incompatible change to ``Timedelta`` to conform the ``.seconds`` attribute with ``datetime.timedelta``, see :ref:`here <whatsnew_0160.api_breaking.timedelta>`
* Changes to the ``.loc`` slicing API to conform with the behavior of ``.ix`` see :ref:`here <whatsnew_0160.api_breaking.indexing>`
* Changes to the default for ordering in the ``Categorical`` constructor, see :ref:`here <whatsnew_0160.api_breaking.categorical>`

- Check the :ref:`API Changes <whatsnew_0160.api>` and :ref:`deprecations <whatsnew_0160.deprecations>` before updating

@@ -366,6 +367,134 @@ API Changes
- ``Series.describe`` for categorical data will now give counts and frequencies of 0, not ``NaN``, for unused categories (:issue:`9443`)


Categorical Changes
~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0160.api_breaking.categorical:

In prior versions, ``Categoricals`` that had an unspecified ordering (meaning no ``ordered`` keyword was passed) defaulted to being ``ordered`` Categoricals. Going forward, the ``ordered`` keyword in the ``Categorical`` constructor will default to ``False``. Ordering must now be explicit.

Furthermore, previously you *could* change the ``ordered`` attribute of a Categorical by just setting the attribute, e.g. ``cat.ordered=True``. This is now deprecated and you should use ``cat.as_ordered()`` or ``cat.as_unordered()``. These will by default return a **new** object and not modify the existing object. (:issue:`9347`, :issue:`9190`)

Previous Behavior

.. code-block:: python

In [3]: s = Series([0,1,2], dtype='category')

In [4]: s
Out[4]:
0 0
1 1
2 2
dtype: category
Categories (3, int64): [0 < 1 < 2]

In [5]: s.cat.ordered
Out[5]: True

In [6]: s.cat.ordered = False

In [7]: s
Out[7]:
0 0
1 1
2 2
dtype: category
Categories (3, int64): [0, 1, 2]

New Behavior

.. ipython:: python

s = Series([0,1,2], dtype='category')
s
s.cat.ordered
s = s.cat.as_ordered()
s
s.cat.ordered

# you can set in the constructor of the Categorical
s = Series(Categorical([0,1,2],ordered=True))
s
s.cat.ordered
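The "new object" semantics of ``as_ordered()`` and ``as_unordered()`` can be checked with a short sketch. It works on a bare ``Categorical`` rather than a ``Series``; the names ``as_ord`` and ``round_trip`` are illustrative only.

```python
import pandas as pd

cat = pd.Categorical([0, 1, 2])        # no `ordered` keyword: unordered by default
as_ord = cat.as_ordered()              # a new, ordered object
round_trip = as_ord.as_unordered()     # and back again, also a new object

# The original is never modified by either call.
flags = (cat.ordered, as_ord.ordered, round_trip.ordered)
```

Because neither call changes ``cat``, the result has to be assigned, as in ``s = s.cat.as_ordered()`` above.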

For ease of creation of series of categorical data, we have added the ability to pass keywords when calling ``.astype()``. These are passed directly to the constructor.

.. ipython:: python

s = Series(["a","b","c","a"]).astype('category',ordered=True)
s
s = Series(["a","b","c","a"]).astype('category',categories=list('abcdef'),ordered=False)
s

Indexing Changes
~~~~~~~~~~~~~~~~

.. _whatsnew_0160.api_breaking.indexing:

The behavior of a small subset of edge cases for using ``.loc`` has changed (:issue:`8613`). Furthermore, we have improved the content of the error messages that are raised:

- slicing with ``.loc`` where the start and/or stop bound is not found in the index is now allowed; this previously would raise a ``KeyError``. This makes the behavior the same as ``.ix`` in this case. This change is only for slicing, not when indexing with a single label.

.. ipython:: python

df = DataFrame(np.random.randn(5,4),
columns=list('ABCD'),
index=date_range('20130101',periods=5))
df
s = Series(range(5),[-2,-1,1,2,3])
s

Previous Behavior

.. code-block:: python

In [4]: df.loc['2013-01-02':'2013-01-10']
KeyError: 'stop bound [2013-01-10] is not in the [index]'

In [6]: s.loc[-10:3]
KeyError: 'start bound [-10] is not the [index]'

New Behavior

.. ipython:: python

df.loc['2013-01-02':'2013-01-10']
s.loc[-10:3]

- allow slicing with float-like values on an integer index for ``.ix``. Previously this was only enabled for ``.loc``:

Previous Behavior

.. code-block:: python

In [8]: s.ix[-1.0:2]
TypeError: the slice start value [-1.0] is not a proper indexer for this index type (Int64Index)

New Behavior

.. ipython:: python

s.ix[-1.0:2]

- provide a useful exception for indexing with an invalid type for that index when using ``.loc``. For example, this applies when using ``.loc`` with an integer (or a float) on an index of type ``DatetimeIndex``, ``PeriodIndex`` or ``TimedeltaIndex``.

Previous Behavior

.. code-block:: python

In [4]: df.loc[2:3]
KeyError: 'start bound [2] is not the [index]'

New Behavior

.. code-block:: python

In [4]: df.loc[2:3]
TypeError: Cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with <type 'int'> keys
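The relaxed slicing rule from the first item above, and its limit to slices only, can be sketched as follows. This is a small check under the assumption of a sorted integer index; ``single_label_raises`` is a hypothetical helper name.

```python
import pandas as pd

s = pd.Series(range(5), index=[-2, -1, 1, 2, 3])

# -10 is not in the index, but as a slice bound it is accepted:
# the slice simply starts at the first label that is >= -10.
sliced = s.loc[-10:3]

# Looking up a single missing label is still an error.
try:
    s.loc[-10]
    single_label_raises = False
except KeyError:
    single_label_raises = True
```

The slice returns all five rows, while the single-label lookup raises ``KeyError``.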


.. _whatsnew_0160.deprecations:

Deprecations