ENH: Implement Kleene logic for BooleanArray #29842

Merged
38 commits
bb904cb
ENH: add BooleanArray extension array (#29555)
jorisvandenbossche Nov 25, 2019
13c7ea3
move
TomAugspurger Nov 26, 2019
fff786f
doc fixup
TomAugspurger Nov 26, 2019
4067e7f
Merge remote-tracking branch 'upstream/master' into boolean-array-kleene
TomAugspurger Nov 26, 2019
708c553
working
TomAugspurger Nov 26, 2019
c56894e
Merge remote-tracking branch 'upstream/master' into boolean-array-kleene
TomAugspurger Nov 27, 2019
2e9d547
updates
TomAugspurger Nov 27, 2019
373aaab
updates
TomAugspurger Nov 27, 2019
7f78a64
Raise for NaN
TomAugspurger Nov 27, 2019
36b171b
added tests for empty
TomAugspurger Nov 27, 2019
747e046
added tests for inplace mutation
TomAugspurger Nov 27, 2019
d0a8cca
Do not assume masked values are False
TomAugspurger Nov 27, 2019
fe061b0
Merge remote-tracking branch 'upstream/master' into boolean-array-kleene
TomAugspurger Nov 27, 2019
9f9e44c
mypy
TomAugspurger Nov 27, 2019
0a34257
doc fixups
TomAugspurger Nov 27, 2019
2ba0034
Added benchmarks
TomAugspurger Nov 27, 2019
2d1129a
update tests
TomAugspurger Nov 27, 2019
a24fc22
Merge remote-tracking branch 'upstream/master' into boolean-array-kleene
TomAugspurger Nov 27, 2019
77dd1fc
remove unneded setitem
TomAugspurger Nov 27, 2019
7b9002c
optimize
TomAugspurger Nov 27, 2019
c18046b
comments
TomAugspurger Nov 27, 2019
1237caa
just do the xor
TomAugspurger Nov 27, 2019
2ecf9b8
Merge remote-tracking branch 'upstream/master' into boolean-array-kleene
TomAugspurger Dec 2, 2019
87aeb09
fixup docstring
TomAugspurger Dec 2, 2019
969b6dc
fix label
TomAugspurger Dec 2, 2019
1c9ba49
PERF: faster or
TomAugspurger Dec 2, 2019
8eec954
Merge remote-tracking branch 'upstream/master' into boolean-array-kleene
TomAugspurger Dec 4, 2019
cb47b6a
handle pd.NA
TomAugspurger Dec 4, 2019
2a946b9
validate
TomAugspurger Dec 4, 2019
efb6f8b
please mypy
TomAugspurger Dec 4, 2019
004238e
move to nanops
TomAugspurger Dec 4, 2019
5a2c81c
Merge remote-tracking branch 'upstream/master' into boolean-array-kleene
TomAugspurger Dec 5, 2019
7032318
move
TomAugspurger Dec 5, 2019
bbb7f9b
numpy scalars
TomAugspurger Dec 5, 2019
ce763b4
doc note
TomAugspurger Dec 5, 2019
5bc5328
handle numpy bool
TomAugspurger Dec 5, 2019
457bd08
Merge remote-tracking branch 'upstream/master' into boolean-array-kleene
TomAugspurger Dec 6, 2019
31c2bc6
cleanup
TomAugspurger Dec 6, 2019
32 changes: 32 additions & 0 deletions asv_bench/benchmarks/boolean.py
@@ -0,0 +1,32 @@
+import numpy as np
+
+import pandas as pd
+
+
+class TimeLogicalOps:
+    def setup(self):
+        N = 10_000
+        left, right, lmask, rmask = np.random.randint(0, 2, size=(4, N)).astype("bool")
+        self.left = pd.arrays.BooleanArray(left, lmask)
+        self.right = pd.arrays.BooleanArray(right, rmask)
+
+    def time_or_scalar(self):
+        self.left | True
+        self.left | False
+
+    def time_or_array(self):
+        self.left | self.right
+
+    def time_and_scalar(self):
+        self.left & True
+        self.left & False
+
+    def time_and_array(self):
+        self.left & self.right
+
+    def time_xor_scalar(self):
+        self.left ^ True
+        self.left ^ False
+
+    def time_xor_array(self):
+        self.left ^ self.right
1 change: 1 addition & 0 deletions doc/source/index.rst.template
@@ -73,6 +73,7 @@ See the :ref:`overview` for more detail about what's in the library.
 * :doc:`user_guide/missing_data`
 * :doc:`user_guide/categorical`
 * :doc:`user_guide/integer_na`
+* :doc:`user_guide/boolean`
 * :doc:`user_guide/visualization`
 * :doc:`user_guide/computation`
 * :doc:`user_guide/groupby`
79 changes: 79 additions & 0 deletions doc/source/user_guide/boolean.rst
@@ -0,0 +1,79 @@
+.. currentmodule:: pandas
+
+.. ipython:: python
+   :suppress:
+
+   import pandas as pd
+   import numpy as np
+
+.. _boolean:
+
+**************************
+Nullable Boolean Data Type
+**************************
+
+.. versionadded:: 1.0.0
+
+.. _boolean.kleene:
+
+Kleene Logical Operations
+-------------------------
+
+:class:`arrays.BooleanArray` implements `Kleene Logic`_ (sometimes called three-value logic) for
+logical operations like ``&`` (and), ``|`` (or) and ``^`` (exclusive-or).
+
+This table demonstrates the results for every combination. These operations are symmetrical,
+so flipping the left- and right-hand side makes no difference in the result.
+
+================= =========
+Expression        Result
+================= =========
+``True & True``   ``True``
+``True & False``  ``False``
+``True & NA``     ``NA``
+``False & False`` ``False``
+``False & NA``    ``False``
+``NA & NA``       ``NA``
+``True | True``   ``True``
+``True | False``  ``True``
+``True | NA``     ``True``
+``False | False`` ``False``
+``False | NA``    ``NA``
+``NA | NA``       ``NA``
+``True ^ True``   ``False``
+``True ^ False``  ``True``
+``True ^ NA``     ``NA``
+``False ^ False`` ``False``
+``False ^ NA``    ``NA``
+``NA ^ NA``       ``NA``
+================= =========
+
+When an ``NA`` is present in an operation, the output value is ``NA`` only if
+the result cannot be determined solely based on the other input. For example,
+``True | NA`` is ``True``, because both ``True | True`` and ``True | False``
+are ``True``. In that case, we don't actually need to consider the value
+of the ``NA``.
+
+On the other hand, ``True & NA`` is ``NA``. The result depends on whether
+the ``NA`` really is ``True`` or ``False``, since ``True & True`` is ``True``,
+but ``True & False`` is ``False``, so we can't determine the output.
+
+
+This differs from how ``np.nan`` behaves in logical operations. Pandas treated
+``np.nan`` as *always false in the output*.
+
+In ``or``
+
+.. ipython:: python
+
+   pd.Series([True, False, np.nan], dtype="object") | True
+   pd.Series([True, False, np.nan], dtype="boolean") | True
+
+In ``and``
+
+.. ipython:: python
+
+   pd.Series([True, False, np.nan], dtype="object") & True
+   pd.Series([True, False, np.nan], dtype="boolean") & True
+
+.. _Kleene Logic: https://en.wikipedia.org/wiki/Three-valued_logic#Kleene_and_Priest_logics
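
Reader's note on the truth table in this file: the rules can be checked with a few lines of plain Python. This is only an illustrative sketch. It uses `None` as a stand-in for `pd.NA`, and the scalar helpers `k_and`, `k_or`, and `k_xor` are hypothetical names invented here, not the array routines this PR imports from `pandas.core.ops.mask_ops`.

```python
from itertools import combinations_with_replacement

NA = None  # stand-in for pd.NA in this sketch


def k_and(a, b):
    # A single False decides the result, whatever the other operand is.
    if a is False or b is False:
        return False
    if a is NA or b is NA:
        return NA
    return True


def k_or(a, b):
    # A single True decides the result, whatever the other operand is.
    if a is True or b is True:
        return True
    if a is NA or b is NA:
        return NA
    return False


def k_xor(a, b):
    # XOR always depends on both operands, so any NA propagates.
    if a is NA or b is NA:
        return NA
    return a != b


# Print one row per unordered pair, mirroring the table (NA prints as None).
for symbol, func in [("&", k_and), ("|", k_or), ("^", k_xor)]:
    for a, b in combinations_with_replacement([True, False, NA], 2):
        print(f"{a} {symbol} {b} -> {func(a, b)}")
```

Because each helper treats its two arguments identically, swapping the operands gives the same result, which matches the symmetry claim in the documentation.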
1 change: 1 addition & 0 deletions doc/source/user_guide/index.rst
@@ -30,6 +30,7 @@ Further information on any specific method can be obtained in the
     missing_data
     categorical
     integer_na
+    boolean
     visualization
     computation
     groupby
40 changes: 25 additions & 15 deletions pandas/core/arrays/boolean.py
@@ -184,6 +184,9 @@ class BooleanArray(ExtensionArray, ExtensionOpsMixin):
     represented by 2 numpy arrays: a boolean array with the data and
     a boolean array with the mask (True indicating missing).

+    BooleanArray implements Kleene logic (sometimes called three-value
+    logic) for logical operations. See :ref:`boolean.kleene` for more.
+
     To construct a BooleanArray from generic array-like input, use
     :func:`pandas.array` specifying ``dtype="boolean"`` (see examples
     below).
@@ -283,7 +286,7 @@ def __getitem__(self, item):

     def _coerce_to_ndarray(self, dtype=None, na_value: "Scalar" = libmissing.NA):
         """
-        Coerce to an ndarary of object dtype or bool dtype (if force_bool=True).
+        Coerce to an ndarray of object dtype or bool dtype (if force_bool=True).

         Parameters
         ----------
@@ -565,33 +568,40 @@ def logical_method(self, other):
                 # Rely on pandas to unbox and dispatch to us.
                 return NotImplemented

+            assert op.__name__ in {"or_", "ror_", "and_", "rand_", "xor", "rxor"}
             other = lib.item_from_zerodim(other)
+            other_is_booleanarray = isinstance(other, BooleanArray)
+            other_is_scalar = lib.is_scalar(other)
             mask = None

-            if isinstance(other, BooleanArray):
+            if other_is_booleanarray:
                 other, mask = other._data, other._mask
             elif is_list_like(other):
                 other = np.asarray(other, dtype="bool")
                 if other.ndim > 1:
                     raise NotImplementedError(
                         "can only perform ops with 1-d structures"
                     )
-                if len(self) != len(other):
-                    raise ValueError("Lengths must match to compare")
                 other, mask = coerce_to_array(other, copy=False)
+            elif isinstance(other, np.bool_):
+                other = other.item()
Member

This is to convert to a Python bool? Why not just `bool(other)`? I usually think of `item` as being an array method.

Member

`item` is the general method to get a Python scalar (here we of course know we want a bool).

But Tom, why exactly does this need to be converted? I would think the numpy operations later on work fine with a numpy scalar as well?

Contributor Author

IIRC, we do things like `if right is False` or `if right is True`, which will fail for numpy booleans. I don't want to have to worry about checking both, so it's easier to convert here.
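
The point about identity checks can be seen in a short, self-contained snippet. It is a minimal illustration of the NumPy behaviour under discussion, not code from this PR; the variable name `right` simply mirrors the comment above.

```python
import numpy as np

right = np.bool_(True)

# Identity checks against the Python singletons fail for NumPy booleans,
# even though value equality still holds.
print(right is True)   # False
print(right == True)   # True

# Both .item() and bool() return the Python singleton, so `is` works again.
print(right.item() is True)  # True
print(bool(right) is True)   # True
```

Either conversion makes later `is True` / `is False` branches behave the same for Python and NumPy scalars.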

+
+            if other_is_scalar and not (other is libmissing.NA or lib.is_bool(other)):
+                raise TypeError(
+                    "'other' should be pandas.NA or a bool. Got {} instead.".format(
+                        type(other).__name__
+                    )
+                )

-            # numpy will show a DeprecationWarning on invalid elementwise
-            # comparisons, this will raise in the future
-            with warnings.catch_warnings():
-                warnings.filterwarnings("ignore", "elementwise", FutureWarning)
-                with np.errstate(all="ignore"):
-                    result = op(self._data, other)
+            if not other_is_scalar and len(self) != len(other):
+                raise ValueError("Lengths must match to compare")

-            # nans propagate
-            if mask is None:
-                mask = self._mask
-            else:
-                mask = self._mask | mask
+            if op.__name__ in {"or_", "ror_"}:
+                result, mask = ops.kleene_or(self._data, other, self._mask, mask)
+            elif op.__name__ in {"and_", "rand_"}:
+                result, mask = ops.kleene_and(self._data, other, self._mask, mask)
+            elif op.__name__ in {"xor", "rxor"}:
+                result, mask = ops.kleene_xor(self._data, other, self._mask, mask)

             return BooleanArray(result, mask)
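
Reader's note: the hunk above passes data and mask arrays to `ops.kleene_or` and gets a `(result, mask)` pair back, but the body of that function is not part of this excerpt. The sketch below shows one way such a masked OR could work for the array-with-array case only. The name `kleene_or_sketch` is hypothetical, scalars and `pd.NA` operands are not handled, and this should not be read as the PR's actual implementation.

```python
import numpy as np


def kleene_or_sketch(left, right, left_mask, right_mask):
    """Kleene OR for equal-length boolean arrays with missing-value masks.

    Hypothetical illustration; mask entries are True where a value is missing.
    """
    # A position is "known True" only if its data is True and it is not
    # masked, so the data hidden under a mask is never trusted.
    left_true = left & ~left_mask
    right_true = right & ~right_mask

    # The result is True wherever either side is known to be True.
    result = left_true | right_true

    # The result is missing where either side is missing, unless the other
    # side is known True (True | NA is True).
    mask = (left_mask | right_mask) & ~result
    return result, mask


left = np.array([True, True, False, False, False])
lmask = np.array([False, False, False, False, True])
right = np.array([True, False, False, False, False])
rmask = np.array([False, True, False, True, True])

# Rows: True|True, True|NA, False|False, False|NA, NA|NA
result, mask = kleene_or_sketch(left, right, lmask, rmask)
print(result.tolist())  # [True, True, False, False, False]
print(mask.tolist())    # [False, False, False, True, True]
```

Unlike the removed code, which always combined the masks with `self._mask | mask`, a Kleene OR unmasks positions where the other operand is known to be `True`.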

1 change: 1 addition & 0 deletions pandas/core/ops/__init__.py
@@ -39,6 +39,7 @@
     _op_descriptions,
 )
 from pandas.core.ops.invalid import invalid_comparison  # noqa:F401
+from pandas.core.ops.mask_ops import kleene_and, kleene_or, kleene_xor  # noqa: F401
 from pandas.core.ops.methods import (  # noqa:F401
     add_flex_arithmetic_methods,
     add_special_arithmetic_methods,
6 changes: 6 additions & 0 deletions pandas/core/ops/dispatch.py
@@ -189,6 +189,9 @@ def maybe_dispatch_ufunc_to_dunder_op(
         "ge",
         "remainder",
         "matmul",
+        "or",
+        "xor",
+        "and",
     }
     aliases = {
         "subtract": "sub",
@@ -204,6 +207,9 @@ def maybe_dispatch_ufunc_to_dunder_op(
         "less_equal": "le",
         "greater": "gt",
         "greater_equal": "ge",
+        "bitwise_or": "or",
+        "bitwise_and": "and",
+        "bitwise_xor": "xor",
     }

     # For op(., Array) -> Array.__r{op}__
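
Reader's note: the new alias entries map NumPy ufunc names such as `bitwise_or` to dunder-method names such as `or`. The toy class below sketches the general idea of routing a ufunc call to `__or__` or `__ror__` through NumPy's `__array_ufunc__` protocol. `Flags` and `ALIASES` are hypothetical names made up for this illustration, and the real `maybe_dispatch_ufunc_to_dunder_op` may differ in its details.

```python
import numpy as np

# Subset of the ufunc-name aliases added in the hunk above.
ALIASES = {"bitwise_or": "or", "bitwise_and": "and", "bitwise_xor": "xor"}


class Flags:
    """Hypothetical minimal container, used only to show the dispatch idea."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=bool)

    def __or__(self, other):
        # Unwrap another Flags instance; otherwise use the operand as-is.
        return Flags(self.values | getattr(other, "values", other))

    __ror__ = __or__  # OR is symmetric, so the reflected op can reuse it

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        name = ALIASES.get(ufunc.__name__)
        if method != "__call__" or name is None or len(inputs) != 2 or kwargs:
            return NotImplemented
        # op(Array, .) -> Array.__op__ ; op(., Array) -> Array.__r{op}__
        if inputs[0] is self:
            dunder, other = f"__{name}__", inputs[1]
        else:
            dunder, other = f"__r{name}__", inputs[0]
        handler = getattr(self, dunder, None)
        if handler is None:
            return NotImplemented
        return handler(other)


flags = Flags([True, False])
print(np.bitwise_or(flags, True).values.tolist())  # [True, True]
print(np.bitwise_or(np.array([False, False]), flags).values.tolist())  # [True, False]
```

With this routing in place, `np.bitwise_or(arr, x)` and `arr | x` go through the same method, so any special missing-value handling lives in one place.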