Commit 3d6ec7e

Add eval method to Dataset (#7163)
* Add `eval` method to Dataset

  This needs proper tests & docs, but would this be a good idea? Example in the docstring.
1 parent 299abd6 commit 3d6ec7e

4 files changed: +84 -0 lines changed
doc/api.rst
Lines changed: 1 addition & 0 deletions

@@ -192,6 +192,7 @@ Computation
    Dataset.map_blocks
    Dataset.polyfit
    Dataset.curvefit
+   Dataset.eval

 Aggregation
 -----------
doc/whats-new.rst
Lines changed: 3 additions & 0 deletions

@@ -41,6 +41,9 @@ New Features
 - :py:meth:`~xarray.DataArray.rank` now operates on dask-backed arrays, assuming
   the core dim has exactly one chunk. (:pull:`8475`).
   By `Maximilian Roos <https://github.com/max-sixty>`_.
+- Add a :py:meth:`Dataset.eval` method, similar to the pandas method of the
+  same name. (:pull:`7163`). This is currently marked as experimental and
+  doesn't yet support the ``numexpr`` engine.
 - :py:meth:`Dataset.drop_vars` & :py:meth:`DataArray.drop_vars` allow passing a
   callable, similar to :py:meth:`Dataset.where` & :py:meth:`Dataset.sortby` & others.
   (:pull:`8511`).
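As a quick illustration of the entry above (a hedged sketch, not part of the commit): with the default ``parser="pandas"`` the boolean keywords ``and``/``or`` are accepted and applied element-wise, while the stricter ``python`` parser expects ``&``/``|`` spelled out. The dataset here mirrors the docstring example further down.

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"a": ("x", np.arange(5)), "b": ("x", np.linspace(0, 1, 5))})

    # A plain expression returns the computed DataArray.
    total = ds.eval("a + b")

    # With the default pandas parser, `and` behaves like element-wise `&`.
    mask = ds.eval("(a > 1) and (b > 0.2)", parser="pandas")

    # The python parser wants the operators spelled explicitly.
    mask_py = ds.eval("(a > 1) & (b > 0.2)", parser="python")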

xarray/core/dataset.py
Lines changed: 63 additions & 0 deletions

@@ -98,6 +98,7 @@
     Self,
     T_ChunkDim,
     T_Chunks,
+    T_DataArray,
     T_DataArrayOrSet,
     T_Dataset,
     ZarrWriteModes,
@@ -9554,6 +9555,68 @@ def argmax(self, dim: Hashable | None = None, **kwargs) -> Self:
             "Dataset.argmin() with a sequence or ... for dim"
         )

+    def eval(
+        self,
+        statement: str,
+        *,
+        parser: QueryParserOptions = "pandas",
+    ) -> Self | T_DataArray:
+        """
+        Calculate an expression supplied as a string in the context of the dataset.
+
+        This is currently experimental; the API may change, particularly around
+        assignments, which currently return a ``Dataset`` with the additional variable.
+        Currently only the ``python`` engine is supported, which has the same
+        performance as executing in python.
+
+        Parameters
+        ----------
+        statement : str
+            String containing the Python-like expression to evaluate.
+
+        Returns
+        -------
+        result : Dataset or DataArray, depending on whether ``statement`` contains an
+            assignment.
+
+        Examples
+        --------
+        >>> ds = xr.Dataset(
+        ...     {"a": ("x", np.arange(0, 5, 1)), "b": ("x", np.linspace(0, 1, 5))}
+        ... )
+        >>> ds
+        <xarray.Dataset>
+        Dimensions:  (x: 5)
+        Dimensions without coordinates: x
+        Data variables:
+            a        (x) int64 0 1 2 3 4
+            b        (x) float64 0.0 0.25 0.5 0.75 1.0
+
+        >>> ds.eval("a + b")
+        <xarray.DataArray (x: 5)>
+        array([0.  , 1.25, 2.5 , 3.75, 5.  ])
+        Dimensions without coordinates: x
+
+        >>> ds.eval("c = a + b")
+        <xarray.Dataset>
+        Dimensions:  (x: 5)
+        Dimensions without coordinates: x
+        Data variables:
+            a        (x) int64 0 1 2 3 4
+            b        (x) float64 0.0 0.25 0.5 0.75 1.0
+            c        (x) float64 0.0 1.25 2.5 3.75 5.0
+        """
+
+        return pd.eval(
+            statement,
+            resolvers=[self],
+            target=self,
+            parser=parser,
+            # Because numexpr returns a numpy array, using that engine results in
+            # different behavior. We'd be very open to a contribution handling this.
+            engine="python",
+        )
+
     def query(
         self,
         queries: Mapping[Any, Any] | None = None,
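The method above is a thin wrapper around ``pd.eval``: the Dataset is handed in both as a resolver, so bare names in the statement are looked up as data variables via ``__getitem__``, and as the assignment target, so an assignment statement comes back as a copy of the Dataset with the new variable set. A rough sketch of that ``pd.eval`` mechanism, using a plain pandas DataFrame as the resolver/target purely for illustration:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": np.arange(5), "b": np.linspace(0, 1, 5)})

    # A bare expression: names resolve through the resolver's __getitem__,
    # and the computed result is returned directly.
    summed = pd.eval("a + b", resolvers=[df], engine="python")

    # An assignment: pd.eval copies the target, sets the new key on the copy,
    # and returns it -- the same reason ds.eval("c = a + b") returns a
    # Dataset carrying the extra variable "c".
    with_c = pd.eval("c = a + b", resolvers=[df], target=df, engine="python")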

xarray/tests/test_dataset.py
Lines changed: 17 additions & 0 deletions

@@ -6718,6 +6718,23 @@ def test_query(self, backend, engine, parser) -> None:
 # pytest tests — new tests should go here, rather than in the class.


+@pytest.mark.parametrize("parser", ["pandas", "python"])
+def test_eval(ds, parser) -> None:
+    """Currently much more minimal testing than `query` above, and much of the setup
+    isn't used. But the risks are fairly low — `query` shares much of the code, and
+    the method is currently experimental."""
+
+    actual = ds.eval("z1 + 5", parser=parser)
+    expect = ds["z1"] + 5
+    assert_identical(expect, actual)
+
+    # check pandas query syntax is supported
+    if parser == "pandas":
+        actual = ds.eval("(z1 > 5) and (z2 > 0)", parser=parser)
+        expect = (ds["z1"] > 5) & (ds["z2"] > 0)
+        assert_identical(expect, actual)
+
+
 @pytest.mark.parametrize("test_elements", ([1, 2], np.array([1, 2]), DataArray([1, 2])))
 def test_isin(test_elements, backend) -> None:
     expected = Dataset(
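The assignment form (``"c = a + b"`` returning a ``Dataset``) is only exercised by the docstring so far; a possible follow-up test, sketched against the same module's ``ds`` fixture and ``assert_identical`` helper (the test name, the variable name ``z3``, and the exact assertions are illustrative, not part of this commit):

    def test_eval_assignment(ds) -> None:
        # An assignment statement should hand back a new Dataset with the
        # extra variable, leaving the original fixture untouched.
        # Assumes the fixture has no variable named "z3".
        actual = ds.eval("z3 = z1 + 5")
        expect = ds.assign(z3=ds["z1"] + 5)
        assert_identical(expect, actual)
        assert "z3" not in ds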
