-
Notifications
You must be signed in to change notification settings - Fork 156
feat: Adds Compliant(When|Then)
#2261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Very barebones to start with - Mostly just defining the current impl in a generic way - There are some quirks I'd like to iron out before merging
narwhals/_compliant/when_then.py
Outdated
def otherwise( | ||
self, value: CompliantExprT | CompliantSeriesOrNativeExprT_co | _Scalar, / | ||
) -> CompliantExprT: | ||
self._call._otherwise_value = value | ||
self._function_name = "whenotherwise" | ||
return cast("CompliantExprT", self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about this for a while.
@MarcoGorelli would you be able to help me understand what is going on here?
For reference, I'm just mirroring what is in (Arrow|Pandas)Then
narwhals/narwhals/_arrow/namespace.py
Lines 388 to 391 in 29c0f6c
def otherwise(self: Self, value: ArrowExpr | ArrowSeries | _Scalar) -> ArrowExpr: | |
self._call._otherwise_value = value | |
self._function_name = "whenotherwise" | |
return self |
Side note for DuckDBThen
I think that is typing issue would be fixed by annotating DuckDB._call: DuckDBWhen
narwhals/narwhals/_duckdb/namespace.py
Lines 320 to 326 in 29c0f6c
def otherwise(self: Self, value: DuckDBExpr | Any) -> DuckDBExpr: | |
# type ignore because we are setting the `_call` attribute to a | |
# callable object of type `DuckDBWhen`, base class has the attribute as | |
# only a `Callable` | |
self._call._otherwise_value = value # type: ignore[attr-defined] | |
self._function_name = "whenotherwise" | |
return self |
1. Mutability
*Then.otherwise
mutates both *When
and *Then
.
I expected that would cause an issue for something like:
Code block
import pandas as pd
import polars as pl
import narwhals as nw
df = pl.DataFrame({"a": ["A", "B", "A"], "b": [1, 2, 3], "c": [9, 2, 4], "d": [8, 7, 8]})
nw_pd = nw.from_native(nw.from_native(df).to_pandas())
when = nw.when(nw.col("a") == "B")
when_then = when.then("b")
when_then_otherwise = when.then("b").otherwise("c")
nw_pd.with_columns(
when_then.alias("e"),
when_then_otherwise.alias("f"),
when_then.otherwise("d").alias("g"),
when.then(nw.col("c") * 10).otherwise(nw.col("c") * 100).alias("h"),
)
But it all worked 🎉!
Output
┌─────────────────────────────┐
| Narwhals DataFrame |
|-----------------------------|
| a b c d e f g h|
|0 A 1 9 8 NaN 9 8 900|
|1 B 2 2 7 2.0 2 2 20|
|2 A 3 4 8 NaN 4 8 400|
└─────────────────────────────┘
However, AFAICT we're depending on the implementation details of nw.(When|Then)
to make this safe.
My best guess is we get new function(s) (and therefore Compliant*
objects) on every narwhals
-level method call?
nw.(When|Then)
narwhals/narwhals/functions.py
Lines 1618 to 1650 in 29c0f6c
class When: | |
def __init__(self: Self, *predicates: IntoExpr | Iterable[IntoExpr]) -> None: | |
self._predicate = all_horizontal(*flatten(predicates)) | |
check_expressions_preserve_length(self._predicate, function_name="when") | |
def then(self: Self, value: IntoExpr | Any) -> Then: | |
return Then( | |
lambda plx: apply_n_ary_operation( | |
plx, | |
lambda *args: plx.when(args[0]).then(args[1]), | |
self._predicate, | |
value, | |
str_as_lit=False, | |
), | |
combine_metadata(self._predicate, value, str_as_lit=False), | |
) | |
class Then(Expr): | |
def otherwise(self: Self, value: IntoExpr | Any) -> Expr: | |
kind = infer_kind(value, str_as_lit=False) | |
def func(plx: CompliantNamespace[Any, Any]) -> CompliantExpr[Any, Any]: | |
compliant_expr = self._to_compliant_expr(plx) | |
compliant_value = extract_compliant(plx, value, str_as_lit=False) | |
if is_scalar_like(kind) and is_compliant_expr(compliant_value): | |
compliant_value = compliant_value.broadcast(kind) | |
return compliant_expr.otherwise(compliant_value) # type: ignore[attr-defined, no-any-return] | |
return Expr( | |
func, | |
combine_metadata(self, value, str_as_lit=False), | |
) |
Suggestions
If we change nothing
It might be helpful to add some tests for this kind of thing to avoid unintentionally regressing.
The tests don't appear to cover it
The example code I've given was arbitrary, but the pattern of reusing parts of a when-then-otherwise
is something I've done pretty frequently in altair
:
- https://altair-viz.github.io/gallery/dot_dash_plot.html
- https://altair-viz.github.io/gallery/multiline_tooltip_standard.html
- https://altair-viz.github.io/gallery/scatter_point_paths_hover.html
.otherwise(...)
If we returned a new object - I think that would be more consistent with the rest of the Compliant*
API
2. function_name
Is this solely to align with the CompliantExpr._function_name
requirement?
Every backend defines this, but I haven't found anywhere that utilizes the distinction between the two:
class SomeWhen:
def then(self, value):
return SomeThen(..., function_name="whenthen", ...)
class SomeThen:
def otherwise(self, value):
...
self._function_name = "whenotherwise"
return self
3. otherwise() -> CompliantExpr
I'm assuming the intent is to "erase" the fact that this returns CompliantThen
?
At this stage we have a Callable[[Frame], Sequence[Series]]
.
So, we could go a step further and actually wrap it in the parent class?
Note
All of this will need to be rewritten either way, if anyone picks up chaining again (#669)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we can get rid of tracking
function_name
at the compliant level
@MarcoGorelli we're getting a bit too in sync lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, sounds good to returns a new object and avoid mutating
I've held off allowing chaining of when-then statements until the current implementation is better, so definitely happy to make steps in the right direction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, sounds good to returns a new object and avoid mutating
I've held off allowing chaining of when-then statements until the current implementation is better, so definitely happy to make steps in the right direction
@MarcoGorelli lovely
Should be quite easy to tweak after this PR 😎
- Skipping `__init__` to internally handle a generic `CompliantExpr` - Should mean we only need: - `CompliantWhen.__call__(...)` - Stub `CompliantThen` as a mixin with the main `*Expr` class - Return that type from `CompliantWhen._then`
self._call._otherwise_value = value | ||
self._function_name = "whenotherwise" | ||
return self | ||
class ArrowThen(CompliantThen[ArrowDataFrame, ArrowSeries, ArrowExpr], ArrowExpr): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before anyone asks - yes this is a fully working implementation 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow!
narwhals/_compliant/when_then.py
Outdated
def __call__( | ||
self, compliant_frame: CompliantFrameT, / | ||
) -> Sequence[CompliantSeriesOrNativeExprT]: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
Try to find common parts and split up their implementation
- Ideally be able to define more as private methods here
- Less work for subclasses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will allow most of (Arrow|Pandas)Then.__call__
to be the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't expecting I'd be able to reduce it down this much ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved from (cc5cce1) onwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note
Mentioned in (b0b1a05) that I'm tentative on LazyWhen
Happy to merge as-is, but its just the least-bad idea I had so far
- Reserve `_when`, `_then` for types - Always use `_*_value` for expression fragments stored on an instance - Positional-only, non-prefixed args for the method/function they correspond to #2261 (comment)
- The annotations in `when_then` will be much easier to read by just defining the `TypeVar`(s) internally - Also means one less place to keep the type params in sync
First step towards (#2261 (comment))
Using `Incomplete` for now, instead of adding more `TypeVar`s
- Need to sleep on this one - It works, but seems like quite a battle to unify them
Compliant(When|Then)
Compliant(When|Then)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._call._otherwise_value = value | ||
self._function_name = "whenotherwise" | ||
return self | ||
class ArrowThen(CompliantThen[ArrowDataFrame, ArrowSeries, ArrowExpr], ArrowExpr): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow!
Thanks @MarcoGorelli 😍 Wasn't sure if I'd gone off the deep end - but I'm really happy with how clean each of the backends are looking now |
def maybe_evaluate_expr( | ||
df: DuckDBLazyFrame, obj: DuckDBExpr | object | ||
) -> duckdb.Expression: | ||
from narwhals._duckdb.expr import DuckDBExpr | ||
|
||
if isinstance(obj, DuckDBExpr): | ||
column_results = obj._call(df) | ||
assert len(column_results) == 1 # debug assertion # noqa: S101 | ||
return column_results[0] | ||
return lit(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beat me to it @MarcoGorelli 😉
(a339a5e)
@classmethod | ||
def from_when( | ||
cls, | ||
when: CompliantWhen[FrameT, SeriesT, ExprT], | ||
then: IntoExpr[SeriesT, ExprT], | ||
/, | ||
) -> Self: | ||
when._then_value = then | ||
obj = cls.__new__(cls) | ||
obj._call = when | ||
obj._when_value = when | ||
obj._depth = 0 | ||
obj._function_name = "whenthen" | ||
obj._evaluate_output_names = getattr( | ||
then, "_evaluate_output_names", lambda _df: ["literal"] | ||
) | ||
obj._alias_output_names = getattr(then, "_alias_output_names", None) | ||
obj._implementation = when._implementation | ||
obj._backend_version = when._backend_version | ||
obj._version = when._version | ||
obj._call_kwargs = {} | ||
return obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Probably going to try something like this for _compliant.selectors.py
Will wait for (#2266) to land first though
What type of PR is this? (check all applicable)
Related issues
Compliant*
protocols #2230CompliantNamespace
protocol #2202 (comment)Checklist
Tasks
CompliantWhen
CompliantThen
Arrow(When|Then)
Dask(When|Then)
DuckDB(When|Then)
Pandas(When|Then)
SparkLike(When|Then)
CompliantWhen.__call__
(See thread)EagerWhen
(DuckDB|SparkLike)When
DaskWhen
DepthTracking...
idea (chore: TrackExpansionKind
inExprMetadata
#2266 (comment))