-
Notifications
You must be signed in to change notification settings - Fork 50
feat: support time range rolling on Series. #1590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
81811e2
to
87f265b
Compare
a4cd9dc
to
58e0764
Compare
58e0764
to
850ea41
Compare
5239077
to
db6a353
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you will also want to modify the ordering pull-up logic to not pull ordering into range windows? Should be some similar logic already there.
@@ -579,6 +593,19 @@ def _convert_ordering_to_table_values( | |||
return ordering_values | |||
|
|||
|
|||
def _convert_range_ordering_to_table_value( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what makes this one different? is it that it doesn't allow NULLS FIRST/LAST overrides and only allows one column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. If we have to deal with NULLs, there will be multiple expressions after "ORDER BY", but it is not allowed by SQL window syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added doc that explains the reasons for future references.
bigframes/core/compile/compiled.py
Outdated
start=_to_ibis_boundary(bounds.start_micros), | ||
end=_to_ibis_boundary(bounds.end_micros), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why define as micros? Sure, now its always micros, but not really a necessary property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has something to do with dealing with nullness. Property removed.
from bigframes.core import nodes | ||
|
||
|
||
def rewrite_range_rolling(node: nodes.BigFrameNode) -> nodes.BigFrameNode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I always propose compile-time rewrites, but I think validations need to happen earlier, in the API. The tree should always be valid, rewrites just transform from one valid tree to another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking about the length checking for ordering
?
I will check the rolling window key on the API level too. Here this is simply to make sure we fail fast if some other error sneaks in.
spec = window_spec.WindowSpec( | ||
bounds=window_spec.RangeWindowBounds.from_timedelta_window(window, closed), | ||
min_periods=1 if min_periods is None else min_periods, | ||
ordering=( | ||
ordering.OrderingExpression( | ||
ex.deref(block.index_columns[0]), order_direction | ||
), | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like this where we are picking the ordering column immediately and validating
bigframes/core/window/rolling.py
Outdated
@singledispatch | ||
def _find_order_direction( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: probably should have tree stuff elsewhere, but whatever
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be simple to refactor, though. I moved them to a separated file
bigframes/core/window/rolling.py
Outdated
if len(root.by) == 0: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case, the node is a no-op so we can just call child
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL. code updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, all re-orderings are stable re-orderings, so it prepends to the existing ordering key rather than replaceing
bigframes/core/window/rolling.py
Outdated
|
||
@_find_order_direction.register | ||
def _(root: nodes.FilterNode, column_id: str): | ||
return _find_order_direction(root.child, column_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can probably also extend to the additive nodes (projection, window, isin), for which you can just ignore the contents and call child (or you can try to identify strictly increasing functions if you want to get fancy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added windowNode and inNode. The ProjectNode is left out because some operations may invalidate ordering (e.g. multiplication with a negative value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ProjetNode is always additive, it doesn't mutate values, so it should be safe. If anything, it can provide alternative ordering keys if you know an operation is strictly increasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
31447ea
to
2e86a47
Compare
5f87995
to
0bdaf17
Compare
|
||
return dataclasses.replace( | ||
node, | ||
window_spec=dataclasses.replace(node.window_spec, ordering=(new_ordering,)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be consistent to redefine the window spec bounds to integers rather than timestamp to be consistent with the underlying value now being an integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I still prefer the timedeltas because it describes best what the range windows are for
No description provided.