
feat: support time range rolling on Series. #1590


Merged
merged 21 commits into from
Apr 8, 2025


sycai
Contributor

@sycai sycai commented Apr 3, 2025

No description provided.
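The PR adds time-range rolling to BigFrames `Series`, following the pandas `rolling` API. As a point of reference, here is a minimal pure-Python sketch (not the BigFrames implementation) of what a time-based rolling sum computes under the pandas default `closed="right"`, where the row at time `t` aggregates values with timestamps in `(t - window, t]`. The helper name `time_rolling_sum` is hypothetical, and the sketch assumes a sorted index with no duplicate timestamps.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


def time_rolling_sum(
    index: Sequence[datetime],
    values: Sequence[float],
    window: timedelta,
    min_periods: int = 1,
) -> List[Optional[float]]:
    """Rolling sum over the time range (t - window, t] ending at each row.

    Hypothetical reference helper; assumes `index` is sorted ascending and
    contains no duplicate timestamps.
    """
    results: List[Optional[float]] = []
    for t in index:
        # Collect every value whose timestamp falls inside the half-open range.
        in_window = [v for ts, v in zip(index, values) if t - window < ts <= t]
        # Rows with too few observations produce a missing value (None).
        results.append(sum(in_window) if len(in_window) >= min_periods else None)
    return results


start = datetime(2025, 4, 1)
idx = [start + timedelta(seconds=s) for s in (0, 1, 2, 5)]
print(time_rolling_sum(idx, [1.0, 2.0, 3.0, 4.0], timedelta(seconds=2)))
# prints [1.0, 3.0, 5.0, 4.0]
```

Scanning the whole series for every row keeps the sketch obvious at O(n²); a SQL engine can express the same frame as a RANGE window over a numeric ordering key.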

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Apr 3, 2025
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Apr 3, 2025
@sycai sycai force-pushed the sycai_rolling_window branch from 81811e2 to 87f265b Compare April 4, 2025 00:20
@sycai sycai force-pushed the sycai_rolling_window branch from a4cd9dc to 58e0764 Compare April 4, 2025 01:27
@sycai sycai force-pushed the sycai_rolling_window branch from 58e0764 to 850ea41 Compare April 4, 2025 01:29
@sycai sycai force-pushed the sycai_rolling_window branch from 5239077 to db6a353 Compare April 4, 2025 03:22
@sycai sycai marked this pull request as ready for review April 4, 2025 16:25
@sycai sycai requested review from a team as code owners April 4, 2025 16:25
@sycai sycai requested a review from drylks-work April 4, 2025 16:25
@sycai sycai requested review from TrevorBergeron and chelsea-lin and removed request for drylks-work April 4, 2025 16:25
Contributor

@TrevorBergeron TrevorBergeron left a comment


I think you will also want to modify the ordering pull-up logic so that it doesn't pull ordering into range windows. There should be some similar logic there already.

@@ -579,6 +593,19 @@ def _convert_ordering_to_table_values(
    return ordering_values


def _convert_range_ordering_to_table_value(
Contributor


What makes this one different? Is it that it doesn't allow NULLS FIRST/LAST overrides and only allows one column?

Contributor Author


Yeah. If we have to deal with NULLs, there will be multiple expressions after "ORDER BY", which SQL window syntax does not allow for range windows.

Contributor Author


Added a doc comment that explains the reasons, for future reference.
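To illustrate the distinction discussed in this thread, here is a hypothetical sketch of why range ordering gets its own conversion: emulating a null-placement override adds a second sort expression per key, while a RANGE frame must be ordered by exactly one expression. The names `OrderingKey`, `order_by_sql`, and `range_order_by_sql` are invented for illustration, and the `IS NULL` indicator used for null placement is an assumption about the general path, not a description of the BigFrames code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class OrderingKey:
    """Hypothetical stand-in for an ordering expression on one column."""
    column: str
    ascending: bool = True
    nulls_last: bool = True


def order_by_sql(keys: Sequence[OrderingKey]) -> str:
    """General ordering: each key expands into two sort expressions."""
    parts = []
    for key in keys:
        # FALSE sorts before TRUE, so `IS NULL ASC` pushes nulls to the end.
        null_dir = "ASC" if key.nulls_last else "DESC"
        parts.append(f"`{key.column}` IS NULL {null_dir}")
        parts.append(f"`{key.column}` {'ASC' if key.ascending else 'DESC'}")
    return "ORDER BY " + ", ".join(parts)


def range_order_by_sql(keys: Sequence[OrderingKey]) -> str:
    """Range ordering: exactly one key and no null-placement expression."""
    if len(keys) != 1:
        raise ValueError(f"RANGE windows need exactly one ordering key, got {len(keys)}")
    key = keys[0]
    return f"ORDER BY `{key.column}` {'ASC' if key.ascending else 'DESC'}"
```

The general form would be rejected in a RANGE frame because it has two expressions even for a single column, which is why the range path drops the null override entirely.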

Comment on lines 698 to 699
start=_to_ibis_boundary(bounds.start_micros),
end=_to_ibis_boundary(bounds.end_micros),
Contributor


Why define these as micros? Sure, for now it's always micros, but that isn't really a necessary property.

Contributor Author


It has something to do with dealing with nullness. Property removed.
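For context on the micros question: if the ordering key is an integer count of microseconds, a timedelta window and a pandas-style `closed` argument can be turned into integer RANGE offsets, with open endpoints moved one microsecond inward. The sketch below is hypothetical (`range_offsets_micros` and `frame_sql` are invented names) and is not how the merged code stores bounds, since the micros property was removed.

```python
from datetime import timedelta
from typing import Tuple

ONE_MICRO = timedelta(microseconds=1)


def range_offsets_micros(window: timedelta, closed: str = "right") -> Tuple[int, int]:
    """Return (start, end) offsets in microseconds relative to the current row.

    Negative values mean PRECEDING and 0 means CURRENT ROW. Assumes an integer
    microsecond ordering key, so an open endpoint shifts inward by one unit.
    """
    w = window // ONE_MICRO
    if closed == "right":    # (t - w, t]
        return (-w + 1, 0)
    if closed == "left":     # [t - w, t)
        return (-w, -1)
    if closed == "both":     # [t - w, t]
        return (-w, 0)
    if closed == "neither":  # (t - w, t)
        return (-w + 1, -1)
    raise ValueError(f"unsupported closed={closed!r}")


def frame_sql(start: int, end: int) -> str:
    """Render the offsets as a SQL RANGE frame clause."""
    def bound(offset: int) -> str:
        if offset == 0:
            return "CURRENT ROW"
        return f"{-offset} PRECEDING" if offset < 0 else f"{offset} FOLLOWING"
    return f"RANGE BETWEEN {bound(start)} AND {bound(end)}"


print(frame_sql(*range_offsets_micros(timedelta(seconds=3))))
# prints RANGE BETWEEN 2999999 PRECEDING AND CURRENT ROW
```

Baking the unit into the bounds like this is the coupling the reviewer questioned; keeping the bounds as durations and converting only when SQL is emitted avoids it.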

from bigframes.core import nodes


def rewrite_range_rolling(node: nodes.BigFrameNode) -> nodes.BigFrameNode:
Contributor


I know I always propose compile-time rewrites, but I think validations need to happen earlier, in the API. The tree should always be valid, rewrites just transform from one valid tree to another.

Contributor Author


Are you talking about the length checking for ordering?

I will check the rolling window key at the API level too. The check here is simply to make sure we fail fast if some other error sneaks in.
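Following the reviewer's point that validation belongs in the API, a hypothetical entry-point check might look like the sketch below. `validate_range_rolling` and the dtype labels are invented for illustration; the real BigFrames dtype names and checks may differ. The compile-time rewrite can then keep only an internal assertion as the fail-fast guard the author describes.

```python
from datetime import timedelta
from typing import Sequence

# Hypothetical dtype labels; the real dtype names in BigFrames may differ.
DATETIME_LIKE_DTYPES = frozenset({"timestamp", "datetime"})


def validate_range_rolling(index_dtypes: Sequence[str], window: timedelta) -> None:
    """Raise a user-facing error before any expression tree is built."""
    if len(index_dtypes) != 1:
        raise ValueError(
            f"Range rolling requires exactly one index column, got {len(index_dtypes)}"
        )
    if index_dtypes[0] not in DATETIME_LIKE_DTYPES:
        raise ValueError(
            f"Range rolling requires a datetime-like index, got {index_dtypes[0]}"
        )
    if window <= timedelta(0):
        raise ValueError("The rolling window must be a positive duration")
```

Raising here keeps every tree that reaches the compiler valid, so rewrites only ever transform one valid tree into another.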

Comment on lines +150 to +158
spec = window_spec.WindowSpec(
bounds=window_spec.RangeWindowBounds.from_timedelta_window(window, closed),
min_periods=1 if min_periods is None else min_periods,
ordering=(
ordering.OrderingExpression(
ex.deref(block.index_columns[0]), order_direction
),
),
)
Contributor


I like that we pick the ordering column immediately and validate it here.
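The snippet above fixes the ordering to the first index column when the spec is built. A self-contained sketch of the same shape, using hypothetical stand-in classes rather than the real `window_spec` and `ordering` modules, shows the two defaults at work: `min_periods` falls back to 1 (matching pandas, where offset-based windows default to `min_periods=1`), and the ordering always holds exactly one key.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class RangeBoundsSketch:
    """Hypothetical bounds kept as a duration plus the pandas-style `closed` flag."""
    window: timedelta
    closed: str = "right"


@dataclass(frozen=True)
class OrderingKeySketch:
    column_id: str
    ascending: bool = True


@dataclass(frozen=True)
class WindowSpecSketch:
    bounds: RangeBoundsSketch
    min_periods: int
    ordering: Tuple[OrderingKeySketch, ...]


def build_range_window_spec(
    index_column: str,
    window: timedelta,
    closed: str = "right",
    min_periods: Optional[int] = None,
    ascending: bool = True,
) -> WindowSpecSketch:
    return WindowSpecSketch(
        bounds=RangeBoundsSketch(window, closed),
        # Offset-based windows default to min_periods=1, as in pandas.
        min_periods=1 if min_periods is None else min_periods,
        # The single index column becomes the only ordering key, chosen up front.
        ordering=(OrderingKeySketch(index_column, ascending),),
    )
```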

Comment on lines 162 to 163
@singledispatch
def _find_order_direction(
Contributor


Nit: the tree-traversal helpers should probably live elsewhere, but whatever.

Contributor Author


It should be simple to refactor, though. I moved them to a separate file.
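`functools.singledispatch` selects an implementation from the runtime type of the first argument, which suits walking a tree of node classes. Below is a minimal sketch with hypothetical node types (`LeafNode`, `OrderByNode`), not the BigFrames node classes: the generic case answers "unknown", and the `OrderByNode` case reports the direction only when the column is the leading sort key.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical base class for plan nodes."""


@dataclass(frozen=True)
class LeafNode(Node):
    """A source with no known ordering."""


@dataclass(frozen=True)
class OrderByNode(Node):
    child: Node
    by: Tuple[Tuple[str, bool], ...]  # (column_id, ascending) pairs


@singledispatch
def find_order_direction(root: Node, column_id: str) -> Optional[bool]:
    """Return True (ascending), False (descending), or None if unknown."""
    return None


@find_order_direction.register
def _(root: OrderByNode, column_id: str) -> Optional[bool]:
    if not root.by:
        # An empty sort is a no-op, so the answer comes from the child.
        return find_order_direction(root.child, column_id)
    leading_column, ascending = root.by[0]
    return ascending if leading_column == column_id else None
```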

Comment on lines 174 to 175
if len(root.by) == 0:
    return None
Contributor


In this case, the node is a no-op, so we can just call the child.

Contributor Author


TIL. Code updated.

Contributor


Yeah, all re-orderings are stable, so a sort prepends its keys to the existing ordering key rather than replacing it.
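The reviewer's point can be checked in plain Python, whose `sorted` is also stable: sorting an already-ordered sequence by a new key gives the same result as sorting by the new key with the old keys appended as tie-breakers. The helpers `prepend_ordering` and `sort_rows` are hypothetical illustrations, not BigFrames code.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def prepend_ordering(existing: Tuple[str, ...], by: Tuple[str, ...]) -> Tuple[str, ...]:
    """New sort keys take priority; the existing keys still break ties.

    With an empty `by`, the ordering is unchanged, so the node is a no-op.
    """
    return tuple(by) + tuple(existing)


def sort_rows(rows: Sequence[Row], keys: Sequence[str]) -> List[Row]:
    """Sort rows lexicographically by the given column names."""
    return sorted(rows, key=lambda r: tuple(r[k] for k in keys))


rows = [
    {"name": "b", "group": 2},
    {"name": "a", "group": 2},
    {"name": "d", "group": 1},
    {"name": "c", "group": 1},
]
existing = ("name",)
already_sorted = sort_rows(rows, existing)
# A stable re-sort by "group" keeps the "name" order within each group.
restable = sorted(already_sorted, key=lambda r: r["group"])
print(restable == sort_rows(rows, prepend_ordering(existing, ("group",))))
# prints True
```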


@_find_order_direction.register
def _(root: nodes.FilterNode, column_id: str):
    return _find_order_direction(root.child, column_id)
Contributor


can probably also extend to the additive nodes (projection, window, isin), for which you can just ignore the contents and call child (or you can try to identify strictly increasing functions if you want to get fancy).

Contributor Author


Added windowNode and inNode. The ProjectNode is left out because some operations may invalidate the ordering (e.g., multiplication by a negative value).

Contributor


ProjectNode is always additive: it doesn't mutate existing values, so it should be safe. If anything, it can provide alternative ordering keys if you know an operation is strictly increasing.

Contributor Author


Done!
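Because these nodes never rewrite an existing column's values (filters drop rows, while projections and windows add new columns), the lookup can delegate straight to the child. The sketch below uses hypothetical node classes, not the BigFrames ones, and registers several pass-through types on one `singledispatch` implementation by stacking `register` decorators. It assumes column ids are unique, so a projection cannot shadow the ordering column.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical base class for plan nodes."""


@dataclass(frozen=True)
class SortedSourceNode(Node):
    ordering: Tuple[Tuple[str, bool], ...]  # (column_id, ascending) pairs


@dataclass(frozen=True)
class FilterNode(Node):
    child: Node


@dataclass(frozen=True)
class ProjectionNode(Node):
    child: Node
    new_column_ids: Tuple[str, ...] = ()


@dataclass(frozen=True)
class WindowNode(Node):
    child: Node
    output_id: str = "window_out"


@singledispatch
def find_order_direction(root: Node, column_id: str) -> Optional[bool]:
    return None  # Unknown node type: make no claim about ordering.


@find_order_direction.register
def _(root: SortedSourceNode, column_id: str) -> Optional[bool]:
    if root.ordering and root.ordering[0][0] == column_id:
        return root.ordering[0][1]
    return None


@find_order_direction.register(FilterNode)
@find_order_direction.register(ProjectionNode)
@find_order_direction.register(WindowNode)
def _(root, column_id: str) -> Optional[bool]:
    # These nodes keep existing values and their relative row order intact.
    return find_order_direction(root.child, column_id)
```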

@sycai sycai force-pushed the sycai_rolling_window branch from 31447ea to 2e86a47 Compare April 7, 2025 23:04
@sycai sycai force-pushed the sycai_rolling_window branch from 5f87995 to 0bdaf17 Compare April 7, 2025 23:17
@sycai sycai added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 7, 2025
@sycai sycai requested a review from TrevorBergeron April 7, 2025 23:19
@bigframes-bot bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 7, 2025

return dataclasses.replace(
    node,
    window_spec=dataclasses.replace(node.window_spec, ordering=(new_ordering,)),
Contributor


For consistency, it might make sense to redefine the window spec bounds as integers rather than timedeltas, since the underlying value is now an integer.

Contributor Author


I think I still prefer timedeltas because they best describe what range windows are for.
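The rewrite swaps the ordering column for an integer one while the bounds stay as durations. A hypothetical sketch of that split (`WindowSpecSketch`, `to_unix_micros`, and the `ts_micros` column id are invented) shows that timedelta bounds can still be converted to the integer key's unit only when a frame is rendered. In BigQuery SQL, `UNIX_MICROS` performs the same timestamp-to-integer conversion.

```python
import dataclasses
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICRO = timedelta(microseconds=1)


def to_unix_micros(ts: datetime) -> int:
    """Convert a timezone-aware datetime to integer microseconds since the epoch."""
    return (ts - EPOCH) // ONE_MICRO


@dataclasses.dataclass(frozen=True)
class WindowSpecSketch:
    ordering: Tuple[str, ...]
    preceding: timedelta  # Bounds stay as a duration, as the author preferred.


spec = WindowSpecSketch(ordering=("ts",), preceding=timedelta(seconds=3))
# Frozen dataclasses are updated by building a modified copy.
rewritten = dataclasses.replace(spec, ordering=("ts_micros",))
# Only at render time is the duration expressed in the integer key's unit.
preceding_micros = rewritten.preceding // ONE_MICRO
```

Keeping the conversion at the edge lets the spec read in user terms while the generated SQL still compares integers with integers.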

@sycai sycai merged commit 6e98a2c into main Apr 8, 2025
24 checks passed
@sycai sycai deleted the sycai_rolling_window branch April 8, 2025 02:02
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.

4 participants