perf(conversion): source start/end events from profile_event_summary_mv when eligible#298
Open
ayushjhanwar-png wants to merge 1 commit into
Open
perf(conversion): source start/end events from profile_event_summary_mv when eligible#298ayushjhanwar-png wants to merge 1 commit into
ayushjhanwar-png wants to merge 1 commit into
Conversation
…mv when eligible For conversion charts where no per-event filters and no breakdown / hold property columns are needed, the start_events_raw and end_events_raw CTEs now read from profile_event_summary_mv instead of scanning the raw events table. The MV is pre-aggregated by (project_id, profile_id, name, event_date) with first_event_time stored as an AggregateFunction — which is exactly the shape downstream wants (min(created_at) per profile per day). Pulling from the MV skips reading the full events table for the time window, which can be hundreds of millions of rows on a 1-month conversion against a high-traffic project. Eligibility (otherwise the existing events-table path is used): - no per-event filters - no extra columns (breakdown / hold properties live on events table) - groupCol === 'profile_id' (MV is profile-keyed) session_id is emitted as '' from the MV path — downstream `any(...)` accepts that and current callers don't read it when grouping by profile_id, so behaviour is preserved. Measured impact on a 1-month shortreels appOpen conversion: events table → ~28 GiB read, ~10-40s wall time, frequent 40s timeouts MV path → ~6 GiB read, ~600 ms wall time Numbers match the events table exactly (verified via uniq + countMerge side-by-side).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Conversion charts on plain event names (no per-event filters, no breakdown / hold properties) now source the start_events_raw and end_events_raw CTEs from the existing `profile_event_summary_mv` instead of scanning the raw `events` table.
The MV is already pre-aggregated by `(project_id, profile_id, name, event_date)` with `first_event_time` stored as an AggregateFunction — exactly the shape downstream wants (`min(created_at)` per profile per day). For high-traffic projects on wide windows, pulling from the MV skips hundreds of millions of rows.
Eligibility
A single event's CTE uses the MV path only when:
Otherwise the existing `events`-table path is used unchanged. `session_id` is emitted as `''` from the MV branch — downstream `any(session_id)` accepts it and current callers don't read it when grouping by `profile_id`.
Measured impact
1-month shortreels `appOpen` conversion:
Numbers match the `events` table exactly. Verified side-by-side via `uniq(profile_id)` and `countMerge(event_count)`:
```
events: 571410 profiles, 2244741 events
MV: 571410 profiles, 2244741 events
```
Test plan
Notes