perf(db): fast-path conversions with 1 event-property breakdown#352
perf(db): fast-path conversions with 1 event-property breakdown#352ayushjhanwar-png wants to merge 1 commit into
Conversation
Extends buildArrayPatternSql to accept an optional breakdown param and emits a split-scan variant (user_installs + user_finishes CTEs joined by resolved profile). Grouping a single-scan by (user, breakdown) pins finishes to the install's breakdown bucket and under-counts (measured 6-10% for apps.instagram.com vs. correct 22-93%). Split scans preserve the "convert on the user, not on the utm-tagged finish" semantic. Prod-measured on shortreels 30-day installReferrer→deepLinkCaptured with ref_utm_source breakdown: 35.6s (ASOF path) → 20.1s (this fast path) = 1.77x. Bytes read halves too (77 GiB → 38 GiB) because each split scan filters to one event name. Gate: only fires when breakdown is a simple event-level property (not profile.*, not cohort, not custom event). All other paths (holds, cohorts, custom events, session group, TTC, multi-breakdown) unchanged.
📝 WalkthroughWalkthroughThis PR extends the ClickHouse-based conversion fast path to support a single event-level property breakdown by generating a split-scan SQL variant, widens fast-path eligibility checks accordingly, and adds documentation tracking conversion-chart performance investigations and shipped optimizations. ChangesConversion fast-path breakdown support
Estimated code review effort: 4 (Complex) | ~45 minutes Sequence Diagram(s)sequenceDiagram
participant Client
participant getConversion
participant buildArrayPatternSql
participant ClickHouse
Client->>getConversion: request conversion with breakdown
getConversion->>getConversion: canUseArrayPath (breakdowns.length <= 1)
alt eligible fast path
getConversion->>buildArrayPatternSql: breakdown, topNLimit
buildArrayPatternSql->>buildArrayPatternSql: build split-scan CTEs
buildArrayPatternSql->>ClickHouse: execute breakdown-grouped query
ClickHouse-->>getConversion: top-N breakdown rows
else not eligible
getConversion->>ClickHouse: fall back to ASOF path
ClickHouse-->>getConversion: conversion rows
end
getConversion-->>Client: conversion response
Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/conversion-chart-perf.md (1)
111-121: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAdd language identifiers to fenced code blocks.
Static analysis (markdownlint MD040) flags these fences as missing a language hint (e.g.,
sql for the `ORDER BY` snippets,text for the ProfileEvents dump).Also applies to: 352-365
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/conversion-chart-perf.md` around lines 111 - 121, Add language identifiers to the fenced code blocks in the markdown snippet so it passes markdownlint MD040. Update the `ORDER BY` examples in the conversion chart doc to use an appropriate fence like `sql`, and ensure the other fenced dumps in the same section (including the ProfileEvents example referenced by the comment) use a matching language tag such as `text`.Source: Linters/SAST tools
packages/db/src/services/conversion.service.ts (1)
293-459: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoffSignificant SQL duplication between the breakdown and non-breakdown branches.
The breakdown branch (372-457) and the existing single-scan branch (460-514) repeat the same
alalias-resolution CTE text,WITH ${filteredProfilesCte}prefix, and per_user_per_day arrayJoin pattern almost verbatim, with onlyb_0threading and the split install/finish scans differing. A future fix to alias resolution or the day-bucketing logic would need to be applied in both places, risking drift.Consider extracting the shared fragments (e.g., the
alCTE text, the day-bucketing arrayJoin block) into local string constants/helpers reused by both branches.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/db/src/services/conversion.service.ts` around lines 293 - 459, The breakdown and non-breakdown paths in conversion.service.ts duplicate the same SQL fragments, especially the alias-resolution CTE (`al`), the `WITH ${filteredProfilesCte}` prefix, and the per_user_per_day day-bucketing logic. Extract these shared pieces into local constants or helper builders inside the conversion query method, then reuse them in both the breakdown branch and the single-scan branch so future changes to alias resolution or date bucketing only need to be made once.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/db/src/services/conversion.service.ts`:
- Around line 584-592: The comment references a nonexistent helper, so update
the array-aggregate path note in conversion.service to match the actual
implementation. Replace the mention of buildArrayPatternWithBreakdownSql with
the real branch inside buildArrayPatternSql, and make it clear that the one
event-level breakdown case is handled by that function rather than a separate
helper. Keep the wording aligned with the surrounding fast-path comments so
readers can find the correct logic quickly.
---
Nitpick comments:
In `@docs/conversion-chart-perf.md`:
- Around line 111-121: Add language identifiers to the fenced code blocks in the
markdown snippet so it passes markdownlint MD040. Update the `ORDER BY` examples
in the conversion chart doc to use an appropriate fence like `sql`, and ensure
the other fenced dumps in the same section (including the ProfileEvents example
referenced by the comment) use a matching language tag such as `text`.
In `@packages/db/src/services/conversion.service.ts`:
- Around line 293-459: The breakdown and non-breakdown paths in
conversion.service.ts duplicate the same SQL fragments, especially the
alias-resolution CTE (`al`), the `WITH ${filteredProfilesCte}` prefix, and the
per_user_per_day day-bucketing logic. Extract these shared pieces into local
constants or helper builders inside the conversion query method, then reuse them
in both the breakdown branch and the single-scan branch so future changes to
alias resolution or date bucketing only need to be made once.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 3796333d-df75-4f48-8950-914c40ecbce2
📒 Files selected for processing (2)
docs/conversion-chart-perf.mdpackages/db/src/services/conversion.service.ts
| // Array-aggregate fast path. 3-7x speedup vs the multi-CTE ASOF JOIN | ||
| // path below. Two shapes: | ||
| // - No breakdown: single-scan groupArrayIf (buildArrayPatternSql) | ||
| // - One event-level breakdown: split installs/finishes CTEs | ||
| // (buildArrayPatternWithBreakdownSql) — 1.77x measured on prod | ||
| // shortreels 30-day query. | ||
| // Everything unsupported (holds, cohorts, custom events, session group, | ||
| // TTC, profile.* filters, profile.* / cohort breakdowns, multi-breakdown) | ||
| // falls through to the ASOF path unchanged. |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Comment references a function that doesn't exist.
The comment says the breakdown path uses buildArrayPatternWithBreakdownSql, but the implementation is a branch inside buildArrayPatternSql (see line 370) — there's no separate function by that name in this file. This could mislead future readers searching for it.
✏️ Suggested fix
- // - One event-level breakdown: split installs/finishes CTEs
- // (buildArrayPatternWithBreakdownSql) — 1.77x measured on prod
- // shortreels 30-day query.
+ // - One event-level breakdown: split installs/finishes CTEs
+ // (buildArrayPatternSql's `breakdown` branch) — 1.77x measured on
+ // prod shortreels 30-day query.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Array-aggregate fast path. 3-7x speedup vs the multi-CTE ASOF JOIN | |
| // path below. Two shapes: | |
| // - No breakdown: single-scan groupArrayIf (buildArrayPatternSql) | |
| // - One event-level breakdown: split installs/finishes CTEs | |
| // (buildArrayPatternWithBreakdownSql) — 1.77x measured on prod | |
| // shortreels 30-day query. | |
| // Everything unsupported (holds, cohorts, custom events, session group, | |
| // TTC, profile.* filters, profile.* / cohort breakdowns, multi-breakdown) | |
| // falls through to the ASOF path unchanged. | |
| // Array-aggregate fast path. 3-7x speedup vs the multi-CTE ASOF JOIN | |
| // path below. Two shapes: | |
| // - No breakdown: single-scan groupArrayIf (buildArrayPatternSql) | |
| // - One event-level breakdown: split installs/finishes CTEs | |
| // (buildArrayPatternSql's `breakdown` branch) — 1.77x measured on | |
| // prod shortreels 30-day query. | |
| // Everything unsupported (holds, cohorts, custom events, session group, | |
| // TTC, profile.* filters, profile.* / cohort breakdowns, multi-breakdown) | |
| // falls through to the ASOF path unchanged. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/db/src/services/conversion.service.ts` around lines 584 - 592, The
comment references a nonexistent helper, so update the array-aggregate path note
in conversion.service to match the actual implementation. Replace the mention of
buildArrayPatternWithBreakdownSql with the real branch inside
buildArrayPatternSql, and make it clear that the one event-level breakdown case
is handled by that function rather than a separate helper. Keep the wording
aligned with the surrounding fast-path comments so readers can find the correct
logic quickly.
Summary
buildArrayPatternSqlinconversion.service.tsto accept an optionalbreakdownparam. Emits a split-scan variant (user_installs+user_finishesCTEs joined by resolved profile) instead of the single-scan groupArrayIf.getConversionrelaxed to allow ONE simple event-level property breakdown (notprofile.*, not cohort, not custom event). All other paths (holds, cohorts, custom events, session group, TTC, multi-breakdown) unchanged.Why not the single-scan shape
Grouping the existing single-scan
user_eventsby(user, breakdown)pinsfinishesto the install's breakdown bucket → under-counts conversions when the deep-link event carries empty or different breakdown values.Measured on shortreels 30-day
installReferrer → deepLinkCapturedwithref_utm_sourcebreakdown:1.77× speedup with semantics preserved. Bytes read halves too — each split scan filters to ONE event name.
What the query looks like
Query builder rewrite (
chart.service.ts.getSelectPropertyKey) auto-swapsproperties['key']→ materialized column ref when one exists — so this fast-path benefits directly from anyopenpanel-materialize-analysisdecisions.Companion: ref_utm_source materialized on
eventsNot in this PR, but same session did:
openpanel-materialize-analysisCronJob → pickedevents:ref_utm_source(cardinality 10, well under the 5000 cap)202607(~14 min) and202606(~30 min) viaALTER TABLE events MATERIALIZE COLUMN ... IN PARTITIONdocs/conversion-chart-perf.md(included in this PR)Combined effect on the shortreels dashboard query: 35.6s → 20.1s (1.77× faster) end-to-end, once the API pod's 1h materialized-columns cache refreshes (or on
kubectl rollout restart deployment openpanel-api -n prod).Test plan
pnpm --filter db typecheck— no new errors inconversion.service.ts(pre-existing errors in unrelated files remain)profile.*/ cohort / custom-event breakdowns → falls through to ASOFsystem.query_logfor conversion queries with 1 breakdown — expect wall-clock dropSummary by CodeRabbit
New Features
Bug Fixes