feat(replay): client-owned session_id + per-tab window_id + idle handling#350
feat(replay): client-owned session_id + per-tab window_id + idle handling#350ayushjhanwar-png wants to merge 2 commits into
Conversation
…ling Fixes the session-replay reliability issues found on prod Frameo data, matching PostHog's proven model. Fully backward compatible — old SDKs that don't send session_id/window_id fall through to the exact existing behaviour. SDK (@openpanel/web): - SessionIdManager: client generates + owns session_id in localStorage, rotates on 30-min idle / 24-hr cap (no more server/client divergence). - window_id per tab / page-load (in-memory): distinguishes concurrent recorders so multi-tab sessions never collide on chunk_index. Rotates with the session (PostHog parity). - Recorder starts synchronously (no 10s poll). - PostHog-style idle handling: only real user interactions (mouse/click/ scroll/input) count as activity — DOM mutations do NOT. After 5 min idle the recorder drops events instead of recording; resumes with a fresh FullSnapshot on the next interaction. Kills multi-hour "ghost" recordings. Server: - api: trust client session_id + thread window_id into replay chunks. - worker: use client session_id when present; treat a rotated id as a new session. Legacy path (no client id) unchanged. DB: - session_replay_chunks: add window_id column (migration 20). - getSessionWindows + windowId filter on all chunk-fetch queries. Dashboard: - Per-tab window selector on the session detail page (plays one recorder at a time — no mixed rrweb mirror states / "Node not found"). - Skip-idle toggle in the player (default on). Docs: session-replay-rendering.md — how a recording renders in the player.
📝 WalkthroughWalkthroughThis PR adds window-scoped session replay storage and retrieval, client-owned session ID rotation, idle-aware replay recording and playback, multi-window replay selection in the UI, worker session handling changes, and updated replay documentation. ChangesSession replay window scoping and idle-skip feature
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/worker/src/jobs/events.incoming-event.ts (1)
180-213: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy liftReplace the existing delayed session-end job when
session_idchanges.createSessionEndJobdedupes onprojectId+deviceId, so the stale job keeps the oldsessionIdpayload. After a client-side rotation, later events keep matching that old job, which leads to duplicatesession_startevents and no end job for the new session. Update/remove the pending job before enqueueing the new one, or reuse it with the new payload.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/worker/src/jobs/events.incoming-event.ts` around lines 180 - 213, The delayed session-end job handling in incoming-event processing is leaving a stale job behind when `session_id` changes, so later events keep matching the old payload. Update the `createSessionEndJob` flow in `events.incoming-event.ts` to remove or replace any existing pending job before enqueueing the new one, and make sure the new job uses the current `payload`/`sessionId` from the `isNewSession` path so `createSessionStart` and `createSessionEndJob` stay in sync.
🧹 Nitpick comments (3)
apps/start/src/components/sessions/replay/index.tsx (1)
324-342: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd
aria-pressedto the Skip idle toggle.It's a two-state toggle button; exposing
aria-pressed={skipInactive}lets assistive tech announce the on/off state (the visual styling alone doesn't convey it).♻️ Proposed change
<button type="button" + aria-pressed={skipInactive} onClick={() => setSkipInactive((v) => !v)}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/start/src/components/sessions/replay/index.tsx` around lines 324 - 342, The Skip idle control in the replay session component is a two-state toggle but does not expose its pressed state to assistive technologies. Update the button in the replay UI to include aria-pressed driven by skipInactive so screen readers can announce the on/off state; use the existing toggle handler and label/title in the same button component to locate it.packages/sdks/web/src/index.ts (1)
219-263: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueExtract the duplicated recorder-start block.
The
startReplayRecorder(opts, (chunk) => this.send({ type: 'replay', payload: { ...chunk, session_id: activeSessionId } }), bumpActivity)invocation is duplicated verbatim for initial start and rotation restart. Extracting a small local helper (closing overbumpActivity) keeps the two paths from drifting.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sdks/web/src/index.ts` around lines 219 - 263, The recorder start logic is duplicated in the initial setup and the session-rotation restart inside the sessionManager.onSessionIdChanged handler. Extract the shared startReplayRecorder call into a small local helper near activeSessionId and bumpActivity that closes over the send payload construction, then use that helper in both places to keep the behavior in sync and reduce drift.packages/sdks/web/src/replay/recorder.ts (1)
147-151: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win
onUserActivityfires on every interactive emit, including high-frequency MouseMove/Scroll.
ACTIVE_SOURCESincludes MouseMove (1) and Scroll (3), which emit many times per second. The downstreamonUserActivity(bumpActivity) performs a synchronouslocalStorage.setItemon each call, so this can turn scroll/move into a localStorage-write hot path. Consider throttling activity bumps (e.g., at most once per second) since idle thresholds are in minutes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sdks/web/src/replay/recorder.ts` around lines 147 - 151, `onUserActivity` in `Recorder` is firing on every interactive event, including high-frequency MouseMove and Scroll events, which can trigger excessive synchronous `localStorage` writes via `bumpActivity`. Update the `record`/interactive-event path to throttle activity notifications so `onUserActivity` is called at most once per second (or similar), while still updating `lastActivityMs` for all interactive events; use the existing `isInteractiveEvent` and `ACTIVE_SOURCES` flow in `recorder.ts` to place the guard cleanly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/sdks/web/src/index.ts`:
- Around line 235-268: The replay stop flow in the web SDK still leaves the
session-rotation listener active, so `stopReplay()` does not fully stop
recording. Capture the unsubscribe returned by
`this.sessionManager.onSessionIdChanged(...)` in the same class that defines
`startReplayRecorder()`/`stopReplay()`, and make `stopReplay()` invoke that
cleanup before or alongside `stopReplayRecorder()`. Ensure the callback
registered in `onSessionIdChanged` cannot fire after a manual stop, so it does
not call `startReplayRecorder()` again on the next session rotation.
---
Outside diff comments:
In `@apps/worker/src/jobs/events.incoming-event.ts`:
- Around line 180-213: The delayed session-end job handling in incoming-event
processing is leaving a stale job behind when `session_id` changes, so later
events keep matching the old payload. Update the `createSessionEndJob` flow in
`events.incoming-event.ts` to remove or replace any existing pending job before
enqueueing the new one, and make sure the new job uses the current
`payload`/`sessionId` from the `isNewSession` path so `createSessionStart` and
`createSessionEndJob` stay in sync.
---
Nitpick comments:
In `@apps/start/src/components/sessions/replay/index.tsx`:
- Around line 324-342: The Skip idle control in the replay session component is
a two-state toggle but does not expose its pressed state to assistive
technologies. Update the button in the replay UI to include aria-pressed driven
by skipInactive so screen readers can announce the on/off state; use the
existing toggle handler and label/title in the same button component to locate
it.
In `@packages/sdks/web/src/index.ts`:
- Around line 219-263: The recorder start logic is duplicated in the initial
setup and the session-rotation restart inside the
sessionManager.onSessionIdChanged handler. Extract the shared
startReplayRecorder call into a small local helper near activeSessionId and
bumpActivity that closes over the send payload construction, then use that
helper in both places to keep the behavior in sync and reduce drift.
In `@packages/sdks/web/src/replay/recorder.ts`:
- Around line 147-151: `onUserActivity` in `Recorder` is firing on every
interactive event, including high-frequency MouseMove and Scroll events, which
can trigger excessive synchronous `localStorage` writes via `bumpActivity`.
Update the `record`/interactive-event path to throttle activity notifications so
`onUserActivity` is called at most once per second (or similar), while still
updating `lastActivityMs` for all interactive events; use the existing
`isInteractiveEvent` and `ACTIVE_SOURCES` flow in `recorder.ts` to place the
guard cleanly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 5c05bd9a-725d-453d-9d75-10a9d5084785
📒 Files selected for processing (14)
apps/api/src/controllers/track.controller.tsapps/start/src/components/sessions/replay/index.tsxapps/start/src/components/sessions/replay/replay-player.tsxapps/worker/src/jobs/events.incoming-event.tsdocs/session-replay-rendering.mdpackages/db/code-migrations/20-add-window-id-to-replay-chunks.tspackages/db/src/buffers/replay-buffer.tspackages/db/src/services/session.service.tspackages/sdks/sdk/src/index.tspackages/sdks/web/src/index.tspackages/sdks/web/src/replay/recorder.tspackages/sdks/web/src/session-id-manager.tspackages/sdks/web/src/storage.tspackages/trpc/src/routers/session.ts
… throttle, a11y - web SDK: stopReplay() now detaches the session-rotation listener so a later rotation can't restart the recorder after a manual stop. Extract a shared startForSession() helper (was duplicated between initial start + restart). - worker: createSessionEndJob replaces any existing delayed job for the device before enqueueing. Previously a client session_id rotation left the stale (deviceId-keyed) job with the old payload, so every later event saw isNewSession=true and spawned duplicate session_start events. - recorder: throttle onUserActivity to <=1/sec — MouseMove/Scroll fired a localStorage write on every event. lastActivityMs still updates on all. - dashboard: aria-pressed on the Skip-idle toggle for screen readers.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
apps/worker/src/utils/session-handler.ts (1)
39-45: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winLog the swallowed
remove()failure.
Whenremove()throws on a locked job, the fallbackadd()is a no-op and the stale session payload stays in the queue. Awarn/debuglog here would make that failure mode observable without changing the non-fatal behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/worker/src/utils/session-handler.ts` around lines 39 - 45, The session cleanup path in session-handler’s existing job removal swallows failures from existing.remove(), which hides locked-job cleanup problems. Update the getJob/remove flow to catch the remove error and emit a warn/debug log with the jobId and error details before continuing with the fallback add; keep the non-fatal behavior intact in the same existing/session queue logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@apps/worker/src/utils/session-handler.ts`:
- Around line 39-45: The session cleanup path in session-handler’s existing job
removal swallows failures from existing.remove(), which hides locked-job cleanup
problems. Update the getJob/remove flow to catch the remove error and emit a
warn/debug log with the jobId and error details before continuing with the
fallback add; keep the non-fatal behavior intact in the same existing/session
queue logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 41cdef73-dcd3-4cbf-aa63-05b6f1dd6200
📒 Files selected for processing (4)
apps/start/src/components/sessions/replay/index.tsxapps/worker/src/utils/session-handler.tspackages/sdks/web/src/index.tspackages/sdks/web/src/replay/recorder.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/sdks/web/src/replay/recorder.ts
- packages/sdks/web/src/index.ts
- apps/start/src/components/sessions/replay/index.tsx
What & why
Fixes the session-replay reliability issues found on prod Frameo data. On prod, ~44% of eligible sessions showed no replay and ~46% of replay-bearing sessions had chunks spanning far beyond the session (stale-closure / multi-tab collisions). Root cause: the server owned
session_idand the recorder captured it once, so events and chunks diverged; multiple tabs collided onchunk_index; and background DOM churn kept sessions alive for hours.This ports PostHog's proven model. Fully backward compatible — old SDKs that don't send
session_id/window_idfall through to the exact existing server behaviour (verified by diff: the new worker path only activates when a clientsession_idis present).Changes
SDK (
@openpanel/web)SessionIdManager— client generates + ownssession_idin localStorage, rotates on 30-min idle / 24-hr cap.window_idper tab/page-load (in-memory) — concurrent recorders never collide onchunk_index; rotates with the session.Server
api: trust clientsession_id; threadwindow_idinto replay chunks.worker: use clientsession_idwhen present; a rotated id starts a new session. Legacy path unchanged.DB
session_replay_chunks: addwindow_idcolumn (migration20).getSessionWindows+windowIdfilter on all chunk-fetch queries.Dashboard
Testing (local worker/API against dev backend)
Deploy plan
Merge → dev first. Before prod: run analytics regression (session duration/bounce after the worker change), verify the CH migration via the framework, then ship the SDK to
@dashverse/openpanel-weband roll out on Frameo behind a flag.Deferred (follow-ups, tracked)
Summary by CodeRabbit
New Features
Bug Fixes