Analytics reliability improvements + bug fixes #193
Open
Fix local connect args bug (P0):
- Fix argument order mismatch between TS and Python for local server connections: url was uninitialized but still sent, causing docker/port options to be silently dropped
- Fix getattr() on dict in Python wrapper: use dict.get() instead
- Support both legacy and corrected arg shapes for backward compatibility

Add server.connection_failed event (P1):
- Emit dedicated failure event with error taxonomy on connect failure
- Include connectionType, serverUrlCategory, docker, portProvided, and privacy-safe error classification (errorKind, errorSource, messageHash)

Fix unknown disconnections (P1):
- Store lastConnectedType on connect, use it on disconnect instead of re-categorizing the post-disconnect URL (which reverts to sqlite)
- Add disconnectReason property (user_initiated vs unexpected) using SERVER_DISCONNECT_REQUESTED intent signal from disconnect command

Add centralized error.occurred tracking (P1):
- Emit error.occurred from LSClient.sendLsClientRequest at three points: client not ready (preflight), request throws (request), error response
- Dedupe identical errors within 60s window; hard-dedupe lsp_not_ready to once per session

Add Python/ZenML version to common properties (P2):
- Cache pythonVersion, zenmlVersion, zenmlInstalled in AnalyticsService
- Wire from LSClient (zenml/isInstalled notification) and ZenExtension (interpreter change) via ENVIRONMENT_INFO_UPDATED EventBus pattern
- Include in all event common properties

Add session tracking and extension.deactivated (P2):
- Generate sessionId (UUID) on AnalyticsService.initialize()
- Include sessionId in all events as common property
- Emit extension.deactivated with sessionDurationMs before dispose

Add first-run detection (P2):
- Use globalState key to detect first-ever activation
- Emit extension.first_activated once per install
- Add isFirstActivation flag to extension.activated event

Add component CRUD analytics events (P3):
- component.registered and component.updated from ComponentsForm
- component.deleted from components/cmds with error taxonomy on failure

New file: src/utils/analytics.ts
- ErrorKind/ErrorSource/ErrorPhase taxonomy types
- sanitizeErrorForAnalytics(): privacy-safe error classification + hash
- normalizeForHash(): strips URLs, paths, UUIDs, tokens before hashing
- isErrorLikeResponse(): type guard for error responses

Update CLAUDE.md with analytics architecture docs and event table.
Update README.md "What We Collect" section for new events/properties.
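To make the privacy-safe hashing idea concrete, here is a minimal sketch of how a message could be normalized and then hashed. The function names normalizeForHash and sha256Hex come from this PR, but the regular expressions, the placeholder strings, and the hashMessage helper are assumptions for illustration and may not match the actual code in src/utils/analytics.ts.

```typescript
import { createHash } from "node:crypto";

// Hypothetical patterns: the real normalizeForHash() may use different rules.
const URL_RE = /\b[a-z][a-z0-9+.-]*:\/\/[^\s'"]+/gi;
const UUID_RE = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi;
const PATH_RE = /(?:[A-Za-z]:\\|\/)[^\s'":]+/g;
const TOKEN_RE = /\b[A-Za-z0-9_-]{32,}\b/g;

// Replace user-specific fragments with fixed placeholders so that two
// errors differing only in URL, path, UUID, or token normalize identically.
export function normalizeForHash(message: string): string {
  return message
    .replace(URL_RE, "<url>")
    .replace(UUID_RE, "<uuid>")
    .replace(PATH_RE, "<path>")
    .replace(TOKEN_RE, "<token>")
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
}

// SHA-256 hex digest truncated to 16 characters, as the sha256Hex JSDoc describes.
export function sha256Hex(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex").slice(0, 16);
}

// Hypothetical convenience wrapper: only the hash leaves the machine,
// never the raw message.
export function hashMessage(message: string): string {
  return sha256Hex(normalizeForHash(message));
}
```

With this approach, the same failure on two machines with different server URLs or home directories should produce the same messageHash, which lets failures be grouped without the raw text being collected.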
htahir1
approved these changes
Feb 6, 2026
- Fix unused import in ComponentsForm.ts by wiring sanitizeErrorForAnalytics into register/update failure paths (error taxonomy now on all component events)
- Prevent spurious server.disconnected on startup by treating first observed disconnected state as initialization (not a transition)
- Widen ConnectServerResponse type to include SuccessMessageResponse for local/pro message-only success payloads
- Add opportunistic pruning to LSClient errorDedupe map to prevent unbounded growth in long sessions
- Add 54 new tests: error taxonomy classification, hash privacy guarantees, server status tracking, disconnect reason correlation, deduplication
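The startup and disconnect behavior described in this PR can be sketched as a small state tracker. This is an illustration only: the class name ServerStatusTracker, its method names, the ConnectionType values, and the injected clock are hypothetical. The 10s intent window is taken from the test description in the PR summary, and how the real handleServerStatusChange() applies it is an assumption.

```typescript
type ConnectionType = "local" | "remote" | "pro" | "unknown"; // hypothetical values
type DisconnectReason = "user_initiated" | "unexpected";

interface TrackedEvent {
  name: string;
  properties: Record<string, unknown>;
}

class ServerStatusTracker {
  // undefined means no status has been observed yet in this session.
  private lastConnected: boolean | undefined = undefined;
  private lastConnectedType: ConnectionType = "unknown";
  private disconnectRequestedAt: number | undefined = undefined;
  private readonly emit: (event: TrackedEvent) => void;
  private readonly now: () => number;
  private readonly intentWindowMs: number;

  constructor(
    emit: (event: TrackedEvent) => void,
    now: () => number = () => Date.now(),
    intentWindowMs: number = 10000
  ) {
    this.emit = emit;
    this.now = now;
    this.intentWindowMs = intentWindowMs;
  }

  // Called by the disconnect command before it asks the server to disconnect.
  markDisconnectRequested(): void {
    this.disconnectRequestedAt = this.now();
  }

  handleStatusChange(isConnected: boolean, connectionType: ConnectionType): void {
    const previous = this.lastConnected;
    this.lastConnected = isConnected;

    if (isConnected) {
      // Remember the type now; after disconnect the URL no longer reflects it.
      this.lastConnectedType = connectionType;
      if (previous !== true) {
        this.emit({ name: "server.connected", properties: { connectionType } });
      }
      return;
    }

    // First observed disconnected state is a baseline, not a transition.
    if (previous !== true) return;

    const requestedAt = this.disconnectRequestedAt;
    const reason: DisconnectReason =
      requestedAt !== undefined && this.now() - requestedAt <= this.intentWindowMs
        ? "user_initiated"
        : "unexpected";
    this.disconnectRequestedAt = undefined;
    this.emit({
      name: "server.disconnected",
      properties: { connectionType: this.lastConnectedType, disconnectReason: reason },
    });
  }
}
```

Injecting the clock keeps the intent-window logic testable without real timers, which is one plausible way to cover the disconnect classification cases listed in the test summary.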
- Fix catch (error: any) in LSClient.sendLsClientRequest and
server/cmds disconnectServer to use error: unknown with proper
narrowing via extractErrorMessage(), preventing runtime crashes
on non-Error thrown values
- Add truthiness check to isErrorLikeResponse usage in LSClient
to prevent false-positive analytics on { error: null } responses
- Log globalState.update rejections in ZenExtension and
AnalyticsService to prevent silent first-activation data corruption
- Store event listener references in AnalyticsService for proper
cleanup in dispose(), preventing listener leaks on host reload
- Add console.debug logging to bare catch blocks in track() and
emitErrorOccurred for debuggability
- Remove redundant double-flush in extension deactivation
- Extract shared executeComponentOperation() in ComponentsForm,
eliminating ~50 lines of copy-paste between register and update
- Export extractErrorMessage from analytics.ts for reuse
- Consolidate trackEvent helper into shared export (was duplicated
across 5 command modules)
- Remove dead 'python_backend' from ErrorSource union type
- Fix sha256Hex JSDoc to mention truncation to 16 characters
- Update CLAUDE.md analytics instructions to reference shared import
- Add 7 TypeScript tests for LSClient error dedupe logic
- Add 6 Python tests for connect() args parsing (P0 bug fix coverage)
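The two narrowing fixes in this commit (error: unknown in catch blocks, and the truthiness check on error responses) can be sketched as below. The names extractErrorMessage and isErrorLikeResponse come from this PR; their bodies here, the ErrorLikeResponse interface, and the shouldTrackErrorResponse helper are assumptions for illustration, not the actual code.

```typescript
// Safely turn any thrown value into a message string. With
// `catch (error: unknown)` the compiler forces this kind of narrowing,
// so a thrown string or number cannot crash on `error.message`.
export function extractErrorMessage(error: unknown): string {
  if (error instanceof Error) return error.message;
  try {
    return String(error);
  } catch {
    // String() can throw for exotic values such as Object.create(null).
    return "unknown error";
  }
}

// Hypothetical response shape: anything carrying an `error` property.
interface ErrorLikeResponse {
  error: unknown;
}

export function isErrorLikeResponse(value: unknown): value is ErrorLikeResponse {
  return typeof value === "object" && value !== null && "error" in value;
}

// Hypothetical helper showing the added truthiness check: a payload of
// { error: null } has the property but does not represent a failure.
export function shouldTrackErrorResponse(result: unknown): boolean {
  return isErrorLikeResponse(result) && Boolean(result.error);
}

// Usage in a catch block:
export function describeFailure(action: () => void): string | undefined {
  try {
    action();
    return undefined;
  } catch (error: unknown) {
    return extractErrorMessage(error);
  }
}
```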
Summary
Addresses all 8 analytics gaps identified in the first analytics report (Jan 12 – Feb 6, 2026). Also fixes a P0 bug where local server connect options (Docker, port) were silently dropped due to an argument-passing mismatch between TypeScript and Python.
- Local connect args bug (P0): The url variable was uninitialized for local connections but still sent in the args array, causing Python to grab the wrong positional arg. Additionally, getattr() was used on a dict instead of .get(). Docker/port options were always silently ignored.
- server.connection_failed event (P1): Dedicated failure event with privacy-safe error taxonomy (errorKind, errorSource, messageHash), a key signal for diagnosing the Windows connection gap (1.8% vs 11.4% connection rate).
- Unknown disconnections (P1): Store lastConnectedType on connect, use it on disconnect instead of re-categorizing the post-disconnect URL (which reverts to sqlite). Add disconnectReason (user_initiated vs unexpected) via disconnect intent signal.
- Centralized error.occurred tracking (P1): Emit from LSClient.sendLsClientRequest() at three points (preflight, request, response) with 60s dedupe window.
- Python/ZenML version in common properties (P2): Cache pythonVersion, zenmlVersion, and zenmlInstalled in AnalyticsService via ENVIRONMENT_INFO_UPDATED EventBus pattern.
- Session tracking and extension.deactivated (P2): Generate sessionId on init, include in all events, emit deactivation event with session duration.
- First-run detection (P2): extension.first_activated event via globalState flag, isFirstActivation on extension.activated.
- Component CRUD events (P3): component.registered, component.updated, component.deleted events.

New file: src/utils/analytics.ts
Privacy-safe error classification utility:
- ErrorKind/ErrorSource/ErrorPhase taxonomy types
- sanitizeErrorForAnalytics(): classifies errors + produces hash, never raw messages
- normalizeForHash(): strips URLs, paths, UUIDs, tokens before hashing
- isErrorLikeResponse(): type guard for LSP error responses
- trackEvent(): shared helper used by all command modules
- extractErrorMessage(): safely extracts message from any error type

Code review fixes (second commit)
- ComponentsForm.ts now returns ComponentOperationResult from registerComponent()/updateComponent(), including sanitizeErrorForAnalytics taxonomy on failure. All three component operations (register, update, delete) now have consistent error tracking.
- Prevent spurious server.disconnected on startup: handleServerStatusChange() now treats the first observed disconnected state as initialization (sets baseline, doesn't emit). Initial connected state still emits server.connected.
- Widened ConnectServerResponse type: Added SuccessMessageResponse to the union for local/pro message-only success payloads (no access_token).
- LSClient.errorDedupe now has opportunistic pruning: stale entries older than 60s are removed at most once per minute to prevent unbounded growth.
- 54 new tests: error taxonomy classification (ErrorKind variants), hash privacy guarantees (URL/path/UUID stripping), server status tracking (initial state, disconnect classification with 10s window, connection type preservation, deduplication), and utility functions (normalizeForHash, sha256Hex, isErrorLikeResponse).

11-agent code review fixes (third commit)
Addressed findings from a comprehensive review by 11 parallel agents (code reviewer, silent failure hunter, comment analyzer, test analyzer, type design analyzer, efficiency reviewer, code quality/reuse/simplification reviewers, and two reviewer personas):
- Fixed catch (error: any) runtime crash risks: Changed to catch (error: unknown) with proper narrowing in LSClient.sendLsClientRequest and disconnectServer. The old code called error.message without checking if error was an Error instance.
- Fixed isErrorLikeResponse false positives: Added truthiness check (&& result.error) to prevent analytics events from { error: null } responses.
- Log globalState.update rejections (first-activated flag and anonymous ID): failures were silently swallowed, causing extension.first_activated to fire on every activation.
- Store event listener references in AnalyticsService.registerEventBus() for proper cleanup in dispose(). Previously, anonymous arrow functions were never unsubscribed.
- Consolidated trackEvent helper: Extracted shared export in analytics.ts, replacing 5 identical local copies across command modules.
- Extracted executeComponentOperation(): Eliminated ~50 lines of copy-paste between registerComponent and updateComponent in ComponentsForm.ts.
- Exported extractErrorMessage(): Shared utility replacing hand-rolled error instanceof Error ? error.message : String(error) pattern.
- Removed dead 'python_backend' from ErrorSource: No code path ever produced this value.
- Bare catch blocks in track() and emitErrorOccurred() now log to console.debug instead of silently swallowing.
- Removed redundant double-flush in extension deactivation: dispose() already flushes.
- Fixed sha256Hex JSDoc: Now correctly documents truncation to 16 characters.
- Updated CLAUDE.md analytics instructions to reference the shared trackEvent import instead of stale local pattern.

Test plan
- extension.first_activated fires only on first install (clear globalState to test)
- extension.deactivated fires with sessionDurationMs on window close
- server.connection_failed fires with error taxonomy on failed connect
- error.occurred fires from LSClient on LSP errors (with dedupe)
- Disconnect events include the actual connectionType (not "unknown") and disconnectReason
- pythonVersion and zenmlVersion appear in event properties after LSP initialization
- Set ZENML_ANALYTICS_VERBOSE=1 ZENML_ANALYTICS_DEBUG=1 to inspect event payloads
- ./scripts/lint.sh: all checks pass
- npm run compile: builds successfully
- npm run test: all 134 tests pass (including 54 new tests)
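For reviewers checking the error.occurred dedupe item, the 60s window with opportunistic pruning and the once-per-session rule for lsp_not_ready could look roughly like the sketch below. The class name ErrorDeduper, its method names, and the injected clock are hypothetical; the PR only states that LSClient keeps an errorDedupe map, dedupes within 60s, and prunes stale entries at most once per minute.

```typescript
class ErrorDeduper {
  // key -> timestamp (ms) of the last emitted occurrence
  private readonly seen = new Map<string, number>();
  // keys that may be emitted only once per session (e.g. lsp_not_ready)
  private readonly sessionKeys = new Set<string>();
  private lastPrunedAt = 0;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(windowMs: number = 60000, now: () => number = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if an event for this key should be emitted now.
  shouldEmit(key: string, oncePerSession: boolean = false): boolean {
    const t = this.now();
    this.pruneIfDue(t);

    if (oncePerSession) {
      if (this.sessionKeys.has(key)) return false;
      this.sessionKeys.add(key);
      return true;
    }

    const last = this.seen.get(key);
    if (last !== undefined && t - last < this.windowMs) return false;
    this.seen.set(key, t);
    return true;
  }

  // Opportunistic pruning: runs as a side effect of shouldEmit(), at most
  // once per window, so no timer is needed and the map cannot grow unbounded.
  private pruneIfDue(t: number): void {
    if (t - this.lastPrunedAt < this.windowMs) return;
    this.lastPrunedAt = t;
    this.seen.forEach((timestamp, key) => {
      if (t - timestamp >= this.windowMs) this.seen.delete(key);
    });
  }

  get size(): number {
    return this.seen.size;
  }
}
```

Pruning on the request path rather than with setInterval avoids a timer that would need cleanup in dispose(), at the cost of stale entries lingering until the next request arrives.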