fix(llmobs): subsequent context handling with annotations #15764
base: main
Performance SLOsComparing candidate zachg/llmobs_sequential_contexts_fix (c6485ca) with baseline main (7c7d690) 📈 Performance Regressions (3 suites)📈 iastaspects - 118/118✅ add_aspectTime: ✅ 18.046µs (SLO: <20.000µs -9.8%) vs baseline: 📈 +20.4% Memory: ✅ 42.625MB (SLO: <43.250MB 🟡 -1.4%) vs baseline: +5.0% ✅ add_inplace_aspectTime: ✅ 15.010µs (SLO: <20.000µs 📉 -25.0%) vs baseline: +0.7% Memory: ✅ 42.625MB (SLO: <43.250MB 🟡 -1.4%) vs baseline: +4.9% ✅ add_inplace_noaspectTime: ✅ 0.339µs (SLO: <10.000µs 📉 -96.6%) vs baseline: ~same Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.8% ✅ add_noaspectTime: ✅ 0.545µs (SLO: <10.000µs 📉 -94.5%) vs baseline: -0.8% Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +4.9% ✅ bytearray_aspectTime: ✅ 17.863µs (SLO: <30.000µs 📉 -40.5%) vs baseline: -1.4% Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +4.6% ✅ bytearray_extend_aspectTime: ✅ 23.959µs (SLO: <30.000µs 📉 -20.1%) vs baseline: -0.4% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.9% ✅ bytearray_extend_noaspectTime: ✅ 2.741µs (SLO: <10.000µs 📉 -72.6%) vs baseline: +0.1% Memory: ✅ 42.546MB (SLO: <43.500MB -2.2%) vs baseline: +4.8% ✅ bytearray_noaspectTime: ✅ 1.471µs (SLO: <10.000µs 📉 -85.3%) vs baseline: ~same Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.0% ✅ bytes_aspectTime: ✅ 16.724µs (SLO: <20.000µs 📉 -16.4%) vs baseline: +0.7% Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +4.9% ✅ bytes_noaspectTime: ✅ 1.406µs (SLO: <10.000µs 📉 -85.9%) vs baseline: -0.8% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +5.1% ✅ bytesio_aspectTime: ✅ 55.743µs (SLO: <70.000µs 📉 -20.4%) vs baseline: -0.2% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.7% ✅ bytesio_noaspectTime: ✅ 3.277µs (SLO: <10.000µs 📉 -67.2%) vs baseline: -0.2% Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.1% ✅ capitalize_aspectTime: ✅ 14.773µs (SLO: <20.000µs 📉 -26.1%) vs baseline: +1.5% Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs 
baseline: +4.9% ✅ capitalize_noaspectTime: ✅ 2.596µs (SLO: <10.000µs 📉 -74.0%) vs baseline: +0.7% Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +5.0% ✅ casefold_aspectTime: ✅ 14.769µs (SLO: <20.000µs 📉 -26.2%) vs baseline: +0.4% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +5.2% ✅ casefold_noaspectTime: ✅ 3.172µs (SLO: <10.000µs 📉 -68.3%) vs baseline: +0.2% Memory: ✅ 42.546MB (SLO: <43.500MB -2.2%) vs baseline: +4.5% ✅ decode_aspectTime: ✅ 15.624µs (SLO: <30.000µs 📉 -47.9%) vs baseline: +0.5% Memory: ✅ 42.546MB (SLO: <43.500MB -2.2%) vs baseline: +4.7% ✅ decode_noaspectTime: ✅ 1.596µs (SLO: <10.000µs 📉 -84.0%) vs baseline: -1.3% Memory: ✅ 42.546MB (SLO: <43.500MB -2.2%) vs baseline: +4.9% ✅ encode_aspectTime: ✅ 18.256µs (SLO: <30.000µs 📉 -39.1%) vs baseline: 📈 +23.2% Memory: ✅ 42.507MB (SLO: <43.500MB -2.3%) vs baseline: +4.7% ✅ encode_noaspectTime: ✅ 1.478µs (SLO: <10.000µs 📉 -85.2%) vs baseline: -3.0% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +4.9% ✅ format_aspectTime: ✅ 170.930µs (SLO: <200.000µs 📉 -14.5%) vs baseline: ~same Memory: ✅ 42.566MB (SLO: <43.250MB 🟡 -1.6%) vs baseline: +4.7% ✅ format_map_aspectTime: ✅ 190.717µs (SLO: <200.000µs -4.6%) vs baseline: ~same Memory: ✅ 42.526MB (SLO: <43.500MB -2.2%) vs baseline: +4.7% ✅ format_map_noaspectTime: ✅ 3.790µs (SLO: <10.000µs 📉 -62.1%) vs baseline: -1.6% Memory: ✅ 42.546MB (SLO: <43.250MB 🟡 -1.6%) vs baseline: +4.8% ✅ format_noaspectTime: ✅ 3.126µs (SLO: <10.000µs 📉 -68.7%) vs baseline: -0.8% Memory: ✅ 42.625MB (SLO: <43.250MB 🟡 -1.4%) vs baseline: +5.0% ✅ index_aspectTime: ✅ 15.333µs (SLO: <20.000µs 📉 -23.3%) vs baseline: -1.0% Memory: ✅ 42.664MB (SLO: <43.250MB 🟡 -1.4%) vs baseline: +5.0% ✅ index_noaspectTime: ✅ 0.461µs (SLO: <10.000µs 📉 -95.4%) vs baseline: -0.3% Memory: ✅ 42.546MB (SLO: <43.500MB -2.2%) vs baseline: +4.8% ✅ join_aspectTime: ✅ 17.118µs (SLO: <20.000µs 📉 -14.4%) vs baseline: -0.5% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.9% ✅ 
join_noaspectTime: ✅ 1.543µs (SLO: <10.000µs 📉 -84.6%) vs baseline: -0.7% Memory: ✅ 42.566MB (SLO: <43.250MB 🟡 -1.6%) vs baseline: +4.9% ✅ ljust_aspectTime: ✅ 20.765µs (SLO: <30.000µs 📉 -30.8%) vs baseline: -0.7% Memory: ✅ 42.684MB (SLO: <43.250MB 🟡 -1.3%) vs baseline: +5.1% ✅ ljust_noaspectTime: ✅ 2.743µs (SLO: <10.000µs 📉 -72.6%) vs baseline: ~same Memory: ✅ 42.605MB (SLO: <43.250MB 🟡 -1.5%) vs baseline: +4.9% ✅ lower_aspectTime: ✅ 18.086µs (SLO: <30.000µs 📉 -39.7%) vs baseline: +0.4% Memory: ✅ 42.703MB (SLO: <43.500MB 🟡 -1.8%) vs baseline: +5.1% ✅ lower_noaspectTime: ✅ 2.428µs (SLO: <10.000µs 📉 -75.7%) vs baseline: -0.1% Memory: ✅ 42.487MB (SLO: <43.250MB 🟡 -1.8%) vs baseline: +4.6% ✅ lstrip_aspectTime: ✅ 17.825µs (SLO: <30.000µs 📉 -40.6%) vs baseline: +0.3% Memory: ✅ 42.605MB (SLO: <43.250MB 🟡 -1.5%) vs baseline: +5.0% ✅ lstrip_noaspectTime: ✅ 1.874µs (SLO: <10.000µs 📉 -81.3%) vs baseline: +0.4% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +5.0% ✅ modulo_aspectTime: ✅ 166.078µs (SLO: <200.000µs 📉 -17.0%) vs baseline: ~same Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.7% ✅ modulo_aspect_for_bytearray_bytearrayTime: ✅ 180.110µs (SLO: <200.000µs -9.9%) vs baseline: +3.3% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.7% ✅ modulo_aspect_for_bytesTime: ✅ 168.927µs (SLO: <200.000µs 📉 -15.5%) vs baseline: +0.4% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.8% ✅ modulo_aspect_for_bytes_bytearrayTime: ✅ 172.063µs (SLO: <200.000µs 📉 -14.0%) vs baseline: +0.2% Memory: ✅ 42.723MB (SLO: <43.500MB 🟡 -1.8%) vs baseline: +5.0% ✅ modulo_noaspectTime: ✅ 3.642µs (SLO: <10.000µs 📉 -63.6%) vs baseline: -1.1% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +5.0% ✅ replace_aspectTime: ✅ 212.050µs (SLO: <300.000µs 📉 -29.3%) vs baseline: +0.3% Memory: ✅ 42.762MB (SLO: <44.000MB -2.8%) vs baseline: +5.2% ✅ replace_noaspectTime: ✅ 2.889µs (SLO: <10.000µs 📉 -71.1%) vs baseline: -0.3% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs 
baseline: +4.9% ✅ repr_aspectTime: ✅ 1.416µs (SLO: <10.000µs 📉 -85.8%) vs baseline: ~same Memory: ✅ 42.546MB (SLO: <43.500MB -2.2%) vs baseline: +4.8% ✅ repr_noaspectTime: ✅ 0.523µs (SLO: <10.000µs 📉 -94.8%) vs baseline: -0.4% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.9% ✅ rstrip_aspectTime: ✅ 19.063µs (SLO: <30.000µs 📉 -36.5%) vs baseline: -0.4% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.8% ✅ rstrip_noaspectTime: ✅ 2.049µs (SLO: <10.000µs 📉 -79.5%) vs baseline: +6.4% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +5.0% ✅ slice_aspectTime: ✅ 15.875µs (SLO: <20.000µs 📉 -20.6%) vs baseline: ~same Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +5.0% ✅ slice_noaspectTime: ✅ 0.599µs (SLO: <10.000µs 📉 -94.0%) vs baseline: +0.4% Memory: ✅ 42.546MB (SLO: <43.500MB -2.2%) vs baseline: +4.7% ✅ stringio_aspectTime: ✅ 54.278µs (SLO: <80.000µs 📉 -32.2%) vs baseline: ~same Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +4.9% ✅ stringio_noaspectTime: ✅ 3.636µs (SLO: <10.000µs 📉 -63.6%) vs baseline: -0.2% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.9% ✅ strip_aspectTime: ✅ 17.684µs (SLO: <20.000µs 📉 -11.6%) vs baseline: -0.9% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +5.1% ✅ strip_noaspectTime: ✅ 1.861µs (SLO: <10.000µs 📉 -81.4%) vs baseline: -0.4% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.9% ✅ swapcase_aspectTime: ✅ 18.473µs (SLO: <30.000µs 📉 -38.4%) vs baseline: -0.8% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.8% ✅ swapcase_noaspectTime: ✅ 2.785µs (SLO: <10.000µs 📉 -72.2%) vs baseline: -1.3% Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +4.7% ✅ title_aspectTime: ✅ 18.313µs (SLO: <30.000µs 📉 -39.0%) vs baseline: +0.1% Memory: ✅ 42.585MB (SLO: <43.000MB 🟡 -1.0%) vs baseline: +4.7% ✅ title_noaspectTime: ✅ 2.661µs (SLO: <10.000µs 📉 -73.4%) vs baseline: ~same Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +5.0% ✅ translate_aspectTime: ✅ 24.416µs (SLO: 
<30.000µs 📉 -18.6%) vs baseline: 📈 +18.2% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +4.9% ✅ translate_noaspectTime: ✅ 4.317µs (SLO: <10.000µs 📉 -56.8%) vs baseline: -0.6% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +5.0% ✅ upper_aspectTime: ✅ 17.928µs (SLO: <30.000µs 📉 -40.2%) vs baseline: -0.9% Memory: ✅ 42.546MB (SLO: <43.500MB -2.2%) vs baseline: +4.8% ✅ upper_noaspectTime: ✅ 2.423µs (SLO: <10.000µs 📉 -75.8%) vs baseline: -1.0% Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +4.9% 📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 5.138µs (SLO: <10.000µs 📉 -48.6%) vs baseline: 📈 +20.9% Memory: ✅ 42.487MB (SLO: <43.500MB -2.3%) vs baseline: +4.5% ✅ ospathbasename_noaspectTime: ✅ 4.358µs (SLO: <10.000µs 📉 -56.4%) vs baseline: +1.4% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +4.9% ✅ ospathjoin_aspectTime: ✅ 6.231µs (SLO: <10.000µs 📉 -37.7%) vs baseline: -0.5% Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +4.9% ✅ ospathjoin_noaspectTime: ✅ 6.298µs (SLO: <10.000µs 📉 -37.0%) vs baseline: ~same Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.1% ✅ ospathnormcase_aspectTime: ✅ 3.578µs (SLO: <10.000µs 📉 -64.2%) vs baseline: -0.5% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +5.1% ✅ ospathnormcase_noaspectTime: ✅ 3.658µs (SLO: <10.000µs 📉 -63.4%) vs baseline: +0.3% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.9% ✅ ospathsplit_aspectTime: ✅ 4.919µs (SLO: <10.000µs 📉 -50.8%) vs baseline: ~same Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.1% ✅ ospathsplit_noaspectTime: ✅ 5.048µs (SLO: <10.000µs 📉 -49.5%) vs baseline: +0.2% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +5.0% ✅ ospathsplitdrive_aspectTime: ✅ 3.735µs (SLO: <10.000µs 📉 -62.7%) vs baseline: -0.7% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +5.1% ✅ ospathsplitdrive_noaspectTime: ✅ 0.754µs (SLO: <10.000µs 📉 -92.5%) vs baseline: +1.0% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) 
vs baseline: +5.1% ✅ ospathsplitext_aspectTime: ✅ 4.606µs (SLO: <10.000µs 📉 -53.9%) vs baseline: -0.8% Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +4.9% ✅ ospathsplitext_noaspectTime: ✅ 4.646µs (SLO: <10.000µs 📉 -53.5%) vs baseline: -0.6% Memory: ✅ 42.526MB (SLO: <43.500MB -2.2%) vs baseline: +4.6% 📈 telemetryaddmetric - 30/30✅ 1-count-metric-1-timesTime: ✅ 3.381µs (SLO: <20.000µs 📉 -83.1%) vs baseline: 📈 +14.3% Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.8% ✅ 1-count-metrics-100-timesTime: ✅ 199.303µs (SLO: <220.000µs -9.4%) vs baseline: -0.3% Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.7% ✅ 1-distribution-metric-1-timesTime: ✅ 3.320µs (SLO: <20.000µs 📉 -83.4%) vs baseline: +0.1% Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.6% ✅ 1-distribution-metrics-100-timesTime: ✅ 213.629µs (SLO: <230.000µs -7.1%) vs baseline: +0.8% Memory: ✅ 34.957MB (SLO: <35.500MB 🟡 -1.5%) vs baseline: +5.2% ✅ 1-gauge-metric-1-timesTime: ✅ 2.217µs (SLO: <20.000µs 📉 -88.9%) vs baseline: +2.3% Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.7% ✅ 1-gauge-metrics-100-timesTime: ✅ 136.128µs (SLO: <150.000µs -9.2%) vs baseline: -0.2% Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +4.6% ✅ 1-rate-metric-1-timesTime: ✅ 3.139µs (SLO: <20.000µs 📉 -84.3%) vs baseline: +0.3% Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.7% ✅ 1-rate-metrics-100-timesTime: ✅ 213.508µs (SLO: <250.000µs 📉 -14.6%) vs baseline: +0.6% Memory: ✅ 34.918MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +4.9% ✅ 100-count-metrics-100-timesTime: ✅ 20.122ms (SLO: <22.000ms -8.5%) vs baseline: +0.4% Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.6% ✅ 100-distribution-metrics-100-timesTime: ✅ 2.250ms (SLO: <2.550ms 📉 -11.8%) vs baseline: -0.3% Memory: ✅ 35.134MB (SLO: <35.500MB 🟡 -1.0%) vs baseline: +4.6% ✅ 100-gauge-metrics-100-timesTime: ✅ 1.410ms (SLO: <1.550ms -9.1%) vs baseline: +0.9% Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs 
baseline: +4.5% ✅ 100-rate-metrics-100-timesTime: ✅ 2.200ms (SLO: <2.550ms 📉 -13.7%) vs baseline: +1.3% Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.3% ✅ flush-1-metricTime: ✅ 4.629µs (SLO: <20.000µs 📉 -76.9%) vs baseline: +0.2% Memory: ✅ 35.271MB (SLO: <35.500MB 🟡 -0.6%) vs baseline: +4.7% ✅ flush-100-metricsTime: ✅ 174.764µs (SLO: <250.000µs 📉 -30.1%) vs baseline: ~same Memory: ✅ 35.271MB (SLO: <35.500MB 🟡 -0.6%) vs baseline: +4.9% ✅ flush-1000-metricsTime: ✅ 2.175ms (SLO: <2.500ms 📉 -13.0%) vs baseline: +0.4% Memory: ✅ 36.019MB (SLO: <36.500MB 🟡 -1.3%) vs baseline: +4.7% 🟡 Near SLO Breach (17 suites)🟡 coreapiscenario - 10/10 (1 unstable)
PROFeNoM
left a comment
LGTM
Description
Fixes customer issue with sequential annotation_context blocks

When using LLMObs.annotation_context() multiple times sequentially (e.g., with LangChain's with_structured_output() and batch()), only the first batch call in the second annotation context would be annotated. Subsequent calls would fail to receive annotations.

Root Cause
The previous fix in #15571 addressed annotation context persistence within a single annotation_context block by setting _reactivate=True on the Context. However, it didn't handle multiple sequential annotation_context blocks.

The bug flow was:
1. The first annotation_context creates a Context with ANNOTATIONS_CONTEXT_ID=X and _reactivate=True
2. The first annotation_context exits, sets _reactivate=False but the Context remains active
3. The second annotation_context enters, sees the stale Context, and reuses its ANNOTATIONS_CONTEXT_ID=X
4. After the first call in that block, _reactivate=False, so the Context is NOT reactivated
5. Subsequent calls have no ANNOTATIONS_CONTEXT_ID, so annotations fail

The Fix
In deregister_annotation(), after setting _reactivate=False, we now also deactivate the context we created if it's still the active context. This ensures subsequent annotation_context blocks start fresh with their own Context.

Testing
- test_annotation_context_sequential_contexts_work_independently, which specifically tests the customer's scenario

Risks
Context is confusing, but all the tests still pass so I think we should be alright.
Additional Notes
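For reviewers who find the Context lifecycle hard to follow, here is a minimal toy model of the bug flow and the fix. It is only a sketch and does not use ddtrace code: ToyContext, ToyTracer, traced_call, simulate, and the deactivate_on_exit flag are all hypothetical names invented for illustration. The sketch assumes that a finished call keeps the active context only when its reactivate flag is set, which is how the description above explains the behavior.

```python
from itertools import count
from typing import List, Optional, Tuple


class ToyContext:
    """Hypothetical stand-in for a trace Context (not ddtrace's class)."""

    def __init__(self, annotation_id: int) -> None:
        self.annotation_id = annotation_id  # plays the role of ANNOTATIONS_CONTEXT_ID
        self.reactivate = False             # plays the role of _reactivate


class ToyTracer:
    """Hypothetical tracer that tracks a single active context."""

    def __init__(self, deactivate_on_exit: bool) -> None:
        self.active: Optional[ToyContext] = None
        self.deactivate_on_exit = deactivate_on_exit  # True models the fix
        self._ids = count(1)

    def enter_annotation(self) -> Tuple[ToyContext, bool]:
        # Reuse whatever context is active, even a stale one (the bug).
        if self.active is not None:
            return self.active, False
        ctx = ToyContext(next(self._ids))
        ctx.reactivate = True
        self.active = ctx
        return ctx, True

    def exit_annotation(self, ctx: ToyContext, created: bool) -> None:
        ctx.reactivate = False
        # The fix: also deactivate the context we created if it is still active.
        if self.deactivate_on_exit and created and self.active is ctx:
            self.active = None

    def traced_call(self) -> Optional[int]:
        # A call sees the active context's annotation id, if any.
        ctx = self.active
        seen = ctx.annotation_id if ctx is not None else None
        # When the call finishes, the context survives only if flagged.
        if ctx is not None and not ctx.reactivate:
            self.active = None
        return seen


def simulate(deactivate_on_exit: bool, blocks: int = 2,
             calls: int = 3) -> List[List[Optional[int]]]:
    """Run sequential annotation blocks; record the id each call saw."""
    tracer = ToyTracer(deactivate_on_exit)
    results = []
    for _ in range(blocks):
        ctx, created = tracer.enter_annotation()
        results.append([tracer.traced_call() for _ in range(calls)])
        tracer.exit_annotation(ctx, created)
    return results


print(simulate(deactivate_on_exit=False))  # [[1, 1, 1], [1, None, None]]
print(simulate(deactivate_on_exit=True))   # [[1, 1, 1], [2, 2, 2]]
```

In the model without the fix, the second block reuses the stale context, so only its first call sees an annotation id. With the fix, each block gets its own context and every call is annotated.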