-
-
Notifications
You must be signed in to change notification settings - Fork 217
Add spec for Dynamic Sampling Context #613
Changes from 3 commits
503933e
3f13024
6cdc569
082ca1b
ec30942
30d630c
0513094
4a7508f
1a27b85
d848543
48f8ab9
67b98b2
c761157
16d5986
35e9137
45b444c
57a8f54
d3ed0bd
e8fca9a
cdb1248
85be8e7
0c983c0
2fc84fd
5facd1e
c2ec13e
cbf805f
ee324c5
700e020
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,158 @@ | ||
| --- | ||
| title: "Dynamic Sampling Context (Experimental)" | ||
| --- | ||
|
|
||
| <Alert level="warning"> | ||
|
|
||
| This page is under active development. | ||
| Specifications are not final and subject to change. | ||
| Anything that sounds fishy probably is. | ||
|
|
||
| </Alert> | ||
|
|
||
| Until now, traces sampling was only done through a `traces_sample_rate` option in the SDKs. | ||
|
lforst marked this conversation as resolved.
Outdated
|
||
| This has quite a few drawbacks for users of Sentry SDKs: | ||
|
|
||
| - Changing the sampling rate involved either redeploying applications (which is problematic in case of applications that are not updated automatically, i.e., mobile apps or physically distributed software) or building complex systems to dynamically fetch a sampling rate. | ||
|
lforst marked this conversation as resolved.
Outdated
|
||
| - Sampling only happened based on a factor of randomness. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sampling is happening based on head based, probability sampling (in this case, simple random sampling)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you have a suggestion on how to reword this? I am kinda lost.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let me think about this and try to push up a commit later today. |
||
| Employing sampling rules, for example, based on event parameters, is currently very complex. | ||
| While writing rules for singular **transactions** is possible, enforcing them on an entire **trace** is infeasable. | ||
|
|
||
| The solution for these problems is **Dynamic Sampling**. | ||
| Dynamic Sampling allows users to configure **sampling rules** directly in the Sentry interface. Important: Sampling rules may also be applied to **entire traces**. | ||
|
lforst marked this conversation as resolved.
Outdated
|
||
|
|
||
| ## High-Level Problem Statement | ||
|
|
||
| ### Ingest | ||
|
|
||
| Implementing Dynamic Sampling comes with challenges, especially on the ingestion side of things. | ||
| For Dynamic Sampling, we want to make sampling decisions for entire traces. | ||
| However, to keep ingestion speedy, Relay only looks at singular transactions in isolation (as opposed to looking at whole traces). | ||
| This means that we need the exact same decision basis for all transactions belonging to a trace. | ||
| In other words, all transactions of a trace need to hold all of the information to make a sampling decision, and that information needs to be the same across all transactions of the trace. | ||
| We call the information we base sampling decisions on **"Dynamic Sampling Context"** or **"DSC"**. | ||
|
|
||
| ### SDKs | ||
|
|
||
| SDKs are responsible for propagating **Dynamic Sampling Context** across all applications that are part of a trace. | ||
| This involves: | ||
|
|
||
| 1. Collecting the information that makes up the DSC **xor** extracting the DSC from incoming requests. | ||
| 2. Propagating DSC to downstream SDKs. | ||
| 3. Sending the DSC to Sentry via an envelope. | ||
|
|
||
| Because there are quite a few things to keep in mind for DSC propagation and to avoid every SDK running into the same problems, we defined a [unified propagation mechanism](#unified-propagation-mechanism) (step-by-step instructions) that all SDK implementations should be able to follow. | ||
|
|
||
| ## Baggage | ||
|
|
||
| We chose `baggage` as the propagation mechanism for DSC. ([w3c baggage spec](https://www.w3.org/TR/baggage/)) | ||
| Baggage is a standard HTTP header with URI encoded key-value pairs. | ||
|
|
||
| For the propagation of DSC, SDKs first read the DSC from the baggage header of incoming requests/messages. | ||
| To propagate DSC to downstream SDKs/services, we create a baggage header (or modify an existing one) through HTTP request instrumentation. | ||
|
|
||
| <Alert level="info"> | ||
|
|
||
| Other vendors might also be using the `baggage` header. | ||
| If a `baggage` header already exists on an outgoing request, SDKs should aim to be good citizens by only **appending** Sentry values to the header. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For incoming
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As Dynamic Sampling Context is immutable across an entire trace, we cannot add additional I added the pseudo algorithm of how SDKs should instrument incoming and outgoing requests in regards to DSC. I hope that clarifies this a bit: 3f13024
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also added some sentences to explain this a little bit more explicitly: ec30942
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let’s also make it very clear that we should not delete or change existing non-sentry baggage values (and this this very very important).
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Does this mean that if an incoming baggage header is close to the limit of 8192 characters we are not allowed to add our sentry list-items to the header? Does that mean we should add them in an all or nothing fashion or do we have a priority for which of them we want to try and add until we run out of characters?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes we need to be good citizens and respect the standard here.
I don't think we have a priority, we try to add what we can. I'd stay away from the all-or-nothing approach since that seems it might be confusing to users (not sure though). I'm comfortable enough with this for now since the vast majority of head SDK's (those creating the head transactions of a trace) will be from browser/mobile, which should not have problems with incoming baggage - only outgoing. As a result, we won't really hit this to start, and when if it ever becomes a substantial problem, I think we can come back and revisit. For now, let's just optimize for getting this out the door.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I agree - let's try to keep things simple for now. In case this really becomes a problem, we could still revisit this and discuss priorities of keys or other handling strategies. For now, if we exceed the max length, we should log a warning though so that users/we are aware of why DSC might be propagated/sent incomplete in this case. |
||
| In the case that another vendor added Sentry values to an outgoing request, SDKs may overwrite those values. | ||
|
|
||
| SDKs must not add other vendors' baggage from incoming requests to outgoing requests. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So devs have to add incoming baggage headers to outgoing requests themselves. If it's already there we just add our list-items to the outgoing baggage header(s). If no baggage header is present we add one with our list-items if our options tell us to do so, correct?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes correct, we already discussed this quite extensively within the JS SDK team. If users want to propagate baggage in general, they can propagate it themselves (or use libraries specifically for that). Sentry SDKs should only propagate Mental model: Sentry SDKs are not trace-context/baggage propagation libraries - they are Sentry SDKs. If anybody has strong opinions against this however, we can reopen the discussion on this. |
||
| Sentry SDKs only concern themselves with Sentry baggage. | ||
|
|
||
| </Alert> | ||
|
|
||
| The following is an example of what a baggage header containing Dynamic Sampling Context may look like: | ||
|
|
||
| ``` | ||
| baggage: other-vendor-value-1=foo;bar;baz, sentry-traceid=771a43a4192642f0b136d5159a501700, sentry-publickey=49d0f7386ad645858ae85020e393bef3; sentry-userid=Am%C3%A9lie, other-vendor-value-2=foo;bar; | ||
| ``` | ||
|
|
||
| See the [Payloads section](#payloads) for a complete list of key-value pairs that SDKs should propagate. | ||
|
|
||
| ## Payloads | ||
|
|
||
| Dynamic Sampling Context is propagated via a baggage header and sent to Sentry via <Link to="/sdk/envelopes/#transaction">transaction envelope headers</Link>. | ||
|
|
||
| ### Baggage-Header | ||
|
|
||
| SDKs may set the following key-value pairs on baggage headers. | ||
| While all of these values are optional, SDKs should make their best effort to add as many of them to the baggage header as possible when starting a trace. | ||
|
|
||
| - `sentry-traceid` - The original trace ID as generated by the SDK | ||
| - `sentry-publickey` - Public key as defined by the user via the DSN in the SDK options | ||
| - `sentry-release` - The release as defined by the user in the SDK options | ||
| - `sentry-environment` - The environment as defined by the user in the SDK options | ||
| - `sentry-transaction` - The name of the trace's origin transaction in unparameterized (raw) format | ||
| - `sentry-userid` - User ID as set by the user with <Link to="/sdk/unified-api/#scope">`scope.set_user`</Link> | ||
| - `sentry-usersegment` - User segment as set by the user with <Link to="/sdk/unified-api/#scope">`scope.set_user`</Link> | ||
| - `sentry-samplerate` - Sample rate as defined by the user in the SDK options | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How should the sample rate be formatted? Do we have a max number of decimal points?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good question, in the JS SDK, we're currently just calling
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gave this a little more thought and as far as I can tell, the head SDK should set and propagate/send a format that in the end can be parsed by Relay without problems. I see no reason why downstream SDKs would have to convert a received sample rate string to a number, so we can probably disregard language specific concerns other than Rust.
EDIT: @lforst just notified me that this was already discussed and we agreed on the proposal |
||
|
|
||
| SDKs must set all of the keys in the form of "`sentry-[name]`". | ||
| The delimiter "`sentry-`" acts to identify key-value pairs set by Sentry SDKs. | ||
|
lforst marked this conversation as resolved.
Outdated
|
||
| Additionally, we chose `[name]` to be written in "snake case" without any underscore ( `_` ) characters. This naming convention is the most language agnostic. | ||
|
|
||
| ### Envelope Header | ||
|
|
||
| Dynamic Sampling Context is transferred to Sentry through the <Link to="/sdk/envelopes/#transaction">transaction envelope headers</Link>, keyed by `trace`. | ||
| It corresponds directly to the definition of <Link to="/sdk/performance/trace-context/#trace-context">Trace Context</Link>. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's the urgency on renaming TraceContext to Dynamic Sampling Context for SDKs?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Edit: Lukas provided a better answer down below
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was referring to this comment #611 (comment) and comments following it.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry to jump in here but what we proposed was to find a good term for the data we want to both, propagate to downstream SDKs (i.e. via the baggage Http header) and send to Relay (via the The reasons for introducing "Dynamic Sampling Context" (DSC) were already mentioned: Trace Context is an overloaded term, we observed a lot of confusion around terms like "baggage", "trace state", "trace context", etc. and DSC should aim to unify this as much as possible. We're aware that Relay is currently calling the Discussed this with @lforst and we agree on that
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So DSC is what we use to internally to transport tracing information from an incoming request that may exist or the place it where the tracing information is created in the SDK (that's first in line) to the outgoing request (be that an API call or an envelope sent to Sentry). Would it make sense to find a name for the "first in line" SDK to more easily refer to it? e.g. head SDK?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We propose that DSC is a term we use to describe the bag of key-value pairs which are propagated (baggage) and sent to relay (envelope header). So DSC contains:
So essentially, see it as a "meta interface" describing the stuff we propagate/send. Maybe I misunderstood the question but it doesn't really have anything to do with the head transaction. Meaning, an SDK would get DSC via an incoming baggage header (if there exists one/the SDK is not the head SDK)
Yes, very much in favour of "head SDK", "head transaction", etc. (as in "head" of trace) |
||
|
|
||
| When a transaction is reported to Sentry, the Dynamic Sampling Context must be mapped to Trace Context in the following way: | ||
|
Lms24 marked this conversation as resolved.
Outdated
|
||
|
|
||
| - `sentry-release` ➝ `release` | ||
| - `sentry-environment` ➝ `environment` | ||
| - `sentry-transaction` ➝ `transaction` | ||
| - `sentry-userid` ➝ `user.id` | ||
| - `sentry-usersegment` ➝ `user.segment` | ||
| - `sentry-samplerate` ➝ `sample_rate` | ||
| - `sentry-traceid` ➝ `trace_id` | ||
| - `sentry-publickey` ➝ `public_key` | ||
|
|
||
| ## Unified Propagation Mechanism | ||
|
|
||
| SDKs should follow these steps for any incoming and outgoing requests (in python pseudo-code for illustrative purposes): | ||
|
|
||
| ```python | ||
| def collect_dynamic_sampling_context(): | ||
|
lforst marked this conversation as resolved.
|
||
| # Placeholder function that collects as many values for Dynamic Sampling Context | ||
| # as possible and returns a dict | ||
|
|
||
| def on_incoming_request(request): | ||
| if has_header(request, "sentry-trace") and (not has_header(request, "baggage") or not has_sentry_value_in_baggage_header(request)): | ||
| # Request comes from an old SDK which doesn't support Dynamic Sampling Context yet | ||
| # --> we don't propagate baggage for this trace | ||
| transaction.baggage_locked = true | ||
| transaction.baggage = {} | ||
| elif has_header(request, "baggage") and has_sentry_value_in_baggage_header(request): | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How does Is there a specific list of list-items we need to be present for dynamic sampling to work?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Added a pseudo implementation of the function to clarify: 0513094 - We simply check if there is a key in the baggage header that starts with "sentry-" or not.
There is no list yet but tried to clarify in 4a7508f. I believe for DS we need |
||
| transaction.baggage_locked = true | ||
| transaction.baggage = baggage_header_to_dict(request.headers.baggage) | ||
|
|
||
| def on_outgoing_request(request): | ||
| if not transaction.baggage_locked: | ||
| transaction.baggage_locked = true | ||
| if not transaction.baggage: | ||
| transaction.baggage = {} | ||
| transaction.baggage = merge_dicts(collect_dynamic_sampling_context(), transaction.baggage) | ||
|
|
||
| if has_header(request, "baggage"): | ||
| outgoing_baggage_dict = baggage_header_to_dict(request.headers.baggage) | ||
| merged_baggage_dict = merge_dicts(outgoing_baggage_dict, transaction.baggage) | ||
| merged_baggage_header = dict_to_baggage_header(merged_baggage_dict) | ||
| set_header(request, "baggage", merged_baggage_header) | ||
| else: | ||
| baggage_header = dict_to_baggage_header(transaction.baggage) | ||
| set_header(request, "baggage", baggage_header) | ||
| ``` | ||
|
|
||
| While there is no strict necessity for the `transaction.baggage_locked` flag yet, there is a future use case where we need it: | ||
| We might want users to be able to set Dynamic Sampling Context values themselves. | ||
| The flag becomes relevant after the first propagation, where Dynamic Sampling Context becomes immutable. | ||
| When users attempt to set DSC afterwards, our SDKs should make this operation a noop. | ||
|
|
||
| ## Considerations | ||
|
|
||
| Todo: | ||
|
|
||
| - Why baggage and not trace context https://www.w3.org/TR/trace-context/? | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Trace context is more restrictive in terms of size than baggage. Also baggage is more flexible in terms of encoding and characters according to our internal document on the decision. |
||
| - Why must baggage be immutable before the second transaction has been started? | ||
| - Why can't we just make the decision for the whole trace in Relay after the trace is complete? | ||
Uh oh!
There was an error while loading. Please reload this page.