-
-
Notifications
You must be signed in to change notification settings - Fork 217
Add spec for Dynamic Sampling Context #613
Changes from 17 commits
503933e
3f13024
6cdc569
082ca1b
ec30942
30d630c
0513094
4a7508f
1a27b85
d848543
48f8ab9
67b98b2
c761157
16d5986
35e9137
45b444c
57a8f54
d3ed0bd
e8fca9a
cdb1248
85be8e7
0c983c0
2fc84fd
5facd1e
c2ec13e
cbf805f
ee324c5
700e020
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,167 @@ | ||
| --- | ||
| title: "Dynamic Sampling Context (Experimental)" | ||
| --- | ||
|
|
||
| <Alert level="warning"> | ||
|
|
||
| This page is under active development. | ||
| Specifications are not final and subject to change. | ||
| Anything that sounds fishy probably is - nothing is set in stone. | ||
| Opening PRs to improve this page is therefore highly encouraged! | ||
|
|
||
| </Alert> | ||
|
|
||
| Traces sampling done through the <Link to="/sdk/performance/#tracessamplerate">`tracesSampleRate`</Link> or <Link to="/sdk/performance/#tracessampler">`tracesSampler`</Link> options in the SDKs has quite a few drawbacks for users of Sentry SDKs: | ||
|
|
||
| - Changing the sampling rate involved either redeploying applications (which is problematic in case of applications that are not updated automatically, i.e., mobile apps or physically distributed software) or building complex systems to dynamically fetch a sample rate. | ||
| - Sampling only happened based on a factor of randomness. | ||
|
||
| - Employing sampling rules, for example, based on event parameters, is very complex. | ||
| - While writing rules for singular **transactions** is possible, enforcing them on entire **traces** is infeasible. | ||
|
|
||
| The solution for these problems is **Dynamic Sampling**. | ||
| Dynamic Sampling allows users to configure **sampling rules** directly in the Sentry interface. Important: Sampling rules may be applied to **entire traces** or to a **single transaction**. | ||
|
|
||
| ## High-Level Problem Statement | ||
|
|
||
| ### Ingest | ||
|
|
||
| Implementing Dynamic Sampling comes with challenges, especially on the ingestion side of things. | ||
| For Dynamic Sampling, we want to make sampling decisions for entire traces. | ||
| However, to keep ingestion speedy, Relay only looks at singular transactions in isolation (as opposed to looking at whole traces). | ||
| This means that we need the exact same decision basis for all transactions belonging to a trace. | ||
| In other words, all transactions of a trace need to hold all of the information to make a sampling decision, and that **information needs to be the same across all transactions of the trace**. | ||
| We call the information we base sampling decisions on **"Dynamic Sampling Context"** or **"DSC"**. | ||
| As a mental model: | ||
| The head transaction in a trace determines the Dynamic Sampling Context for all following transactions in that trace. | ||
| No information can be changed, added or deleted after the first propagation. | ||
|
|
||
| ### SDKs | ||
|
|
||
| SDKs are responsible for propagating **Dynamic Sampling Context** across all applications that are part of a trace. | ||
| This involves: | ||
|
|
||
| 1. Collecting the information that makes up the DSC **xor** extracting the DSC from incoming requests. | ||
| 2. Propagating DSC to downstream SDKs. | ||
| 3. Sending the DSC to Sentry via the `trace` envelope header. | ||
|
|
||
| Because there are quite a few things to keep in mind for DSC propagation and to avoid every SDK running into the same problems, we defined a [unified propagation mechanism](#unified-propagation-mechanism) (step-by-step instructions) that all SDK implementations should be able to follow. | ||
|
|
||
| ## Baggage | ||
|
|
||
| We chose `baggage` as the propagation mechanism for DSC. ([w3c baggage spec](https://www.w3.org/TR/baggage/)) | ||
| Baggage is a standard HTTP header with URI encoded key-value pairs. | ||
|
|
||
| For the propagation of DSC, SDKs first read the DSC from the baggage header of incoming requests/messages. | ||
| To propagate DSC to downstream SDKs/services, we create a baggage header (or modify an existing one) through HTTP request instrumentation. | ||
|
|
||
| <Alert level="info"> | ||
|
|
||
| Other vendors might also be using the `baggage` header. | ||
| If a `baggage` header already exists on an outgoing request, SDKs should aim to be good citizens by only **appending** Sentry values to the header. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For incoming
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As Dynamic Sampling Context is immutable across an entire trace, we cannot add additional I added the pseudo algorithm of how SDKs should instrument incoming and outgoing requests in regards to DSC. I hope that clarifies this a bit: 3f13024
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also added some sentences to explain this a little bit more explicitly: ec30942
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let’s also make it very clear that we should not delete or change existing non-sentry baggage values (and this this very very important).
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Does this mean that if an incoming baggage header is close to the limit of 8192 characters we are not allowed to add our sentry list-items to the header? Does that mean we should add them in an all or nothing fashion or do we have a priority for which of them we want to try and add until we run out of characters?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes we need to be good citizens and respect the standard here.
I don't think we have a priority, we try to add what we can. I'd stay away from the all-or-nothing approach since that seems it might be confusing to users (not sure though). I'm comfortable enough with this for now since the vast majority of head SDK's (those creating the head transactions of a trace) will be from browser/mobile, which should not have problems with incoming baggage - only outgoing. As a result, we won't really hit this to start, and when if it ever becomes a substantial problem, I think we can come back and revisit. For now, let's just optimize for getting this out the door.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I agree - let's try to keep things simple for now. In case this really becomes a problem, we could still revisit this and discuss priorities of keys or other handling strategies. For now, if we exceed the max length, we should log a warning though so that users/we are aware of why DSC might be propagated/sent incomplete in this case. |
||
| In the case that another vendor added Sentry values to an outgoing request, SDKs may overwrite those values. | ||
|
|
||
| SDKs must not add other vendors' baggage from incoming requests to outgoing requests. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So devs have to add incoming baggage headers to outgoing requests themselves. If it's already there we just add our list-items to the outgoing baggage header(s). If no baggage header is present we add one with our list-items if our options tell us to do so, correct?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes correct, we already discussed this quite extensively within the JS SDK team. If users want to propagate baggage in general, they can propagate it themselves (or use libraries specifically for that). Sentry SDKs should only propagate Mental model: Sentry SDKs are not trace-context/baggage propagation libraries - they are Sentry SDKs. If anybody has strong opinions against this however, we can reopen the discussion on this. |
||
| Sentry SDKs only concern themselves with Sentry baggage. | ||
|
|
||
| </Alert> | ||
|
|
||
| The following is an example of what a baggage header containing Dynamic Sampling Context may look like: | ||
|
|
||
| ``` | ||
| baggage: other-vendor-value-1=foo;bar;baz, sentry-traceid=771a43a4192642f0b136d5159a501700, sentry-publickey=49d0f7386ad645858ae85020e393bef3; sentry-userid=Am%C3%A9lie, other-vendor-value-2=foo;bar; | ||
| ``` | ||
|
|
||
| See the [Payloads section](#payloads) for a complete list of key-value pairs that SDKs should propagate. | ||
|
|
||
| ## Payloads | ||
|
|
||
| Dynamic Sampling Context is propagated via a baggage header and sent to Sentry via the `trace` envelope header. | ||
|
|
||
| ### Baggage-Header | ||
|
|
||
| SDKs may set the following key-value pairs on baggage headers: | ||
|
|
||
| - `sentry-traceid` - The original trace ID as generated by the SDK. | ||
| - `sentry-publickey` - Public key as defined by the user via the DSN in the SDK options. | ||
| - `sentry-samplerate` - Sample rate as defined by the user in the SDK options. | ||
| - `sentry-release` - The release as defined by the user in the SDK options. | ||
| - `sentry-environment` - The environment as defined by the user in the SDK options. | ||
| - `sentry-transaction` - The name of the trace's origin transaction in unparameterized (raw) format. | ||
| - `sentry-userid` - User ID as set by the user with <Link to="/sdk/unified-api/#scope">`scope.set_user`</Link>. | ||
| - `sentry-usersegment` - User segment as set by the user with <Link to="/sdk/unified-api/#scope">`scope.set_user`</Link>. | ||
|
|
||
| All of these values are required in a sense, that when they are known to the head SDK at the time of propagation, they must also be propagated. | ||
| In any case, `sentry-traceid`, `sentry-publickey`, `sentry-samplerate` should always be known to the SDK, so these values are strictly required. | ||
|
|
||
| SDKs must set all of the keys in the form of "`sentry-[name]`". | ||
| The prefix "`sentry-`" acts to identify key-value pairs set by Sentry SDKs. | ||
| Additionally, we chose `[name]` to be written in "snake case" without any underscore ( `_` ) characters. This naming convention is the most language agnostic. | ||
|
|
||
| ### Envelope Header | ||
|
|
||
| Dynamic Sampling Context is transferred to Sentry through the `trace` envelope header. | ||
| The value of this header corresponds directly to the definition of <Link to="/sdk/performance/trace-context/#trace-context">Trace Context</Link>. | ||
|
|
||
| When a transaction is reported to Sentry, the Dynamic Sampling Context must be mapped to the `trace` envelope header in the following way: | ||
|
|
||
| - `sentry-traceid` ➝ `trace_id` | ||
|
||
| - `sentry-publickey` ➝ `public_key` | ||
| - `sentry-samplerate` ➝ `sample_rate` | ||
| - `sentry-release` ➝ `release` | ||
| - `sentry-environment` ➝ `environment` | ||
| - `sentry-transaction` ➝ `transaction` | ||
| - `sentry-userid` ➝ `user.id` | ||
| - `sentry-usersegment` ➝ `user.segment` | ||
|
|
||
| ## Unified Propagation Mechanism | ||
|
|
||
| SDKs should follow these steps for any incoming and outgoing requests (in python pseudo-code for illustrative purposes): | ||
|
|
||
| ```python | ||
| def collect_dynamic_sampling_context(): | ||
lforst marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| # Placeholder function that collects as many values for Dynamic Sampling Context | ||
| # as possible and returns a dict | ||
|
|
||
| def has_sentry_value_in_baggage_header(request): | ||
| # Placeholder function that returns True when there is at least one key-value pair in the baggage | ||
| # header of `request`, for which the key starts with "sentry-". Otherwise, it returns False. | ||
|
|
||
| def on_incoming_request(request): | ||
| if has_header(request, "sentry-trace") and (not has_header(request, "baggage") or not has_sentry_value_in_baggage_header(request)): | ||
| # Request comes from an old SDK which doesn't support Dynamic Sampling Context yet | ||
| # --> we don't propagate baggage for this trace | ||
| transaction.baggage_locked = True | ||
| transaction.baggage = {} | ||
| elif has_header(request, "baggage") and has_sentry_value_in_baggage_header(request): | ||
|
||
| transaction.baggage_locked = True | ||
| transaction.baggage = baggage_header_to_dict(request.headers.baggage) | ||
|
|
||
| def on_outgoing_request(request): | ||
| if not transaction.baggage_locked: | ||
| transaction.baggage_locked = True | ||
| if not transaction.baggage: | ||
| transaction.baggage = {} | ||
| transaction.baggage = merge_dicts(collect_dynamic_sampling_context(), transaction.baggage) | ||
|
|
||
| if has_header(request, "baggage"): | ||
| outgoing_baggage_dict = baggage_header_to_dict(request.headers.baggage) | ||
| merged_baggage_dict = merge_dicts(outgoing_baggage_dict, transaction.baggage) | ||
| merged_baggage_header = dict_to_baggage_header(merged_baggage_dict) | ||
| set_header(request, "baggage", merged_baggage_header) | ||
| else: | ||
| baggage_header = dict_to_baggage_header(transaction.baggage) | ||
| set_header(request, "baggage", baggage_header) | ||
| ``` | ||
|
|
||
| While there is no strict necessity for the `transaction.baggage_locked` flag yet, there is a future use case where we need it: | ||
| We might want users to be able to set Dynamic Sampling Context values themselves. | ||
| The flag becomes relevant after the first propagation, where Dynamic Sampling Context becomes immutable. | ||
| When users attempt to set DSC afterwards, our SDKs should make this operation a noop. | ||
|
|
||
| ## Considerations | ||
|
|
||
| Todo: | ||
|
|
||
| - Why baggage and not trace context https://www.w3.org/TR/trace-context/? | ||
|
||
| - Why must baggage be immutable before the second transaction has been started? | ||
| - Why can't we just make the decision for the whole trace in Relay after the trace is complete? | ||
Uh oh!
There was an error while loading. Please reload this page.