Skip to content
This repository was archived by the owner on Aug 14, 2024. It is now read-only.
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
503933e
Add spec for Dynamic Sampling Context
Jun 22, 2022
3f13024
Add Unified Propagation Mechanism
Jun 22, 2022
6cdc569
Change `sample_rate` to `traces_sample_rate`
Jun 22, 2022
082ca1b
Change "delimiter" to "prefix"
lforst Jun 22, 2022
ec30942
Clarify that DSC cannot be altered after first propagation
Jun 22, 2022
30d630c
Specify that we're using the `trace` envelope header
Jun 22, 2022
0513094
Clarify functionality of `has_sentry_value_in_baggage_header`
Jun 22, 2022
4a7508f
Clarify what values are needed for Dynamic Sampling to work at all
Jun 22, 2022
1a27b85
Reword traces sampling options
Jun 22, 2022
d848543
Minor rewording
Jun 22, 2022
48f8ab9
Clarify that DS allows users to sample traces AND transactions
lforst Jun 22, 2022
67b98b2
Clarify requiredness of values
Jun 23, 2022
c761157
Minor wording change in code
Jun 23, 2022
16d5986
s/true/True/
Jun 23, 2022
35e9137
Apply wording suggestions
Jun 23, 2022
45b444c
Future-proof introduction
Jun 23, 2022
57a8f54
Open PRs plz
Jun 23, 2022
d3ed0bd
Update considerations section
Jun 23, 2022
e8fca9a
Update payload section to reflect discussion outcome
Jun 23, 2022
cdb1248
Deprecate Trace Contexts
Jun 23, 2022
85be8e7
Clarify format of `sample_rate`
Jun 23, 2022
0c983c0
Remove Trace Context docs
Jun 23, 2022
2fc84fd
add Considerations and Challenges section
AbhiPrasad Jun 23, 2022
5facd1e
remove baggage todo
AbhiPrasad Jun 23, 2022
c2ec13e
Put sentences in separate lines for better diffing
Jun 24, 2022
cbf805f
Stress importance of "freezing" DSC
Jun 24, 2022
ee324c5
Apply suggestions from code review
AbhiPrasad Jun 24, 2022
700e020
Clarify trace behaviour in Temporal Problem
AbhiPrasad Jun 24, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/components/sidebar.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -208,6 +208,7 @@ export default () => {
<SidebarLink to="/sdk/performance/" title="Performance">
<SidebarLink to="/sdk/performance/trace-context/">Trace Contexts</SidebarLink>
<SidebarLink to="/sdk/performance/span-operations/">Span Operations</SidebarLink>
<SidebarLink to="/sdk/performance/dynamic-sampling-context/">Dynamic Sampling Context</SidebarLink>
</SidebarLink>
<SidebarLink to="/sdk/event-payloads/" title="Event Payloads">
<SidebarLink to="/sdk/event-payloads/transaction/">
Expand Down
158 changes: 158 additions & 0 deletions src/docs/sdk/performance/dynamic-sampling-context.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,158 @@
---
title: "Dynamic Sampling Context (Experimental)"
---

<Alert level="warning">

This page is under active development.
Comment thread
lforst marked this conversation as resolved.
Specifications are not final and subject to change.
Anything that sounds fishy probably is.

</Alert>

Until now, traces sampling was only done through a `traces_sample_rate` option in the SDKs.
Comment thread
lforst marked this conversation as resolved.
Outdated
This has quite a few drawbacks for users of Sentry SDKs:

- Changing the sampling rate involved either redeploying applications (which is problematic in case of applications that are not updated automatically, i.e., mobile apps or physically distributed software) or building complex systems to dynamically fetch a sampling rate.
Comment thread
lforst marked this conversation as resolved.
Outdated
- Sampling only happened based on a factor of randomness.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sampling is happening based on head based, probability sampling (in this case, simple random sampling)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a suggestion on how to reword this? I am kinda lost.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me think about this and try to push up a commit later today.

Employing sampling rules, for example, based on event parameters, is currently very complex.
While writing rules for singular **transactions** is possible, enforcing them on an entire **trace** is infeasable.

The solution for these problems is **Dynamic Sampling**.
Dynamic Sampling allows users to configure **sampling rules** directly in the Sentry interface. Important: Sampling rules may also be applied to **entire traces**.
Comment thread
lforst marked this conversation as resolved.
Outdated

## High-Level Problem Statement

### Ingest

Implementing Dynamic Sampling comes with challenges, especially on the ingestion side of things.
For Dynamic Sampling, we want to make sampling decisions for entire traces.
However, to keep ingestion speedy, Relay only looks at singular transactions in isolation (as opposed to looking at whole traces).
This means that we need the exact same decision basis for all transactions belonging to a trace.
In other words, all transactions of a trace need to hold all of the information to make a sampling decision, and that information needs to be the same across all transactions of the trace.
We call the information we base sampling decisions on **"Dynamic Sampling Context"** or **"DSC"**.

### SDKs

SDKs are responsible for propagating **Dynamic Sampling Context** across all applications that are part of a trace.
This involves:

1. Collecting the information that makes up the DSC **xor** extracting the DSC from incoming requests.
2. Propagating DSC to downstream SDKs.
3. Sending the DSC to Sentry via an envelope.

Because there are quite a few things to keep in mind for DSC propagation and to avoid every SDK running into the same problems, we defined a [unified propagation mechanism](#unified-propagation-mechanism) (step-by-step instructions) that all SDK implementations should be able to follow.

## Baggage

We chose `baggage` as the propagation mechanism for DSC. ([w3c baggage spec](https://www.w3.org/TR/baggage/))
Baggage is a standard HTTP header with URI encoded key-value pairs.

For the propagation of DSC, SDKs first read the DSC from the baggage header of incoming requests/messages.
To propagate DSC to downstream SDKs/services, we create a baggage header (or modify an existing one) through HTTP request instrumentation.

<Alert level="info">

Other vendors might also be using the `baggage` header.
If a `baggage` header already exists on an outgoing request, SDKs should aim to be good citizens by only **appending** Sentry values to the header.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For incoming baggage that already contains sentry- list-items:

  • We do not modify existing ones as we want to keep them the way they were sent by the first SDK in line, right?
  • Does that also mean we do not add missing list-items? Assuming userid was not set by the first SDK but another SDK knows it, should it add the list-item to baggage and other places or keep them the way they were sent by the first SDK?

Copy link
Copy Markdown
Contributor Author

@lforst lforst Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Dynamic Sampling Context is immutable across an entire trace, we cannot add additional sentry- list items when there already are ones on an incoming request. The reasoning for immutability is explained in the #ingest section of this doc - essentially relay needs to make exactly the same sampling decision for all individual transactions of a trace. This requires all transactions to have the exact same Dynamic Sampling Context.

I added the pseudo algorithm of how SDKs should instrument incoming and outgoing requests in regards to DSC. I hope that clarifies this a bit: 3f13024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added some sentences to explain this a little bit more explicitly: ec30942

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s also make it very clear that we should not delete or change existing non-sentry baggage values (and this this very very important).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AbhiPrasad

we should not delete or change existing non-sentry baggage values

Does this mean that if an incoming baggage header is close to the limit of 8192 characters we are not allowed to add our sentry list-items to the header? Does that mean we should add them in an all or nothing fashion or do we have a priority for which of them we want to try and add until we run out of characters?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that if an incoming baggage header is close to the limit of 8192 characters we are not allowed to add our sentry list-items to the header?

Yes we need to be good citizens and respect the standard here.

Does that mean we should add them in an all or nothing fashion or do we have a priority for which of them we want to try and add until we run out of characters?

I don't think we have a priority, we try to add what we can. I'd stay away from the all-or-nothing approach since that seems it might be confusing to users (not sure though). I'm comfortable enough with this for now since the vast majority of head SDK's (those creating the head transactions of a trace) will be from browser/mobile, which should not have problems with incoming baggage - only outgoing. As a result, we won't really hit this to start, and when if it ever becomes a substantial problem, I think we can come back and revisit. For now, let's just optimize for getting this out the door.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have a priority, we try to add what we can

I agree - let's try to keep things simple for now. In case this really becomes a problem, we could still revisit this and discuss priorities of keys or other handling strategies.

For now, if we exceed the max length, we should log a warning though so that users/we are aware of why DSC might be propagated/sent incomplete in this case.
(Which is what we're currently doing in the JS SDKs).

In the case that another vendor added Sentry values to an outgoing request, SDKs may overwrite those values.

SDKs must not add other vendors' baggage from incoming requests to outgoing requests.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So devs have to add incoming baggage headers to outgoing requests themselves. If it's already there we just add our list-items to the outgoing baggage header(s). If no baggage header is present we add one with our list-items if our options tell us to do so, correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes correct, we already discussed this quite extensively within the JS SDK team. If users want to propagate baggage in general, they can propagate it themselves (or use libraries specifically for that). Sentry SDKs should only propagate sentry-* entries in the baggage header.

Mental model: Sentry SDKs are not trace-context/baggage propagation libraries - they are Sentry SDKs.

If anybody has strong opinions against this however, we can reopen the discussion on this.

Sentry SDKs only concern themselves with Sentry baggage.

</Alert>

The following is an example of what a baggage header containing Dynamic Sampling Context may look like:

```
baggage: other-vendor-value-1=foo;bar;baz, sentry-traceid=771a43a4192642f0b136d5159a501700, sentry-publickey=49d0f7386ad645858ae85020e393bef3; sentry-userid=Am%C3%A9lie, other-vendor-value-2=foo;bar;
```

See the [Payloads section](#payloads) for a complete list of key-value pairs that SDKs should propagate.

## Payloads

Dynamic Sampling Context is propagated via a baggage header and sent to Sentry via <Link to="/sdk/envelopes/#transaction">transaction envelope headers</Link>.

### Baggage-Header

SDKs may set the following key-value pairs on baggage headers.
While all of these values are optional, SDKs should make their best effort to add as many of them to the baggage header as possible when starting a trace.

- `sentry-traceid` - The original trace ID as generated by the SDK
- `sentry-publickey` - Public key as defined by the user via the DSN in the SDK options
- `sentry-release` - The release as defined by the user in the SDK options
- `sentry-environment` - The environment as defined by the user in the SDK options
- `sentry-transaction` - The name of the trace's origin transaction in unparameterized (raw) format
- `sentry-userid` - User ID as set by the user with <Link to="/sdk/unified-api/#scope">`scope.set_user`</Link>
- `sentry-usersegment` - User segment as set by the user with <Link to="/sdk/unified-api/#scope">`scope.set_user`</Link>
- `sentry-samplerate` - Sample rate as defined by the user in the SDK options
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should the sample rate be formatted? Do we have a max number of decimal points?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, in the JS SDK, we're currently just calling toString on the sample rate number which caps it per default at 16 decimal points. Happy to change it if we decide to handle this uniformly across SDKs

Copy link
Copy Markdown
Member

@Lms24 Lms24 Jun 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave this a little more thought and as far as I can tell, the head SDK should set and propagate/send a format that in the end can be parsed by Relay without problems. I see no reason why downstream SDKs would have to convert a received sample rate string to a number, so we can probably disregard language specific concerns other than Rust.

Which brings me to the question what the constrains on the Relay side are (@jjbayer, any thoughts?). Should we e.g. agree on only sending/propagating the sample rate in "simple" decimal notation (i.e. no e.g. exponential notation such as 1.45e10-14) as proposed by @lforst ?

EDIT: @lforst just notified me that this was already discussed and we agreed on the proposal


SDKs must set all of the keys in the form of "`sentry-[name]`".
The delimiter "`sentry-`" acts to identify key-value pairs set by Sentry SDKs.
Comment thread
lforst marked this conversation as resolved.
Outdated
Additionally, we chose `[name]` to be written in "snake case" without any underscore ( `_` ) characters. This naming convention is the most language agnostic.

### Envelope Header

Dynamic Sampling Context is transferred to Sentry through the <Link to="/sdk/envelopes/#transaction">transaction envelope headers</Link>, keyed by `trace`.
It corresponds directly to the definition of <Link to="/sdk/performance/trace-context/#trace-context">Trace Context</Link>.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the urgency on renaming TraceContext to Dynamic Sampling Context for SDKs?

Copy link
Copy Markdown
Contributor Author

@lforst lforst Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those are still more or less different concepts. The word Dynamic Sampling Context represents the sentry-* values we propagate in the baggage header. The term TraceContext is incredibly overloaded - in the docs I just wrote it simply refers to the object schema over at https://develop.sentry.dev/sdk/performance/trace-context/#trace-context

I know in SDKs there are some fields called for exampleevent.contexts.trace but those fields are something else again. Right now we only care about Dynamic Sampling Context propagation in baggage and TraceContext in the event envelope header.

Edit: Lukas provided a better answer down below

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to this comment #611 (comment) and comments following it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to jump in here but what we proposed was to find a good term for the data we want to both, propagate to downstream SDKs (i.e. via the baggage Http header) and send to Relay (via the trace envelope header). That data should be the same, however, it is structured differently (as pointed out in the docs added by this PR).

The reasons for introducing "Dynamic Sampling Context" (DSC) were already mentioned: Trace Context is an overloaded term, we observed a lot of confusion around terms like "baggage", "trace state", "trace context", etc. and DSC should aim to unify this as much as possible. We're aware that Relay is currently calling the trace envelope header TraceContext. IMO it's not as important to change this right now but to rather have a term we can all agree on when talking about what we actually propagate/send.

Discussed this with @lforst and we agree on that

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So DSC is what we use to internally to transport tracing information from an incoming request that may exist or the place it where the tracing information is created in the SDK (that's first in line) to the outgoing request (be that an API call or an envelope sent to Sentry).

Would it make sense to find a name for the "first in line" SDK to more easily refer to it? e.g. head SDK?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We propose that DSC is a term we use to describe the bag of key-value pairs which are propagated (baggage) and sent to relay (envelope header). So DSC contains:

  • the three "internal" items (trace id, public key, sample rate)
  • the five "external" items (environment, release, transaction name, user id, user segment)

So essentially, see it as a "meta interface" describing the stuff we propagate/send.

Maybe I misunderstood the question but it doesn't really have anything to do with the head transaction. Meaning, an SDK would get DSC via an incoming baggage header (if there exists one/the SDK is not the head SDK)

Would it make sense to find a name for the "first in line" SDK to more easily refer to it? e.g. head SDK?

Yes, very much in favour of "head SDK", "head transaction", etc. (as in "head" of trace)


When a transaction is reported to Sentry, the Dynamic Sampling Context must be mapped to Trace Context in the following way:
Comment thread
Lms24 marked this conversation as resolved.
Outdated

- `sentry-release` ➝ `release`
- `sentry-environment` ➝ `environment`
- `sentry-transaction` ➝ `transaction`
- `sentry-userid` ➝ `user.id`
- `sentry-usersegment` ➝ `user.segment`
- `sentry-samplerate` ➝ `sample_rate`
- `sentry-traceid` ➝ `trace_id`
- `sentry-publickey` ➝ `public_key`

## Unified Propagation Mechanism

SDKs should follow these steps for any incoming and outgoing requests (in python pseudo-code for illustrative purposes):

```python
def collect_dynamic_sampling_context():
Comment thread
lforst marked this conversation as resolved.
# Placeholder function that collects as many values for Dynamic Sampling Context
# as possible and returns a dict

def on_incoming_request(request):
if has_header(request, "sentry-trace") and (not has_header(request, "baggage") or not has_sentry_value_in_baggage_header(request)):
# Request comes from an old SDK which doesn't support Dynamic Sampling Context yet
# --> we don't propagate baggage for this trace
transaction.baggage_locked = true
transaction.baggage = {}
elif has_header(request, "baggage") and has_sentry_value_in_baggage_header(request):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does has_sentry_value_in_baggage_header work?

Is there a specific list of list-items we need to be present for dynamic sampling to work?
I assume samplerate and publickey are required, are other fields as well?
Some of them are optional.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does has_sentry_value_in_baggage_header work?

Added a pseudo implementation of the function to clarify: 0513094 - We simply check if there is a key in the baggage header that starts with "sentry-" or not.

Is there a specific list of list-items we need to be present for dynamic sampling to work?

There is no list yet but tried to clarify in 4a7508f. I believe for DS we need samplerate, publickey and traceid. However I should note that right here we don't care at all about what values are actually needed. If something is missing in an incoming request, we propagate DSC as is, and don't try to add stuff. I again wanna put emphasis on the fact that DSC cannot be mutated (by the origin application or any application down the line) as soon it has been propagated.

transaction.baggage_locked = true
transaction.baggage = baggage_header_to_dict(request.headers.baggage)

def on_outgoing_request(request):
if not transaction.baggage_locked:
transaction.baggage_locked = true
if not transaction.baggage:
transaction.baggage = {}
transaction.baggage = merge_dicts(collect_dynamic_sampling_context(), transaction.baggage)

if has_header(request, "baggage"):
outgoing_baggage_dict = baggage_header_to_dict(request.headers.baggage)
merged_baggage_dict = merge_dicts(outgoing_baggage_dict, transaction.baggage)
merged_baggage_header = dict_to_baggage_header(merged_baggage_dict)
set_header(request, "baggage", merged_baggage_header)
else:
baggage_header = dict_to_baggage_header(transaction.baggage)
set_header(request, "baggage", baggage_header)
```

While there is no strict necessity for the `transaction.baggage_locked` flag yet, there is a future use case where we need it:
We might want users to be able to set Dynamic Sampling Context values themselves.
The flag becomes relevant after the first propagation, where Dynamic Sampling Context becomes immutable.
When users attempt to set DSC afterwards, our SDKs should make this operation a noop.

## Considerations

Todo:

- Why baggage and not trace context https://www.w3.org/TR/trace-context/?
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trace context is more restrictive in terms of size than baggage. Also baggage is more flexible in terms of encoding and characters according to our internal document on the decision.

- Why must baggage be immutable before the second transaction has been started?
- Why can't we just make the decision for the whole trace in Relay after the trace is complete?
1 change: 1 addition & 0 deletions src/docs/sdk/performance/trace-context.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ Regardless of the transport mechanism, the trace context is a JSON object with t

- `trace_id` (string, required) - UUID V4 encoded as a hexadecimal sequence with no dashes (e.g. `771a43a4192642f0b136d5159a501700`) that is a sequence of 32 hexadecimal digits. This must match the trace id of the submitted transaction item.
- `public_key` (string, required) - Public key from the DSN used by the SDK. It allows Sentry to sample traces spanning multiple projects, by resolving the same set of rules based on the starting project.
- `sample_rate` (number, optional) - The sample rate as defined by the user on the SDK where the trace originated.
- `release` (string, optional) - The release name as specified in client options, usually: `package@x.y.z+build`. _This should match the `release` attribute of the transaction event payload_.\*
- `environment` - The environment name as specified in client options, for example `staging`. _This should match the `environment` attribute of the transaction event payload_.\*
- `user` (object, optional) - A subset of the scope's user context containing the following fields:
Expand Down