feat(ds): Use sample rate in trace context if missing from DSC #4620

mjq · 2025-03-27T00:46:34Z

Our Electron SDK is not populating the sample_rate field in DSC (see getsentry/sentry-electron#1114). As a result, span metric extrapolation does not work as the client's sample rate is unknown.

Although we will also fix this in the SDK, it will take time for clients to upgrade. In the meantime, it would be nice if data sent by existing SDK versions could work too.

In the case of the Electron SDK, the client sample rate is being sent in the transaction event payload, inside the trace context:

{
  "contexts": {
    "trace": {
      "data": {
        "sentry.sample_rate": 0.1
      }
    }
  }
}

This PR checks if we are missing a DSC sample rate (because that field is missing, or the entire DSC is missing), and if so, tries to find a client sample rate in the transaction event payload to use and copies it into the DSC. The first place we check is trace context's documented client_sample_rate field, falling back to the non-standard sentry.sample_rate in trace context's data being used by Electron.

I would like to add an integration test to this PR as well, but I'll get to that tomorrow if the concept and direction seem sound. Thanks!

relay-server/src/services/processor/dynamic_sampling.rs

Dav1dde · 2025-03-27T08:18:01Z

I would like to add an integration test to this PR as well, but I'll get to that tomorrow if the concept and direction seem sound. Thanks!

Yep seems like the right place to do this. Integration test 😙🤌.

More information

Dav1dde · 2025-03-27T15:09:04Z

@jan-auer brought up a good point to me, do we need to make this change, ideally people affected by this can upgrade their SDK version, this change may have unintended side effects in Relay, when the DSC is accessed before running this correction.

mjq · 2025-03-27T15:23:36Z

@Dav1dde Yes, I agree that the ideal is not to touch Relay in this case. I am not sure we are in ideal circumstances though 😄 @jan-auer and others are currently looking at how widespread this issue is - if it was just Electron, that would be one thing, but Cocoa and the latest Python SDKs also appear to be missing the DSC sample rate (but populate it in trace context). We'll see how the investigation shakes out.

mcannizz · 2025-03-27T16:32:08Z

@jan-auer brought up a good point to me, do we need to make this change, ideally people affected by this can upgrade their SDK version, this change may have unintended side effects in Relay, when the DSC is accessed before running this correction.

If we settle on a solution that requires customers to upgrade their SDKs, we have no control over when or if they will upgrade. That creates a risk to our business that we need to avoid at all reasonable costs. This is not a theoretical discussion; we have a product launch that depends on our ability to record accurate sampling rates.

While I understand the possibility of unintended side effects, we can and should mitigate this risk with automated tests.

This is what I recommend:

Make this change and add automated tests now, to unblock our product launch.
Find all the SDKs that don't pass sample_rate as intended and fix them.
Once all or mostly all of our customers have upgraded their SDKs, remove this workaround. (I realize this may take a long time.)

@jan-auer thoughts?

mjq · 2025-03-27T17:03:40Z

I'm going to start working on an integration test for this while the discussion continues. (I don't see a test for making sure the client_sample_rate makes it to Kafka yet, so that's valuable no matter where we land.)

mjq · 2025-03-28T04:40:35Z

Added integration tests.

They showed that the logic I added to try to read the sample rate from trace context client_sample_rate was flawed: client_sample_rate is a field written by Relay, not something clients can send in. So the value always matches the DSC anyway. Removed that logic.

On the plus side, the tests do confirm that the workaround for Electron (reading trace context's data's sentry.sample_rate) does indeed write client_sample_rate into span measurements as required by Snuba/EAP.

On the minus side, this change does not address the issue we found in the Python SDK - in that case the final event JSON is missing sample_rate in the DSC (based on the _dsc field of the event), but does have it in client_sample_rate in the trace context (example). Given what I just said about Relay generating this field, that ought to be impossible. Looked at the Python SDK, Relay and Sentry and can't figure out how an event gets into that state. Would love to hear ideas if anyone has any.

jan-auer · 2025-03-28T10:45:42Z

I'm going to respond to several comments here without quoting the individual points:

@cleptric could you please provide some information on the previous investigation of missing sample rates in the DSC? I know that a couple of months ago we have already investigated all SDKs and started to fix them all. @mcannizz has just requested this again, and we should double-check what we have missed in the past.
Note that we're not dealing with some intentional legacy behavior here, but with custom data that the Electron SDK has been accidentally sending (notice the field is not in the schema). If we move forward with this, we need to:
1. Put protections in place that Relay never reads the sample rate field from the DSC before this normalization has run. The simplest way to assure this is by writing an E2E test.
2. Determine the minimum version where Electron SDK started sending the erroneous field. All versions before that will not have any sample rate at all, so these customers must update in any case (see here).
3. Technically we can monitor SDK usage and eventually remove the workaround. We are very strict with this, however, so I would plan for this workaround to persist indefinitely.
Even if we normalize in Relay, it is still missing from baggage in SDKs. Therefore, propagation of the sample rate between connected SDKs is likely broken. This means that spans/transactions from downstream clients will not have any sample rate available and extrapolation will be broken regardless. There's nothing that can be done about this in Relay if the sample rate is missing entirely.

In combination with other SDK bugs where no sample rate or the wrong sample rate was sent to Relay, this means that for every SDK there is a certain minimum version to allow correct extrapolation. We need to document these versions, and potentially suggest an SDK update in-product. For most SDKs, this has been added a long time ago, but in some this is recent.

We can move forward with this workaround, but it will not solve the larger problem affecting far more users. I strongly suggest we determine all affected SDKs and encourage upgrades.

Dav1dde · 2025-05-20T07:51:03Z

Closing for now, with the reasons above described by Jan.

feat(ds): use sample rate in trace context if missing from dsc

4d30b66

mjq requested a review from a team as a code owner March 27, 2025 00:46

add changelog

c1d3a84

mjq commented Mar 27, 2025

View reviewed changes

relay-server/src/services/processor/dynamic_sampling.rs Show resolved Hide resolved

mjq added 2 commits March 26, 2025 21:05

nit

d519e99

clarify fn name

8797c23

Dav1dde previously approved these changes Mar 27, 2025

View reviewed changes

relay-server/src/services/processor/dynamic_sampling.rs Show resolved Hide resolved

mjq added 2 commits March 28, 2025 00:10

stop reading client_sample_rate, add integration tests

d112957

add back client_sample_rate cases

836d089

more accurate changelog

6807fba

jan-auer changed the title ~~feat(ds): use sample rate in trace context if missing from dsc~~ feat(ds): Use sample rate in trace context if missing from DSC Mar 28, 2025

mjq added 3 commits March 31, 2025 10:52

Merge branch 'master' into mjq-extract-sample-rate

a9c5f3e

snapshot instead of partialeq

c77868e

fix changed expectations after Jan's fix

6d205f4

Dav1dde closed this May 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(ds): Use sample rate in trace context if missing from DSC #4620

feat(ds): Use sample rate in trace context if missing from DSC #4620

Uh oh!

mjq commented Mar 27, 2025

Uh oh!

Uh oh!

Uh oh!

Dav1dde commented Mar 27, 2025

Uh oh!

Dav1dde commented Mar 27, 2025

Uh oh!

mjq commented Mar 27, 2025

Uh oh!

mcannizz commented Mar 27, 2025

Uh oh!

mjq commented Mar 27, 2025

Uh oh!

mjq commented Mar 28, 2025

Uh oh!

jan-auer commented Mar 28, 2025 •

edited

Loading

Uh oh!

Dav1dde commented May 20, 2025

Uh oh!

Uh oh!

feat(ds): Use sample rate in trace context if missing from DSC #4620

feat(ds): Use sample rate in trace context if missing from DSC #4620

Uh oh!

Conversation

mjq commented Mar 27, 2025

Uh oh!

Uh oh!

Uh oh!

Dav1dde commented Mar 27, 2025

Uh oh!

Dav1dde commented Mar 27, 2025

Uh oh!

mjq commented Mar 27, 2025

Uh oh!

mcannizz commented Mar 27, 2025

Uh oh!

mjq commented Mar 27, 2025

Uh oh!

mjq commented Mar 28, 2025

Uh oh!

jan-auer commented Mar 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dav1dde commented May 20, 2025

Uh oh!

Uh oh!

jan-auer commented Mar 28, 2025 •

edited

Loading