This repository was archived by the owner on Aug 14, 2024. It is now read-only.

tracing: PII/correctness problems with including transaction in tracestate #425

@lobsterkatie

This and #424 are a continuation of #410, to make sure points I brought up there don't get lost now that it's closed.

I and various others have raised concerns about including transaction name in data used for dynamic sampling. The two biggest are about PII and data correctness.

As alluded to in the "Freezing the Context" section of the docs about tracestate headers, transaction name is mutable until a transaction is sent to Sentry, while the contents of the tracestate header are not. The tracestate value is only calculated when it's about to be used, either in an outgoing HTTP header for the purposes of propagation or in the envelope header of a transaction being sent to Sentry. If the transaction name changes after that point, the value in the header won't match the transaction's actual data.

This has two consequences:

  1. Filtering based on the final name (which is what users see in the UI) won't work. Users will try to filter on /users/:username/squirrelstats and nothing will happen, because the transactions' tracestate values will instead contain transaction: "/users/maisey/squirrelstats", transaction: "/users/charlie/squirrelstats", and so on and so forth. This seems to defeat the entire purpose of including transaction name to begin with.

  2. In some frameworks (like Express), the transaction name starts out being the specific URL and ends up being the parameterized version, and we know the revision might not happen until it's too late. That leaves us open to the possibility that the transaction name sent out in the tracestate contains PII which would otherwise have been correctly scrubbed by event processors before any event data left the SDK. (Of course, if URLs now count as PII, we have bigger problems, seeing as we already send the full path of either the current page or the current request as a matter of practice. The concern was raised, though, so I'm including it here.)
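To make the timing problem concrete, here is a minimal sketch of the freeze-then-rename sequence. It is not the SDK's actual code: `SketchTransaction`, `getTracestateData`, `handleRequest`, and the trace id are all hypothetical names made up for illustration, and the real tracestate value is an encoded header rather than a plain object.

```typescript
// Hypothetical model of the behavior described above, NOT the real SDK API.
interface TracestateData {
  trace_id: string;
  transaction: string;
}

class SketchTransaction {
  private frozen?: TracestateData;

  constructor(public name: string, private readonly traceId: string) {}

  // Computes the tracestate data on first use and never recomputes it.
  getTracestateData(): TracestateData {
    if (this.frozen === undefined) {
      this.frozen = { trace_id: this.traceId, transaction: this.name };
    }
    return this.frozen;
  }
}

// Simulates a route handler that makes an outgoing request (which freezes the
// tracestate) before the framework swaps in the parameterized route name.
function handleRequest(
  url: string,
  route: string
): { propagated: string; finalName: string } {
  const tx = new SketchTransaction(url, "abc123"); // made-up trace id

  // Outgoing HTTP request: the tracestate is needed now, so it gets frozen.
  tx.getTracestateData();

  // Later, the integration renames the transaction to the parameterized route.
  tx.name = route;

  // The frozen value still carries the raw URL.
  return { propagated: tx.getTracestateData().transaction, finalName: tx.name };
}

const result = handleRequest(
  "/users/maisey/squirrelstats",
  "/users/:username/squirrelstats"
);
console.log(result.propagated === result.finalName); // false
```

In this sketch, a filter on `/users/:username/squirrelstats` would never match the propagated value, and the raw username has already left the process by the time the rename happens.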

We've thus far said that at least the first problem is "a risk we're willing to take," but the fact is it's not a risk. It's a known, predictable consequence of the way that at least the Express integration* is built. It would take some investigation to find out if the data we need for the final transaction name is available any earlier, but as it currently stands, any Express route handler which causes an outgoing HTTP request (to the database, say) will have its tracestate calculated too early and the value will be wrong.

*Quite possibly others as well. There are definitely other ones which change transaction name mid-stream, but it would take some work to figure out when they do that (relative to things which would cause the tracestate to be calculated and frozen) and whether or not that's fixable if the current timing is too late.

IMHO, if we want this feature to work reliably, we will have to invest resources to audit the SDKs to figure out which ones have this problem (and then of course actually do the work of fixing it). The only other options are to have an only-partially-functional feature, or to drop transaction name from the tracestate.
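One possible way to approach such an audit, sketched under the assumption that an SDK can hook both the rename and the freeze: record every rename that happens after the tracestate has been frozen, then run each integration's test suite and inspect the log. `AuditedTransaction`, `setName`, `freezeTracestate`, and `lateRenames` are hypothetical names, not existing SDK APIs.

```typescript
// Hypothetical audit helper, NOT part of any real SDK.
interface LateRename {
  frozen: string;
  renamedTo: string;
}

class AuditedTransaction {
  private currentName: string;
  private frozenName: string | undefined;
  // Every rename that arrived after the tracestate was already frozen.
  readonly lateRenames: LateRename[] = [];

  constructor(initialName: string) {
    this.currentName = initialName;
  }

  getName(): string {
    return this.currentName;
  }

  setName(value: string): void {
    if (this.frozenName !== undefined && value !== this.frozenName) {
      this.lateRenames.push({ frozen: this.frozenName, renamedTo: value });
    }
    this.currentName = value;
  }

  // Stands in for whatever computes the tracestate on first use.
  freezeTracestate(): string {
    if (this.frozenName === undefined) {
      this.frozenName = this.currentName;
    }
    return this.frozenName;
  }
}

// Rename after freeze: flagged.
const late = new AuditedTransaction("/users/maisey/squirrelstats");
late.freezeTracestate();
late.setName("/users/:username/squirrelstats");
console.log(late.lateRenames.length); // 1

// Rename before freeze: not flagged.
const early = new AuditedTransaction("/users/maisey/squirrelstats");
early.setName("/users/:username/squirrelstats");
early.freezeTracestate();
console.log(early.lateRenames.length); // 0
```

An integration that produces a non-empty `lateRenames` list in this kind of harness would be one where the final name arrives too late, which is the case that either needs fixing or has to be accepted as broken.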
