-
-
Notifications
You must be signed in to change notification settings - Fork 221
tracing: PII/correctness problems with including transaction
in tracestate
#425
Description
This and #424 are a continuation of #410, to make sure points I brought up there don't get lost now that it's closed.
I and various others have raised concerns about including transaction name in data used for dynamic sampling. The two biggest are about PII and data correctness.
As alluded to in the "Freezing the Context" section of the docs about tracestate
headers, transaction name is mutable until a transaction is sent to Sentry, while the contents of the tracestate
header is not. If the transaction name changes after the tracestate
value has been calculated (which only happens if it's about to be used, either in an outgoing HTTP header for the purposes of propagation, or in the envelope header of a transaction being sent to Sentry), the value in the header won't correctly match the transaction's actual data.
This has two consequences:
-
Filtering based on the final name (which is what users see in the UI) won't work. Users will try to filter on
/users/:username/squirrelstats
and nothing will happen, because the transactions'tracestate
values will instead containtransaction: "/users/maisey/squirrelstats"
,transaction: "/users/charlie/squirrelstats"
, and so on and so forth. This seems to defeat the entire purpose of including transaction name to begin with. -
Because in some frameworks (like Express), the transaction name starts out being the specific URL and ends up being the parameterized version, and because we know the revision might not happen until it's too late, we leave ourselves open to the possibility that the transaction name sent out in the
tracestate
might contain PII which would otherwise later be correctly scrubbed by event processors before any event data left the SDK. (Of course, if URLs now count as PII, we have bigger problems, seeing as we send the full path of either the current page or the current request as a matter of practice already. The concern was raised, though, so I'm including it here.)
We've thus far said that at least the first problem is "a risk we're willing to take," but the fact is it's not a risk. It's a known, predictable consequence of the way that at least the Express integration* is built. It would take some investigation to find out of the data we need for the final transaction name is available any earlier, but as it currently stands, any Express route handler which causes an outgoing HTTP request (to the database, say) will have its tracestate
calculated too early and the value will be wrong.
*Quite possibly others as well. There are definitely other ones which change transaction name mid-stream, but it would take some work to figure out when they do that (relative to things which would cause the tracestate
to be calculated and frozen) and whether or not that's fixable if the current timing is too late.
IMHO, if we want this feature to work reliably, we will have to invest resources to audit the SDKs to figure out which ones have this problem (and then of course actually do the work of fixing it). The only other options are to have an only-partially-functional feature, or to drop transaction name from the tracestate
.