-
-
Notifications
You must be signed in to change notification settings - Fork 223
tracing: PII/correctness problems with including transaction
in tracestate
#425
Comments
Good summary, @lobsterkatie! In our current model transaction names are mutable, would be hard to enforce immutability. Perhaps possible if we compute the tracestate only as late as when the transaction envelope is assembled? That would add an extra conceptual responsibility that is at odds with having custom user-provided transports. |
If no HTTP requests have gone out before the transaction is packed into an envelope, this is in fact when |
Right, we most likely propagate before we capture the transaction. |
In all cases I'm familiar with the transaction name would be semi-final before any external span would be created. The typical case I have in mind is as simple as:
There may be others, but this is a VERY common case, and solving this is good enough for me. We can always educate users either via docs or via the UI if behaviors change. I think its probably more important to recognize there will be multiple transaction names in a trace and handling that is likely more of a concern to the end-user. |
It's not clear that it's always possible to have it run in that order, though, and, as far as our Express integration goes, it does not currently run in that order - the transaction name isn't parameterized until the end of the transaction's lifecycle. Further, we generally tell people that our middleware should be outermost, meaning (depending on the framework) routing may or may not have yet happened. Even if it has, the parameterized route name isn't always exposed in a place we can get to it until later in the request/response cycle. Unless that can be solved across the board, we're going to end up having a transaction-name filter which is only compatible with some SDKs/integrations and not others. (And at least right now, Express, which is by far and away the most common node backend, would fall into that "other" category.) |
This and #424 are a continuation of #410, to make sure points I brought up there don't get lost now that it's closed.
I and various others have raised concerns about including transaction name in data used for dynamic sampling. The two biggest are about PII and data correctness.
As alluded to in the "Freezing the Context" section of the docs about
tracestate
headers, transaction name is mutable until a transaction is sent to Sentry, while the contents of thetracestate
header is not. If the transaction name changes after thetracestate
value has been calculated (which only happens if it's about to be used, either in an outgoing HTTP header for the purposes of propagation, or in the envelope header of a transaction being sent to Sentry), the value in the header won't correctly match the transaction's actual data.This has two consequences:
Filtering based on the final name (which is what users see in the UI) won't work. Users will try to filter on
/users/:username/squirrelstats
and nothing will happen, because the transactions'tracestate
values will instead containtransaction: "/users/maisey/squirrelstats"
,transaction: "/users/charlie/squirrelstats"
, and so on and so forth. This seems to defeat the entire purpose of including transaction name to begin with.Because in some frameworks (like Express), the transaction name starts out being the specific URL and ends up being the parameterized version, and because we know the revision might not happen until it's too late, we leave ourselves open to the possibility that the transaction name sent out in the
tracestate
might contain PII which would otherwise later be correctly scrubbed by event processors before any event data left the SDK. (Of course, if URLs now count as PII, we have bigger problems, seeing as we send the full path of either the current page or the current request as a matter of practice already. The concern was raised, though, so I'm including it here.)We've thus far said that at least the first problem is "a risk we're willing to take," but the fact is it's not a risk. It's a known, predictable consequence of the way that at least the Express integration* is built. It would take some investigation to find out of the data we need for the final transaction name is available any earlier, but as it currently stands, any Express route handler which causes an outgoing HTTP request (to the database, say) will have its
tracestate
calculated too early and the value will be wrong.*Quite possibly others as well. There are definitely other ones which change transaction name mid-stream, but it would take some work to figure out when they do that (relative to things which would cause the
tracestate
to be calculated and frozen) and whether or not that's fixable if the current timing is too late.IMHO, if we want this feature to work reliably, we will have to invest resources to audit the SDKs to figure out which ones have this problem (and then of course actually do the work of fixing it). The only other options are to have an only-partially-functional feature, or to drop transaction name from the
tracestate
.The text was updated successfully, but these errors were encountered: