Controlling context propagation boundary #1633

@pmm-sumo

Description

What are you trying to achieve?

There are several use cases in which two organisations that both use OpenTelemetry communicate with each other, such as third-party API calls, synthetic monitoring, and webhooks. In some cases (e.g. synthetic monitoring), the called organisation is expected to record the identifier of the originating call and to sample the trace (if it uses OpenTelemetry) so that it can be used later for diagnostic purposes.

When the standard trace context propagation approach is used, this leads to several side effects, which may or may not be expected:

  • spans recorded by both organisations share the same trace ID,
  • each organisation has access only to the spans recorded on its own side, which also means that the callee (Organization 2) does not have access to the root span,
  • the originating organisation's sampling decision is passed on to the callee,
  • baggage might be shared between the organisations unintentionally, which might leak sensitive information

 Organization 1         Organization 2
    (caller)               (callee)
    
 ┌───────────┐          ┌────API────┐
 │           │          │           │
 │ Service A ├────?────►│ Service B │
 │           │          │           │
 └─────┬─────┘          └─────┬─────┘
       │                      │
     Spans                  Spans
       │                      │
 ┌─────▼─────┐          ┌─────▼─────┐
 │ Backend 1 │          │ Backend 2 │
 └───────────┘          └───────────┘
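
To make these side effects concrete, here is a minimal sketch of what Service B can read from an incoming request when Service A propagates context with the W3C `traceparent` and `baggage` headers. The header values, the helper names (`parse_traceparent`, `parse_baggage`) and the baggage keys are illustrative assumptions, and the parsing is simplified (version `00` only, baggage properties ignored); it is not the OpenTelemetry SDK implementation.

```python
import re
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote

# Simplified: accepts only version "00" of the traceparent format.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class RemoteContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[RemoteContext]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return RemoteContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def parse_baggage(header: str) -> Dict[str, str]:
    """Parse a baggage header into a dict, ignoring optional ';' properties."""
    entries: Dict[str, str] = {}
    for member in header.split(","):
        key, sep, value = member.partition("=")
        if sep:
            entries[key.strip()] = unquote(value.split(";", 1)[0].strip())
    return entries


# Hypothetical headers sent by Service A (Organization 1).
incoming = {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "baggage": "user.email=alice%40example.com,tenant=org1",
}

remote = parse_traceparent(incoming["traceparent"])
baggage = parse_baggage(incoming["baggage"])

# Service B (Organization 2) now sees the caller's trace ID, its sampling
# decision, and every baggage entry, including potentially sensitive ones.
print(remote.trace_id, remote.sampled, baggage)
```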

One approach is simply to keep the status quo, i.e. to accept the listed side effects and to treat it as the responsibility of the organisation calling the external service to ensure that the baggage does not contain sensitive information.
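
Under the status quo, the calling organisation would have to do this filtering itself. A minimal sketch of what that could look like, assuming an allowlist of baggage keys applied to the outgoing headers before an external call; `headers_for_external_call` and the baggage keys are hypothetical names, not part of any OpenTelemetry API.

```python
from typing import Dict, Iterable


def headers_for_external_call(
    headers: Dict[str, str], allowed_baggage_keys: Iterable[str]
) -> Dict[str, str]:
    """Return a copy of `headers` whose baggage keeps only allowlisted keys.

    All other headers (including traceparent) are passed through unchanged.
    """
    allowed = set(allowed_baggage_keys)
    result = {k: v for k, v in headers.items() if k.lower() != "baggage"}
    baggage = next((v for k, v in headers.items() if k.lower() == "baggage"), "")

    kept = []
    for member in baggage.split(","):
        key = member.partition("=")[0].strip()
        if key in allowed:
            kept.append(member.strip())

    # Omit the baggage header entirely when nothing is allowed through.
    if kept:
        result["baggage"] = ",".join(kept)
    return result


# Hypothetical outgoing headers on the caller side.
outgoing = {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "baggage": "user.email=alice%40example.com,synthetic.check_id=42",
}
filtered = headers_for_external_call(outgoing, {"synthetic.check_id"})
```

The obvious weakness of this approach is that every caller has to remember to apply such a filter on every cross-organisation call.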

However, perhaps the Tracing API/SDK could be extended to handle such cases gracefully, e.g. by explicitly filtering baggage and by providing a means of controlling how the trace context is propagated when making a call to an external organisation. There are several ways to handle this. To start with a few:

  1. Drop the trace context in such cases and instead identify the caller in some way via baggage. The callee could pick up this information and persist it, e.g. by storing it as a span attribute.
  2. Drop the trace context and generate a new trace ID for the callee (essentially, send a new context without a parent span ID). That way, the caller would have the information needed to identify the trace on the callee side.
  3. Leverage tracestate (or trace-flags) and include additional information there. Perhaps a special field could indicate that the request is crossing a boundary that requires starting a new trace and using the linking capability to store information about the relationship with the callee.
  4. Define a boundary on the callee side (e.g. via the API), so that whenever context is extracted there, a new trace is started instead of continuing the existing one, and the extracted context is stored as a linked span.
  5. ...
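
As an illustration of option 4, here is a minimal sketch of a callee-side boundary: the incoming context is extracted, but instead of becoming the parent it is attached as a link to a new root span with a fresh trace ID and a locally made sampling decision. The `SpanContext` and `Span` types and the `start_span_at_boundary` function are hypothetical stand-ins written for this example; they are not existing OpenTelemetry API, and the parsing only handles version `00` of `traceparent`.

```python
import re
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional

_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str
    sampled: bool


@dataclass
class Span:
    name: str
    context: SpanContext
    parent_span_id: Optional[str] = None
    links: List[SpanContext] = field(default_factory=list)


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Extract the remote span context from a version-00 traceparent header."""
    match = _TRACEPARENT_RE.match(headers.get("traceparent", "").strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def start_span_at_boundary(
    name: str, headers: Dict[str, str], local_sampled: bool
) -> Span:
    """Start a new root span and keep the caller's context only as a link."""
    remote = extract(headers)
    context = SpanContext(
        trace_id=secrets.token_hex(16),  # fresh 128-bit trace ID
        span_id=secrets.token_hex(8),    # fresh 64-bit span ID
        sampled=local_sampled,           # the callee's own sampling decision
    )
    links = [remote] if remote is not None else []
    return Span(name, context, parent_span_id=None, links=links)


# Hypothetical request arriving at Service B from Organization 1.
request_headers = {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
}
span = start_span_at_boundary("GET /api", request_headers, local_sampled=False)
```

With this shape, the callee owns a complete trace including its root span, the caller's sampling decision is not inherited, and the link still allows the two traces to be correlated later.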

Additional context.

Perhaps this should be followed up with an OTEP describing a proposal.

Related issues: #1255, #1337, #867, #510

Metadata

    Labels

    area:api (Cross language API specification issue)
    spec:context (Related to the specification/context directory)
    triage:accepted:needs-sponsor (Ready to be implemented, but does not yet have a specification sponsor)

    Projects

    Status

    Spec - Priority Backlog
