Skip to content

Position zero in tracestate identifies the traceparent #80

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
codefromthecrypt opened this issue Mar 12, 2018 · 33 comments
Closed

Position zero in tracestate identifies the traceparent #80

codefromthecrypt opened this issue Mar 12, 2018 · 33 comments
Milestone

Comments

@codefromthecrypt
Copy link

I believe the left-most position of the tracestate should identify the calling trace (the one that wrote traceparent). There were arguments against doing this at one workshop, but it makes a lot of sense if we just make this a requirement. That's why I've done this in #72

In doing so, we remove the ability to split tracestate across multiple header values (as perhaps they might not be put back in order?) I think this is ok. For example, we don't define behavior of multiple traceparent entries, so even special casing multiple tracestate entries is a bit of a strange optimization especially with a length constraint of 512 bytes.

@yurishkuro
Copy link
Member

Can we list the arguments for and against that?

@codefromthecrypt
Copy link
Author

codefromthecrypt commented Mar 13, 2018 via email

@christian-schwarzbauer
Copy link

I will start with arguments for this:

  • when trying to correlate the spans, you can instantly decide if the parent span was created by your tracing system or another system
    • if yes -> you should be able to look it up in your system
    • if not -> in this case you can skip searching in your system or you can ask the other tracing system, if possible
  • when receiving such a tracestate, you can also immediately tell if the parent was created by your system, in case you need different handling of something in your tracing code

IMHO this behavior should make it a lot easier to correlate spans of traces that have been created by multiple tracing systems.

@yurishkuro
Copy link
Member

@christian-schwarzbauer couldn't the same be achieved without position-based approach? E.g.

  • receiver looks for its own vendor key in tracestate and parses span id
  • it then parses traceparent and compares span id
  • only if span IDs match it can infer that the previous hop was traced with the same vendor

@SergeyKanzhelev
Copy link
Member

few disadvantages of positioning are:

  • not every trace library wants to know where it sends telemetry to. Sometimes it will be better to semantically recognize certain tracestate keys and read/update them if it knows what to do with them
  • traceparent was created to optimize identities that standard "recommend" for distributed tracing correlation. So basic interoperability should be on that level without the need to always parse tracestate
  • unnatural requirement for http. Http treats headers with comma-separated values as concatenate-able

@codefromthecrypt
Copy link
Author

codefromthecrypt commented Mar 15, 2018 via email

@nicmunroe
Copy link

Is this specifically about ordering of name/value pairs, or is this also about the name-only optimization that has been discussed (where there's no =[opaqueValue] following the name, which indicates that the parent is that vendor's golden copy)?

If this is just about ordering of name/value pairs, then I'm ok with either way although I have some minor concerns about HTTP libraries not honoring the order you specify (no idea if this is an actual problem in the real world - just a concern that it might be). Assuming that's not an issue we need to worry about then I can see the benefit of knowing who the incoming caller is.

If this is about the name-only optimization then I have some other thoughts, but won't spam them here until I know they're relevant to this issue.

@codefromthecrypt
Copy link
Author

codefromthecrypt commented Mar 15, 2018 via email

@codefromthecrypt
Copy link
Author

codefromthecrypt commented Mar 15, 2018 via email

@codefromthecrypt
Copy link
Author

codefromthecrypt commented Mar 15, 2018 via email

@SergeyKanzhelev
Copy link
Member

Adrian, I'm not talking about opacity. It is more like multiple inheritance problem. Like if there is an Azure service - it may want to participate in multiple ways.

First, it should be a good citizen and make all the necessary updates to trace-parent. Ideally being a generic trace system.

Second, service might want to tell it's identity to the downstream. So downstream will know where to take the span information from.

Third, if it knows about other microsoft tracers upstream that injected hierarchical identity - it will increment it. However, this will ONLY happen if hierarchical identity was started by somebody else. So Default behavior will not inflate the tracestate unnecessary.

Now this service doesn't actually know what's its real identity. Is it generic trace system? Is it Azure? Is it Microsoft hierarchical? So it will not be clear which part to make a leftmost.

@nicmunroe
Copy link

The RFC 2616 HTTP spec section 4.2 specifically answers my question on ordering, and I think makes it clear that if we want to support tracestate as a standard multiple-same-named-header then we need to use commas.

The relevant text in that section regarding commas (emphasis mine):

Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma.

The relevant text in that section regarding ordering:

The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded.

So if we want to be able to support multiple-same-named headers for tracestate (i.e. the multiple headers tracestate: wingtips=foo and tracestate: xray=bar are equivalent to tracestate: wingtips=foo,xray=bar) then the delimiter must be a comma. We could define tracestate such that it is not intended to be used as a multiple-same-named-header and then we'd be free to use whatever delimiter we want, but I think that's going to be more confusing. I'd prefer defining it using comma as the delimiter so that multiple-same-named-header is implicitly and explicitly supported.

With all that context in mind, I think I'm on board with defining the first position as identifying the traceparent. I think there's value in being able to identify the source of traceparent, and this solution seems relatively straightforward. And with the HTTP spec saying that ordering of multiple headers matters and must be preserved my concern about arbitrary/accidental reordering is significantly lessened.

I don't have a strong opinion though and would probably be ok with another solution e.g. @yurishkuro's solution seemed like it would work to allow you to know whether traceparent was yours, although it would be a bit more effort (and if it wasn't yours you wouldn't necessarily be able to determine who sent it).

@yurishkuro
Copy link
Member

@SergeyKanzhelev I think we reached a general agreement that

trace-state = Set(pair(vendor-name, vendor-trace-state))

where vendor-trace-state is a completely opaque blob for any other vendor.

This ticket is about it being a List, rather than a Set, but below holds for both. So when you say "service might want to tell it's identity to the downstream", it's either

  • a specific functionality that exists in MS implementation, in which case it's going to be a part of that opaque blob and is not worth discussing here, or
  • it's something you're proposing as part of the common trace context interface, and I don't think we have any agreement on that aspect (e.g. Dapper style tracers don't do this kind of caller name propagation).

Similar point about hierarchical - I think you're trying to add more semantics to the trace-state than we agreed to. I suggest a different ticket for that.

@nicmunroe good find about same-name headers order being guaranteed. It still doesn't preclude some wonky RPC framework implementation that might expose headers as Map<String, Set<String>> instead of Map<String, List<String>>, but I would be inclined to discount that possibility and always expect a List, which unblocks Adrian's position-0 semantics proposal.

The semi-colon alternative would probably work too, I assume we can stuff up to 4k into a single header (that's the limit on cookie size - https://tools.ietf.org/html/rfc6265#section-6.1), so unless the trace has lots of hops through too many different vendors there's enough room for vendor states. And we could define the behavior that if there isn't enough room the lowest hop can start dropping entries at the tail of the list. My only concern with semi-colon is that it's typically used as a separator between main portion and options for a given segment, e.g. <cookie-name>=<cookie-value>; Domain=<domain-value>. If we use it as segment separator we limit extensibility.

@nicmunroe
Copy link

I share the concern about semi-colon as I've usually seen it used to specify properties/options for a given header, not as a delimiter, so using it as a delimiter could lead to confusion or incorrect assumptions by people who don't carefully read the spec.

Plus anything other than comma means we can't support tracestate being a multi-value header (or if we can then it's going to be more complex). Which means we have to explicitly call that out in the spec, hope people don't use it as a multi-value header anyway, and define what to do when it inevitably happens.

So I'm leaning strongly toward comma as delimiter, and also leaning somewhat towards the core of this issue that position zero identifies the parent. Seems like a reasonable solution and optimization for those who read the spec, and for those who don't they can still determine which one is theirs by explicitly comparing their state against the parent.

@SergeyKanzhelev
Copy link
Member

Similar point about hierarchical - I think you're trying to add more semantics to the trace-state than we agreed to. I suggest a different ticket for that.

No, I'm not trying to argue with what we agreed on. We discussed that any tracing system may know the structure of other tracing systems's tracestate. So it may understand and mutate it if necessary. Single system may also be controlled by multiple tracing systems. Platform exposes the mechanism to access and modify tracestate and subscribers - tracing vendors - mutate each own's state. In example of Azure service - it is kind of controlled by multiple parties. As it participates in generating telemetry for multiple scenarios/graphs.

@yurishkuro
Copy link
Member

We discussed that any tracing system may know the structure of other tracing systems's tracestate. So it may understand and mutate it if necessary.

Hm, ok. Hopefully that doesn't actually require a change in the tracestate header format.

It would be good if you could provide a real world example of the header in this scenario, to make this less abstract.

@SergeyKanzhelev
Copy link
Member

Real world example (not a commitment that everything will be implemented exact this way. We are still hashing out details internally). Take Azure API Management Service. It is basically load balancer on steroids where by setting up policies you can configure secure, standardized access to your APIs. It can combine multiple APIs into single API, translate between formats, etc.

This service may operate in different modes. With the new headers the modes would be:

  • proxy headers if telemetry collection not enabled
  • mutate traceparent as generic tracing provider when telemetry collection enabled
  • mutate traceparent as generic tracing provider & mutate microsoft-specific hierarchical identifier (if any upstream services already has this hierarchical id set in tracestate)

Note, in last case service doesn't want to do hard switch completely to "hierarchical". It is not practical for downstream consumer to see it's behavior flipping depending on caller.

Also telemetry from this service may be consumed by external vendor AND be shown in some Azure experience. First will see it as a generic tracing provider. Second may have extra features available via hierarchical identifiers.

@codefromthecrypt
Copy link
Author

codefromthecrypt commented Mar 15, 2018 via email

@codefromthecrypt
Copy link
Author

codefromthecrypt commented Mar 15, 2018 via email

@codefromthecrypt
Copy link
Author

codefromthecrypt commented Mar 15, 2018 via email

@SergeyKanzhelev
Copy link
Member

Last statement that it's a set of opaque named items - no concerns here. My comment is mostly about the ordering and positioning. I don't think we can make platform decide which tracing vendor it is. For me ideal world is that everybody is just a generic tracing system with the standard mutations. And tracestate has hints for specific vendors. Like cluster-id example from DynaTrace - where to store this span to. And note which span was last seen by "definitely DynaTrace agent".

Not picking on Dynatrace - vendor A may trace 1st and 3rd components. And import data from 2nd. So it will benefit everybody if 2nd component will behave as generic tracer with standard mutations, not hiding it's mutations in 2nd-specific tracestate.

@codefromthecrypt
Copy link
Author

codefromthecrypt commented Mar 15, 2018 via email

@codefromthecrypt
Copy link
Author

codefromthecrypt commented Mar 15, 2018 via email

@codefromthecrypt
Copy link
Author

codefromthecrypt commented Mar 16, 2018 via email

@yurishkuro
Copy link
Member

Maybe the term "vendor" is causing some grief.

some grief, certainly :-) In particular, even if Company X and Y both running Zipkin (same "vendor") it doesn't mean that they want the behavior of Zipkin tracer in the edge service to be the same as in the internal services, i.e. they have the same reasons to mistrust external trace ID as X-Ray/Stackdriver do. I may be way off topic here.

I think we're running into a typical issue of discussing implementation details before agreeing on the requirements, as in the behavior we expect this standard to impose on the participants. For example, do we really need the notion of generic provider? Do we expect some nodes behave like chameleons as in Sergey's example (supporting different vendor tracestate formats)?

@codefromthecrypt
Copy link
Author

@yurishkuro the label was supposed to represent the tracing system, not the brand. We did yank "generic provider", however we minimally need a means to implicate a state entry with a format (which is why generic provider was created in feb). I agree that mappings between things are currently not in scope, however the format characters inside the label don't preclude this.

@codefromthecrypt
Copy link
Author

Ps yuri mentioned in gitter an alternative for knowing which tracestate is the parent.. essentially a special tracestate field.

Ex parent=congo

This would work for randomly sorted fields, and complicate things only a little. It starts feeling like amazon's existing format in this regard as they label a field for parent. Though in their case the standard format prevents them from having double entry concern (ex having multiple labels to identify and define the parent). Position zero eliminates this extra part and has nice side effects like stack.

Anyway I will make some code for existing way and that way if people decide to make it not position zero it will break tests and you can see code impact openzipkin/brave#657

@yurishkuro
Copy link
Member

Reposting from #81 (comment), there are two problems with the dictionary:

  1. how does the current node know that the previous node used the same or different tracer
  2. if the tracestate header grows too long and the current node must drop some of the other vendor entries, how does it choose which ones to drop?

Two solutions to the first issue have been proposed above:

  • parse your own key and compare with traceparent to ensure you were the previous hop, or
  • reserve a special key parent={vendor} that indicates the previous hop.

There is no clean solution to the second issue, only dropping keys randomly.

@SergeyKanzhelev
Copy link
Member

@yurishkuro I think ordering may be a good thing. Specifically for dropping of long headers.

What is the main reason to know your immediate parent's tracer? If the first one doesn't match most implementation will search for the next one that match it. And than extract traceparent in one form or another from it. This way you have more chances to correlate if there is a service mesh was set in-between one day.

Also we may need to support:

  1. property-like tracestate keys. Basically something that helps carries extra information to locate the trace in the telemetry database. Like cluster-id or tenant-id.
  2. scenario I described in Define behavior when library do not understand the version number #54 (comment) where a single component will need to place two different tracestate keys. It is unclear which one should be first there.

That's said I tend to agree that recomendation should be to put your key in the beginning of the tracestate. And tracestate should be ordered.

So if we will put the language of ordering (MUST preserve order of tracestate keys) and recommend to put your key first (SHOULD move modified and added keys to the first position) - will it satisfy scenarios you have in mind?

@yurishkuro
Copy link
Member

Yes, having an ordered set makes more sense semantically, ie it has some tangible benefits.

@SergeyKanzhelev SergeyKanzhelev added this to the experiment milestone Apr 19, 2018
@AloisReitbauer
Copy link
Contributor

As stated in another issue, I don't think we need ordering. The only reason so far is to understand what the parent tracer is. We have the trace-parent header and we should use it for this. This will also make dropping the trace-state header easier. If it gets/ would get too long it is dropped entirely and only trace-parent is set.

@discostu105
Copy link
Member

discostu105 commented Apr 30, 2018

From a practical point of view, ordering tracestate as suggested makes sense:

  • The tracestate string needs to be parsed (split) and then reassembled. At parsing, I need to find and interpret my tracing systems part anyway. The other tracing system parts need to be kept in memory. I end up with: dynatrace_part, part_before_dynatrace, part_after_dynatrace.
  • When reassembling tracestate for an outgoing call, I need to create a new tracestate for our own system. I can then simply append the other tracing system parts:
non_dynatrace_part = part_before_dynatrace + part_after_dynatrace;
tracestate = createNewTraceState(...) + non_dynatrace_part ;

Of course this is vastly oversimplified. What I am trying to say: Putting the last tracing system in front is more efficient to implement. Of course this wouldn't be hard to code in Java (String.split(","), String.join(), ...), but I did it in C++ and tried to avoid allocations as much as possible. In that implementation it was just very natural to re-assemble the string that way.

@SergeyKanzhelev
Copy link
Member

Closed via #164

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants