Suggest Prefix encoding instead of a version number #15

vancem · 2017-09-20T21:13:23Z

Bogdan suggested creating a pull request for feedback on the spec. I wanted to confine myself to only those things we NEED to agree on. In the first version Bogdan only described the two-level ID format (presumably we would have another version number for the hierarchical format). However, this 'version at the top level' approach has the problem that we don't capture the fact that both formats agree on the TraceID, and having a format that allowed the Trace-id to be parsed before 'diverging' would be useful.

My counter proposal is what I call a 'prefix encoding'. Instead of having a version number up front, each component of the ID has a unique prefix (effectively a format code) that defines the meaning and syntax for that part. This allows us to define both the Span-id component (used by Bogdan) and the hierarchical-id (used by Microsoft), while BOTH share the definition of Trace-id (and in fact the flags, but that is not as important).

This scheme also makes for a very nice versioning story. We can add new components WITHOUT BREAKING existing parsers (they ignore the extra information). This also allows a transition where you might CHANGE a component's meaning by first providing BOTH (so it works with both old and new), and then, when all relevant parsers have been updated, you can send just the new format.

The format also happens to be a bit shorter.

Note that I happen to pick a set of characters for the prefixes, but I don't really care too much.

Note that the result is that, from Bogdan's point of view, instead of

00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

he generates

*4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7~

Which is pretty much the same. There is no requirement that he support anything else, but it would be nice if he saw

*4bf92f3577b34da6a3ce929d0e0e4736.342.23.ad.34~

His parser would at least recognize the TraceId before bailing
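To make the 'parse what you know, then bail' behavior concrete, here is a minimal Python sketch. It is not taken from the spec: the function name `parse_trace_context` and the result fields are hypothetical, and it assumes `~` is a bare flag chunk with no payload.

```python
import re

HEX32 = re.compile(r"[0-9a-f]{32}")
HEX16 = re.compile(r"[0-9a-f]{16}")


def parse_trace_context(value):
    """Parse chunks left to right and stop at the first one we do not understand."""
    result = {"trace_id": None, "span_id": None, "flag": False, "unparsed": ""}
    pos = 0
    while pos < len(value):
        prefix = value[pos]
        if prefix == "*" and HEX32.fullmatch(value, pos + 1, pos + 33):
            # '*' is followed by 32 lower-case hex digits: the Trace-id.
            result["trace_id"] = value[pos + 1:pos + 33]
            pos += 33
        elif prefix == "-" and HEX16.fullmatch(value, pos + 1, pos + 17):
            # '-' is followed by 16 lower-case hex digits: the Span-id.
            result["span_id"] = value[pos + 1:pos + 17]
            pos += 17
        elif prefix == "~":
            # Assumption: '~' carries no data and simply sets the one defined flag.
            result["flag"] = True
            pos += 1
        else:
            # Unknown or malformed chunk: keep what we have and bail.
            result["unparsed"] = value[pos:]
            break
    return result
```

Under these assumptions the first value above parses completely, while the hierarchical one yields the trace-id and leaves `.342.23.ad.34~` unparsed.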

possible variations

In the proposal I made the flags a single bit that we have defined (the ~ prefix), but we can easily define true flags (e.g. we could define that the | prefix is followed by two hexadecimal digits which are flags where we define the individual bits). I did not do this because it seems likely that we will have very few flags, and so we can simply define prefix characters for each of them.
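As an illustration of the 'true flags' variation, the sketch below decodes a hypothetical `|` chunk carrying two hexadecimal digits. The bit assignments (`sampled`, `debug`) are invented for the example; nothing in the proposal defines them.

```python
# Hypothetical bit assignments, for illustration only.
FLAG_BITS = {
    0x01: "sampled",
    0x02: "debug",
}

HEX_DIGITS = "0123456789abcdef"


def parse_flags(chunk):
    """Decode a chunk such as '|03' into the set of flag names whose bits are set."""
    if len(chunk) != 3 or chunk[0] != "|":
        raise ValueError("expected '|' followed by two hex digits")
    if any(c not in HEX_DIGITS for c in chunk[1:]):
        raise ValueError("flags must be lower-case hexadecimal")
    bits = int(chunk[1:], 16)
    return {name for mask, name in FLAG_BITS.items() if bits & mask}
```

Bits without an entry in `FLAG_BITS` are ignored here, which mirrors the idea that old parsers skip what they do not know.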

Note that if we feel like we are running out of punctuation to use as prefix characters, we can always create two character prefixes.

Finally if desired we could eliminate the leading *, it just means that the trace-id is REQUIRED (we can never replace or change it).

quick link

The diff is not really very useful, you may wish to simply read the new spec at https://github.com/vancem/tracecontext-spec/blob/master/HTTP_HEADER_FORMAT.md.

@bogdandrutu @lmolkova @jacpull @adriancole @SergeyKanzhelev

bogdandrutu · 2017-09-20T22:22:52Z

Let me start with some high-level feedback about the design:

Overall looks very good.
I would like to keep 1 byte for the trace options / trace suggestion and maybe define a few more bytes. Some vendors use 2-3 properties. One useful property is "deferred sampling", which can be used, for example, in a LB to ask the service to maybe send back the sampling decision for this trace. I know this will force us to define a response header, etc., but I would like to have that option to control more bits in the options.
We should have some "default actions" defined when things are missing now that we have everything optional.
Is more like a question. Do we want to have trace-id required? I think it is by far the ID that we all want to use.

There are other minor things, but I will address them after we fix the high-level things.

codefromthecrypt · 2017-09-21T14:08:06Z

What happens when one can't read or understand one of the encodings, yet needs to trace downstream? This seems to be one of the problems of trying to do switch statements in header encoding. Wouldn't the next node clobber the incoming data? Or are you expecting to send multiple trace-context header values in this case?

codefromthecrypt · 2017-09-21T14:10:36Z

For example, if someone sends an encoding one can't understand in a separate header, you can still pass it unaffected. When the switch is inside the same header the next node needs to mutate, we have a conflict which we would have to reason about.

jacpull · 2017-09-21T16:45:03Z

HTTP_HEADER_FORMAT.md

+### Examples of HTTP headers

-*Valid not-sampled Trace-Context:*
+Note that not all the allowed value prefixes are expected to be used simultaneously.  __In particular it is expected that only one of the Span-id or Hierarchical-id will be used.__


While generally this is true, we will use the span-id in conjunction with the hierarchical-id when we reset the hierarchical-id, to handle bad callers. When reset, a new span-id is generated and the hierarchical-id could be generated with subsequent extensions and increments. So would suggest removing the parts in bold or updating the text suitably.

jacpull · 2017-09-21T16:56:17Z

HTTP_HEADER_FORMAT.md

    to down sample.

-The behavior of other bits is currently undefined.
+### Prefix '.' The Hierarchical-id


This is a minor issue. What is unique about the Hierarchical-id is that it is a vector of logical clocks, i.e. unlike the other identifiers, its values have a sort order. So consider renaming this to be just Vector or something similar. Yes, Hierarchical-id also works, but the structure generated by Span-ids is also hierarchical, so the name does not quite reflect what is unique about it.

I am OK with that. We should do it as a follow-on PR.

jacpull · 2017-09-21T17:12:44Z

@adriancole

When encountering an unrecognized encoding, simply ignore it, proceed with parsing the remainder, and leave it as is for downstream partners. This would be analogous to encountering an unknown header and simply passing it forward. And unlike the use of a separate header, using the prefix design on the same header helps simplify the propagation, especially if transport needs to be explicitly handled by non-infrastructure code.

vancem · 2017-09-21T17:13:32Z

@bogdandrutu

I would like to keep 1 byte for the trace options / trace suggestion and maybe define a few more bytes. Some vendors use 2-3 properties. One useful property is "deferred sampling", which can be used, for example, in a LB to ask the service to maybe send back the sampling decision for this trace. I know this will force us to define a response header, etc., but I would like to have that option to control more bits in the options.

I think this is fine. Having ~ (or some other character) be a prefix that is followed by a hexadecimal digit (or two, if you really think more than 4 commonly used flags will be defined) is fine by me.

Is more like a question. Do we want to have trace-id required? I think it is by far the ID that we all want to use.

I am OK with this as well (as mentioned, it just means we can't play with compressed formats later, but I think that is probably OK). If we do that we can remove the * prefix. If we don't remove the prefix, the difference between saying it is required and not pretty much disappears (if it is not there you pretty much have to ignore everything, which is what you would do regardless of whether we said it was required or not).

We should have some "default actions" defined when things are missing now that we have everything optional.

Yes, however the main default action is that you can always simply give up (ignore everything completely), and you are encouraged to at least parse what you can and use that (thus if you find the trace-id you can use it).

It sounds like we are close enough that what we should do is accept this PR, and then make the deltas from there (although if you want I can take a whack at incorporating your feedback, but I suspect you know better than I exactly what you want).

vancem · 2017-09-21T17:22:50Z

@adriancole - I think your scenarios are definitely interesting. What I would like to do, however, is to put the analysis of these kinds of things in a separate section, because it otherwise makes the spec more complicated than it is. Ultimately we need to ensure that anything parsers have to do is put back into this first section, but the rationale/scenario analysis can come later (for those who care).

Note that we do have a problem with respect to 'pass through', because right now new prefixes can define ANY mechanism for deciding when the data associated with the prefix ends. To make it so you can 'pass along', we need to have some sort of uniform 'stop' mechanism: for example, defining the set of allowable prefixes (probably we just pick a set of punctuation that is big enough) and indicating that data sections CANNOT use those characters in their data (they would have to escape them somehow). That allows parsers to skip unknown prefixes by simply scanning for the next allowed prefix character.

I think this is a good idea.
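The uniform 'stop' mechanism discussed above can be sketched in a few lines of Python. The reserved character set and the function names are assumptions made for illustration; the thread has not settled on either, and escaping is left out.

```python
# Hypothetical reserved set: characters that may start a chunk and
# therefore may never appear unescaped inside chunk data.
RESERVED = frozenset("*-.~|")
KNOWN = frozenset("*-~")  # prefixes this particular parser understands


def split_chunks(value):
    """Split a header value into chunks, each starting with a reserved character."""
    if value and value[0] not in RESERVED:
        raise ValueError("value must start with a prefix character")
    chunks = []
    for ch in value:
        if ch in RESERVED:
            chunks.append(ch)   # a reserved character opens a new chunk
        else:
            chunks[-1] += ch    # anything else is data for the current chunk
    return chunks


def partition_chunks(value):
    """Return (understood, passthrough) chunk lists without altering any chunk."""
    chunks = split_chunks(value)
    understood = [c for c in chunks if c[0] in KNOWN]
    passthrough = [c for c in chunks if c[0] not in KNOWN]
    return understood, passthrough
```

Because the chunks are never modified, joining them reproduces the original value, so unknown chunks can be forwarded untouched. One side effect of this particular sketch is that a hierarchical id such as `.3.b` comes out as two `.` chunks.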

My recommendation, however, is that if we are happy enough with the general prefix architecture, we should merge that, and then make more PRs for suggestions from there.

codefromthecrypt · 2017-09-22T05:00:37Z

I don't think we can punt this part, honestly. Before, there was one encoding and no chance of a collision. This change tunnels a completely different ID scheme into the same header. If it were a different header we could punt. I would expect that if we merged this, the next hop would actually restart the trace, as it can't reasonably opt out of tracing just because someone sent a hierarchical header. In other words, this adds significant debt which is worth avoiding. It isn't simply interesting; this would happen in all tracing except yours, right?

SergeyKanzhelev · 2017-09-22T15:47:57Z

@adriancole hierarchical postfix is a good way for proxy to indicate retries. Proxy can add .1, .2, etc. instead of generating new span identifiers. Which I expect can be quite a common use case.
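A tiny Python sketch of the retry case, assuming the proxy already holds the incoming hierarchical id as a string; the helper name `retry_ids` is hypothetical.

```python
def retry_ids(hierarchical_id, attempts):
    """Return one id per attempt by appending .1, .2, ... to the incoming id."""
    return [f"{hierarchical_id}.{n}" for n in range(1, attempts + 1)]
```

For an incoming `.3.b` and three attempts this gives `.3.b.1`, `.3.b.2` and `.3.b.3`, so each retry stays distinguishable without minting a new span identifier.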

W.r.t. loggers implementing it: generally, loggers that send data to separate storages do not "trust" each other completely. However, they still want to provide a link to the incoming request identity. So one may decide to put all unrecognized characters into the property bag associated with the span. This may be a suggestion we can put into the additional section on how to handle such cases later, as @vancem suggests.

vancem · 2017-09-22T16:47:44Z

I don't think we can punt this part, honestly

@adriancole I am not suggesting we punt it. I am just suggesting a structure for the document.

If it were a different header we could punt

While I am not saying we should punt, I do think that punting is an option (in the short term). Implementations are free to think of any hierarchical data as being in a separate header and completely ignore it. That has interoperability ramifications, so ideally we give guidance, but it certainly is an option.

I view this specification as best done in stages. The first stage is to define some basic format with enough flexibility that all logging systems can

• Express their information so that they can make a homogeneous system work.
• Standardize at least the parts that are likely to be common (in our case the Trace-id) and define its semantics.

This allows a certain amount of interoperability. It is unclear that this level of interoperability is useful, but it seems like it might be (at least you can gather together all information for a given trace, which is a HUGE step forward).

We should nail down this stage one first. At the least, this allows us to both implement homogeneous solutions in the short term with the POTENTIAL for interoperability.

We can also of course start thinking about stage two (which is what your questions are about), and writing guidance (in this doc) for that. My expectation is that mostly this additional analysis/guidance will NOT affect the format (it is mostly about rules of how to interpret and update the ID), but even if it does affect the format, the format is sufficiently flexible/versionable that it should not be a problem.

I just want progress. Stage two is much less defined than stage one. Let's get the merge done for stage one and get it as stable as we can. We can also work on the other stage concurrently; I just don't want it to block stabilizing the stage one spec (which I think we are close to agreement on).

vancem · 2017-09-22T16:49:03Z

@bogdandrutu - how do you want to handle this? I suggested merging this PR and modifying it, but I am also happy to make modifications first. What do you want done?

yurishkuro · 2017-09-22T17:03:13Z

HTTP_HEADER_FORMAT.md

+```
+that was caused by a request with ID
+```
+*4bf92f3577b34da6a3ce929d0e0e4736.3.b


requestID == spanID, is it not? Instead it looks like the hierarchical parts are appended to the traceID, which should be stable across all hops

I am not sure what your question is. However, in its simplest use you would only have a Trace-ID value and a hierarchical-ID value in a system that employs hierarchical IDs.

What may be causing confusion here is that we are not trying to describe a single system, but a framework by which DIFFERENT systems can coexist to the extent possible. Thus some systems might use Trace-id / Span-id pairs to represent a request ID, and other systems will use a Trace-id / hierarchical-id pair to do so.

It is still an open question how much interoperability is feasible in a system composed of both types of logging conventions, but an assumption of this specification is that making it possible to share the concept of TraceId is a huge step forward and is likely to be required for any interesting interoperability.

Instead it looks like the hierarchical parts are appended to the traceID, which should be stable across all hops

Yes, the hierarchical-id, combined with the trace-id, makes a particular request unique.

My understanding of Microsoft spec that mentioned hierarchical-id was that it still had a stable "trace id" (that was suggested to be passed in the "baggage" header), which was different from unique ID assigned to each request (span-id). Even if it's not correct and the hierarchical id contains the stable part, why can't that stable part be separated into a "trace-id" field understood by this spec?

My understanding of Microsoft spec that mentioned hierarchical-id was that it still had a stable "trace id"

No, in this spec, hierarchical-id does not include Trace-id, but it is expected to be used in conjunction with it. Thus the Trace-Context value

*4bf92f3577b34da6a3ce929d0e0e4736.3.b

has a trace ID in it, 4bf92f3577b34da6a3ce929d0e0e4736 (which is stable for everything caused by this), and 3.b, which is the hierarchical-id (which indicates a particular request within the scope of the Trace-id).

Now of course the full request ID in Microsoft's hierarchical system would include the Trace-id, but in this spec at least, 'hierarchical-id' refers only to the part that excludes the Trace-id.

great, so why not *4bf92f3577b34da6a3ce929d0e0e4736-3.b, thus clearly separating the stable portion into the trace-id field? This way the receiving instrumentation doesn't need to even understand hierarchical IDs as long as it can record 3.b as correlation and is able to reuse 4bf92f3577b34da6a3ce929d0e0e4736 as the trace id.

great, so why not *4bf92f3577b34da6a3ce929d0e0e4736-3.b

Because the - prefix is already used to represent a 16 character span id. The whole idea behind the prefixes is to allow you to 'mix and match' parts of the value in this header field.

yurishkuro · 2017-09-22T17:04:04Z

HTTP_HEADER_FORMAT.md


-Is the ID of the caller span (parent). It is represented as a 8-bytes array,
-e.g., `00f067aa0ba902b7`. All bytes 0 is considered invalid.
+The '-' prefix indicates that the next 16 characters are hexadecimal digits (lower case)


if this - happens in the middle of the string, it's not a prefix

When I say prefix, I mean for the value (chunk). Thus the 'Trace-Context' is logically a list of chunks, and each value (chunk) has a prefix character that defines the format and meaning of that value (chunk).

yurishkuro · 2017-09-22T17:16:38Z

HTTP_HEADER_FORMAT.md

-### Version = 0
+The '*' character indicates that the next 32 characters are hexadecimal digits (lower case)
+that represent a 16 byte identifier.  This identifier is meant to uniquely identify 
+new 'trace'.   The intent is that all requests that are caused directly or 


if traceID is always the first element in the header string, what's the role of the prefix? If it specifically indicates "hexadecimal string", then the current Spec doesn't allow anything else anyway (#16).

You are correct that we may wish to simply drop the '*' prefix (I suggested that as an option in the original submission comment). The value it does have is that it allows us to have Trace-Contexts that encode a Trace-id differently (e.g. Base64, some other string, etc.).

It is a worthy point of discussion.

codefromthecrypt · 2017-09-23T00:52:40Z

I think this issue basically changes the charter of this spec to carry unlike and incompatible formats, which is why I feel a bit concerned about the several requests to quickly merge it. I get a sense that from the submitters' POV (Microsoft in this case), you feel the trace-context header is firstly not for interop; rather, it has a more primary goal, which is tunneling. I am not sure everyone or even a majority agree with this. If it were something to agree on, a top-level issue on the charter is probably a better way to frame it vs a sidecar on format encoding.

I asked Mark N, who helped lead http/2 and other specs, and he isn't aware of any prior art for "tunneling headers", e.g. passing incompatible data through the same name. It is probably worth at least talking with people familiar with http RFCs as a part of expanding the charter along these lines, as it is a bit odd.

If we were to have and define multiple formats, it would seem the rule of three would at least apply here. While there are multiple tracing systems who can use the format previously maintained, it is unclear if any non-Microsoft system is or would use the hierarchical format. If not, how would we make sense of defining a top-level format as such? What would happen if, say, Toyota asked for a different format (trust me, there are at least 10!)? Would we simply accept and add to the registry each deviation? How would that work?

Personally, I do see value in what the hierarchical id is trying to accomplish. It isn't a technical detail; it is more that we are discussing what some besides me believe to be an interop spec. If that is the case, we really ought to prioritize diligence over decisions. Say we just merged fast as requested: would that be a great idea? We would have taken a spec that can be implemented in multiple systems and broken that without a fix.

tylerbenson · 2017-09-23T05:00:19Z

Taking that last comment a step further I think this PR should be limited to just the format discussion. The hierarchical ID part should be moved to a separate PR where it can be better explained and discussed.

codefromthecrypt · 2017-09-23T09:49:46Z

Had a chat w/ @basvanbeek. I think @tylerbenson's comment about understanding formats is the biggest thing we should start with. For example, Bas mentioned that maybe we have too much focus on convergence without understanding the landscape fully. It may be easy for existing dapper-like systems to use the format already in place, but very hard for others, and we should know more about the impact.

Let's make a trace-contexts.md file or similar which describes known formats (regardless of whether they are interested in converging on trace-context or not)? This should include truncation impact. With some of this in mind, we could know for example, if some IDs are contributory (Ex append data via path additions etc), incremental (one link incrementing a number from a prior node), or opaque (ex traditional dapper IDs)?

Maybe through this, we can help run through the problems and what's worth solving by propagated correlation IDs (ex imprecise within a trace) vs trace IDs (ex putting something precisely at a place in a trace).

I'll offer to start that file (ps suggestion is file vs #4 as the latter is presumptuous about adoption, when the first goal is understanding)

SergeyKanzhelev · 2017-09-25T17:28:40Z

@adriancole this is correct. We have a problem that is hard to solve with the trace-span combination and we can discuss it. For the sake of this comment, let's assume we agree that we need a way to restore span relationships in a high-loss environment and we agree that the hierarchy is a way to go.

The way we are going to use the proposed format is to use all defined pieces of it. We are going to use <trace-id> to indicate the common identifier across the trace, we are going to use <span-id> as a base for the hierarchy, and then every layer is free to just append `.X` to the end or reset the base of the hierarchy. So this proposal is not an attempt to tunnel two different formats; it's a way to extend the format for our scenario.

The reason we propose to make <span-id> optional is that for mobile devices that initiated the trace and just generated a 128-bit number, we want to save space and not generate another 64 bits. So for the first few spans, which may still be on the device, the correlation "string" will be shorter.

That said, in the discussion with @bogdandrutu we decided that since all pieces except <trace-id> are optional, we may want to support extensibility this way.
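The append-or-reset usage described above can be sketched roughly as follows in Python. The dict layout and the helper names `encode`, `extend` and `reset` are hypothetical, and generating the new base with `secrets.token_hex` is only one possible choice.

```python
import secrets


def encode(ctx):
    """Serialize a context dict into the prefix-encoded header value."""
    value = "*" + ctx["trace_id"]
    if ctx["span_id"] is not None:  # the span-id is optional
        value += "-" + ctx["span_id"]
    return value + "".join("." + part for part in ctx["hierarchy"])


def extend(ctx, part):
    """A layer appends '.part' to the end of the hierarchy."""
    return {**ctx, "hierarchy": ctx["hierarchy"] + [part]}


def reset(ctx):
    """A layer resets the base: new 64-bit span-id, empty hierarchy."""
    return {**ctx, "span_id": secrets.token_hex(8), "hierarchy": []}
```

A device that only generated a trace-id would start with `span_id` set to `None`, so its first few values stay short, e.g. `*4bf92f3577b34da6a3ce929d0e0e4736.3.b`.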

SergeyKanzhelev · 2017-09-25T17:29:20Z

Do you think we need to add the requirement as a section in the document in this PR?

vancem · 2017-09-25T19:28:59Z

@adriancole I have no problem having a document as part of this that addresses the interop scenario.

Indeed, a first stab that we (Microsoft) made internally at trying to make an ID standard tried to do this.

https://github.com/Microsoft/ApplicationInsights-Home/blob/1a5deefda4e26afee796bd7675e77b2f40391597/Correlation/Experimental/HttpRequestCorrelationSpec.md

By the way the first section of this doc does try to present goals for the standard as a whole.

Basically the idea was to make the ID 'successively more refined'. At the top level it was just a string and had no structure. At the next level if you put a - in the string it was assumed that the first part was the TraceID and the second was the span. It then addressed what to do if you did not want variable-sized ID (you hash), but frankly the hash was basically really just a character->binary converter (which you needed anyway). Thus fixed-sized ID systems would 'just work' with their own ID (the hash would leave them unchanged), but would do something sensible (hash) on anything else (thus there would be no illegal IDs).
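A rough Python sketch of that 'successively more refined' idea, under stated assumptions: the linked document is not quoted here, so the use of SHA-256 and the 32-hex-digit target size are my own illustrative choices, as are the function names.

```python
import hashlib
import re

HEX32 = re.compile(r"[0-9a-f]{32}")


def split_id(raw):
    """If the string contains '-', treat the first part as the TraceID and the rest as the span."""
    trace, sep, span = raw.partition("-")
    return trace, (span if sep else None)


def to_fixed_trace_id(trace):
    """Map any string to a fixed-size id; ids that are already fixed-size pass through unchanged."""
    if HEX32.fullmatch(trace):
        return trace
    # Assumption: any stable hash would do; SHA-256 truncated to 128 bits is used here.
    return hashlib.sha256(trace.encode("utf-8")).hexdigest()[:32]
```

The point of the design is that a fixed-size system sees its own ids untouched and still gets something sensible for any other string, so there are no illegal ids.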

The scheme above I think has merit, because it keeps the mandates to a minimum (it allows lots of ID formats, but they all can still interoperate to some degree). However we knew that since most of the logging world seemed to be 2-level, we decided that we just wanted the hierarchical format to 'not be excluded'. What this meant was that since both schemes had the notion of a TraceID we wanted to be able to participate in that at least. This is what the proposal above does.

Note that any 'technical debt' (not saying exactly what the propagation logic should be) ONLY has a negative for the hierarchical case, but that is STRICTLY better than the original proposal (which prevents systems with hierarchical IDs from even sharing their TraceID).

Really all I wanted in the short term was to ensure that the format was FLEXIBLE (it has chunks, of which only some parts need to be used, and new parts can be added in the future).

What I really want, however, is to see PROGRESS (let's agree on what we can agree on). It seems like one thing we can agree on is that the SYNTAX for the ID should be such that we can VERSION things and that we can SHARE parts that are the same (e.g. TraceID). That part is hopefully relatively uncontroversial, and we should at least nail that down.

To that end, I am OK MOVING the hierarchical format to another document/section. Just let me know if you want it ripped out and I will rip it out. PROGRESS is the key thing here...

nicmunroe · 2017-09-30T00:06:55Z

I have a few concerns with this (and sorry for the wall of text):

I think this change makes it noticeably more difficult for implementers. I think it has a higher cognitive load to understand all the possibilities and options, figure out how they relate to your tracing system, figure out how you can/want to handle formats that come from different systems, and actually implement those concretely for your tracing system.

I think it's a bit hand-wavy to say implementers can just ignore the prefixes that don't apply to them. Many many devs don't want to or have the time to understand the full implications of their implementation of a spec, especially in regards to interop. They usually stop at the point of "yeah this seems to work for my use case" and call it good. In practice different people will create their parsers in different ways given the ambiguities and intentional openness of the prefix-based format, and the reality of their decisions will cause a wide range of strange bugs and broken tracing in the wild.

We can mitigate this a bit with more documentation around recommended ways to parse and fulfill interop in the prefix-based format, but that increases the cognitive load even more, leading to more potential for bugs and less desire to participate in and implement the spec. From a personal anecdotal perspective, I find the existing spec to be short, easy to understand, and easy to implement. Low cognitive load and high testability. I had to spend significant time reading through the diffs and comments here for this pull request to really get what was happening and how I might fulfill the new spec in my tracing system.

Again, it's easy to say "we can't fix other people's apathy or incompetence - the spec has the information in it necessary to do the job - if you can't hack it that's your problem", but if the ultimate goal of this spec is successful interop between as many tracing systems as possible (even if it's just partial or best-effort interop) then I think this new format is likely to work against that goal (sorry for the bold - I don't want this point to be lost in the wall of text).

I'm also not a fan of the idea that if I don't understand an incoming format then I should just pass it downstream unmodified. As a tracing tool implementer my tool has a job to do - if I see a header I don't understand, I'm much more likely to assume a bug in the caller (which unfortunately happens much too often) and clobber it so that the downstream systems will have a tracing context that I know works. As an HTTP service creator and maintainer that's the behavior I'd want to see too. I'll likely be able to extract at least the trace ID and keep that going, but will that always hold true in the future for such an open ended format? What about other implementers that may not be as diligent and careful with their parsers? Again, the higher cognitive load and openness of the format means higher chance of tool A not understanding tool B's format, raising the chance of their parser completely blowing up due to bad exception handling or naive parsing implementation.

I don't want to completely lock out formats that don't fit into the precise current definition of 16 byte trace ID and 8 byte span ID though. I agree with some others on here that we should start with an attempt to understand existing formats. I imagine we'll start to see patterns that are conceptually similar and can be grouped together. For example there's the current spec's version 00:

• <Version>
• <Trace ID> (32 char lowercase hex string)
• <Span ID> (16 char lowercase hex string)
• <TraceOptions> (optional)
• Previous sections separated by a dash

The hierarchical ID format described in this pull request seems to share a lot conceptually, except the "span" in that case is a more free-form hierarchical ID. So why not a version 01 specified like this?

• <Version>
• <Trace ID> (32 char lowercase hex string)
• <Span ID> (unbounded string that cannot contain a dash but is otherwise unconstrained)
• <TraceOptions> (optional)
• Previous sections separated by a dash

And as a hint to help with interop for hierarchical ID systems, maybe one of the trace options for version 01 could indicate that implementers should append a new span ID onto the old one rather than throwing the old one away and creating an entirely new one? That seems like it could potentially cover a lot of tracing systems that don't fall into the strict 16 byte trace ID and 8 byte span ID bucket.
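For comparison with the prefix approach, a version-dispatched parser for the two layouts listed above might look like this Python sketch. It assumes `<TraceOptions>` is two hex digits, as in the `-01` example earlier in the thread; the function name and result fields are hypothetical.

```python
import re

# Version 00: fixed-size span id. Version 01: unbounded span id with no dash in it.
V00 = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})(?:-([0-9a-f]{2}))?")
V01 = re.compile(r"01-([0-9a-f]{32})-([^-]+)(?:-([0-9a-f]{2}))?")


def parse_versioned(value):
    """Return the parsed fields, or None if the version or layout is not recognized."""
    for version, pattern in (("00", V00), ("01", V01)):
        match = pattern.fullmatch(value)
        if match:
            return {
                "version": version,
                "trace_id": match.group(1),
                "span_id": match.group(2),
                "options": match.group(3),  # None when the optional field is absent
            }
    return None
```

Note that an unknown version yields nothing at all here, not even the trace ID, which is the behavior the prefix proposal is trying to avoid.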

Maybe after doing an audit of the tracing landscape we find that there are only a small handful of these kinds of patterns, and we could have a few well-defined and well-constrained (and similar-feeling) versions that reduce the cognitive load and possibility for interop bugs, but don't cut anybody out. Maybe that's overly optimistic but I think there are significant real-world benefits to the versioned explicitly-defined formats vs. an open-ended prefix format that help with adoption and interop, and I'd like to see the analysis done before abandoning the versioned formats. And as mentioned the analysis could have other benefits even if the prefix format is ultimately adopted so I don't think it's a wasted effort regardless of the outcome.

If anyone's bothered to read all this I salute you! 🖖 :)

vancem · 2017-10-02T19:22:02Z

I have updated the pull request to pull the Hierarchical (now called structural) IDs into their own document.

This is meant to emphasize that for those who like the simplicity of the specification before this PR, you can have this.

@nicmunroe - I like to think that the cognitive load is really not significantly higher in this proposal than the original. You have two numbers and some options.

But there is real value in doing things this way rather than the original. Before this PR, if you need to change the layout of the ID FOR ANY REASON, then you need to update the version number. However all existing systems will NOT RECOGNISE this new number, and thus are very unlikely to do something useful. In all likelihood this will prevent version 2 from ever being created (since you would need to ensure that all parts of the system understand Version 2 before any part of the system can start using it). More likely new systems will simply use another HTTP header (e.g. Trace-Context-2).

The goal of the tagged values is to allow INCREMENTAL improvements (or divergence), in the IDs. Thus you can add NEW tagged values and the old parsers will still at least parse what they know. This is an important property of the system, and can be achieved with the VERY SIMPLE proposal here.

I understand that there are MANY issues associated with interoperability if we allow variations of what the structure of the ID is. However I do not wish to block getting WHAT WE EASILY CAN AGREE ON, on resolving those questions (we can do so, but let's do it as a separate thread/PR).

I would like to believe the tagging idea expressed here is pretty uncontroversial, as it costs basically nothing (it is smaller than the original, and just as easy to parse) and has obvious benefits: you can make additions to the ID structure WHILE ALLOWING existing parsers to parse what IS unchanged about version 2, and it also allows you to encode both the new and old formats SIMULTANEOUSLY, which is very useful during the transition from version 1 to version 2. Finally it ALLOWS the POTENTIAL for MORE SYSTEMS to participate in INTEROPERABILITY.

I would like to ask if there are still any vetoes of this PR given the change I made?

Thus the

codefromthecrypt · 2017-10-03T04:38:52Z

FYI each spec change/addition requires careful thought and review, and just to let you know there are folks who haven't weighed in on anything at all, yet. Basically, let's bear in mind that Nov 1 workshop was set aside for folks who can't dedicate daily or weekly time to review PR comments. A major change like this, however sensible, needs as much focus on support as it does lack of vetos, IMHO.

nicmunroe · 2017-10-03T17:29:42Z

I still think the parsing in the prefix model is more complex and requires more careful reading of the spec, and therefore more prone to misinterpretation and bugs than the current versioned model. I also acknowledge that it's a bit subjective and others will have differing (and still valid) opinions depending on language, experience, or preferences.

I see your point about better likelihood of supporting some parts of the tracing context even as more prefixes are added (e.g. a properly coded parser can always parse trace ID from the * prefix no matter what new prefixes are added down the road), but I disagree that the prefix model is the only way to accomplish that goal.

To my mind, the prefix model can be thought of as versioning each component of the tracing context separately with the ability to add more separately-versioned components as time goes on. i.e. * is the "version 00" for trace IDs, - is the "version 00" for span ID, . is the "version 00" for hierarchical/appended span ID, and ~ is the "version 00" for options.

Maybe the prefixes are the best way to do such a separately-versioned-components tracing context, maybe not. I think there are some significant benefits to versioning the components separately as you've described - especially for those components like trace ID that are largely compatible across tracing systems and won't need many versions (maybe just one), but I think it necessarily comes with some more complexity both conceptually and practically, and the tradeoff needs to be considered.

After switching my perspective on this PR to thinking about it as a way to separately version the components I think I'm more ok with it, but now new questions have popped up. On one hand, this means if I receive a tracing context with a . version of the "span ID component" and my tracing system only understands the - version, I can add a new - span ID to the context while leaving the . one unmodified. This could potentially improve interop. On the other hand if my tracing system understands both, and I receive both, what should I do about it? Adjust the one I prefer and leave the other unchanged? Adjust both since I understand both (extra code to maintain)? What about option-component-versions, or trace-id-component-versions? The interactions between all these prefixes/component-versions and what's recommended for one set of them vs. the other could get really lengthy and complex. This is partly what I meant by higher cognitive load and more possibility for bugs, different interpretations, and confusion.

This is all just my opinion as a relatively new tracing system maintainer. I don't feel like I have the breadth of experience in this space to hold a veto, and even if I had one I wouldn't necessarily use it here since I can see some benefits to both models. But I believe I am part of the intended audience for this spec; I will be implementing it at some point and I don't think I'm the only one who will have these concerns.

wu-sheng · 2017-10-09T02:23:42Z

I think at this point, prefix encoding for versioning makes the spec too complex for implementation. The version-number way is the easiest way, though maybe not the best way. We should focus on our first release, and give the implementations more time to support it.

vancem · 2017-10-09T15:59:58Z

Note in reply to @adriancole @nicmunroe and @wu-sheng.

For those that think the prefix notion is too complex, please just take a look at it. Here is the resulting spec:

https://github.com/vancem/tracecontext-spec/blob/793c4012f0f78cf1c8371287d6a1db7d8e598b08/HTTP_HEADER_FORMAT.md

It is literally 2 pages in length, and the meat of it is only 1 page (the rest is examples and clarification). It is just not that complex. It pretty much just says that the ID has parts, and this is how you parse each of the parts. Those parts are EXACTLY the three parts that existed in the original spec. Even the actual characters that are being sent (see examples) have barely changed.

The goal of this PR is to try to make it possible to version INCREMENTALLY. That is, if you wanted to add something to the ID you could do so WITHOUT having to modify all the existing clients. With a classic version number, when existing code sees version 2, it knows NOTHING about how to parse it and has to give up (unnecessarily). That is a REAL problem. By explicitly decoupling the PARSING of the parts of the ID from each other, it allows for more stuff to be added while the existing implementations can parse what they always have.

Just work through how you would ever move the version number from 1 to 2 using the old approach, and then ask yourself how you would do it if you can simply add the new information at the end of the ID with a new prefix FOR THAT PART (thus retaining compatibility with all existing parsers). Is that not worth something?

Sure, you might be able to achieve this versionability other ways, but this way is here, now, and is VERY SIMPLE. It is hard to believe you can make something significantly simpler. The risk is thus very low, and you CAN say that it is BETTER than the original 'version-number-for-the-whole-id' approach. Given that, is it not better to at least move on to that (and someone can submit a PR for something even better, if they can find it)?

Sure when you add a new feature (e.g. support for hierarchical IDs), there are all sorts of questions that arise. We are NOT trying to answer those as we are only worrying about the format that we need and the versioning scheme we need so we CAN MAKE THESE CHANGES LATER.

If we are to have any hope of a coherent standard, we need to find those things that we can agree upon. One such principle is that we should not NEEDLESSLY limit flexibility (since we would like many existing systems to interoperate with this standard to the degree they can). This format helps with that as well.

If we are to succeed in agreeing to a standard, we should all be flexible, and accept small things that really don't interfere with our important scenarios if they ARE important to other members of the standards group. In that light, is there REALLY pushback to this proposal? Exactly what is the problem?

yurishkuro · 2017-10-09T16:23:04Z

@vancem how is the versioning achieved in the prefix format, by using different prefix characters? If so, it means the protocol will keep increasing the number of reserved characters, which actually breaks backwards compatibility.

vancem · 2017-10-09T16:33:20Z

> @vancem how is the versioning achieved in the prefix format, by using different prefix characters? If so, it means the protocol will keep increasing the number of reserved characters, which actually breaks backwards compatibility.

Yes, when you add new things you create new prefix characters (but you also put the new stuff at the end). Yes, you 'use up' characters as you go, but that is really not a big problem. The number of expected new versioned things is probably < 10. We have a couple dozen punctuation characters that we can use. Moreover, we can ALWAYS make some of the new prefixes take two characters instead of one, giving us hundreds more prefixes. We really can't run out.

> which actually breaks backwards compatibility.

I don't understand this comment. How do new prefix characters at the end break existing parsers? When these parsers see the unexpected prefix, they give up (although we encourage them to save the whole thing so 'smart' back ends may be able to do something). Still they have already parsed what they can, and will work as well as they did if the new information was not there. That seems like the best you can do...

The main point is that this is STRICTLY better than the original, where a new version number again forces the parser to give up, but there without parsing ANYTHING (and in particular the TraceID).
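
The comparison above can be illustrated with a minimal Python sketch. It is not taken from the spec: the function names are hypothetical, and the layouts (a 32-hex-digit trace ID after `*`, a 16-hex-digit span ID after `-`, a `00` version in the original format) are assumptions read off the examples in this thread.

```python
import re


def parse_versioned(header):
    """Whole-ID version number: an unrecognised version yields nothing."""
    parts = header.split("-")
    if len(parts) != 4 or parts[0] != "00":
        return None  # unknown layout, so not even the trace ID is recovered
    return {"trace_id": parts[1], "span_id": parts[2], "options": parts[3]}


def parse_prefixed(header):
    """Prefix encoding: parse the components we know, keep the rest verbatim."""
    result = {"trace_id": None, "span_id": None, "unparsed": header}
    match = re.match(r"\*([0-9a-f]{32})", header)
    if match is None:
        return result
    result["trace_id"] = match.group(1)
    rest = header[match.end():]
    match = re.match(r"-([0-9a-f]{16})", rest)
    if match is not None:
        result["span_id"] = match.group(1)
        rest = rest[match.end():]
    # Anything this parser does not understand is saved, not discarded,
    # so a smarter back end could still make use of it.
    result["unparsed"] = rest
    return result
```

Under these assumptions, a parser that only knows the `-` span ID still recovers the trace ID from `*4bf92f3577b34da6a3ce929d0e0e4736.342.23.ad.34~` and keeps `.342.23.ad.34~` as the unparsed remainder, while `parse_versioned` returns nothing at all for a header whose leading version is anything other than `00`.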

yurishkuro · 2017-10-09T16:37:24Z

> > which actually breaks backwards compatibility.
>
> I don't understand this comment.

e.g. if in future versions you declare a certain character as special when previously it was allowed in the values.

vancem · 2017-10-09T16:48:41Z

> e.g. if in the future versions you declare a certain character as special when previously it was allowed in the values.

Indeed, each prefix must define the mechanism by which it decides what it consumes in its parse. Typically it either consumes a fixed number of characters or stops at the first character that is outside a set (e.g. hex digits). It might define a special terminating character. The current spec makes these choices in a sensible way, leaving plenty of room for new prefixes.
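
As a rough illustration of per-prefix consumption rules, here is a small table-driven Python sketch. The rule table is an assumption based on the examples in this thread (fixed lengths for `*` and `-`, runs of hex digits for `.` and `~`), and the names `RULES` and `parse_components` are hypothetical rather than anything the spec defines.

```python
HEX_DIGITS = frozenset("0123456789abcdef")

# prefix -> (component name, fixed length, or None for "run of hex digits").
# Assumed rules for illustration; the actual spec may choose differently.
RULES = {
    "*": ("trace_id", 32),
    "-": ("span_id", 16),
    ".": ("hierarchical", None),
    "~": ("options", None),
}


def parse_components(header):
    """Return (parsed components, unparsed remainder) for a prefixed ID."""
    components = []
    pos = 0
    while pos < len(header):
        rule = RULES.get(header[pos])
        if rule is None:
            break  # unknown prefix: stop here and keep the remainder
        name, fixed_len = rule
        start = pos + 1
        if fixed_len is not None:
            # Rule 1: consume exactly fixed_len hex digits.
            end = start + fixed_len
            value = header[start:end]
            if len(value) != fixed_len or not set(value) <= HEX_DIGITS:
                break  # malformed component: treat it as unparsed
        else:
            # Rule 2: consume until a character outside the hex set.
            end = start
            while end < len(header) and header[end] in HEX_DIGITS:
                end += 1
            value = header[start:end]
        components.append((name, value))
        pos = end
    return components, header[pos:]
```

Because no rule ever consumes a non-hex character as part of a value, a future prefix drawn from punctuation (say a hypothetical `!`) cannot be mistaken for data by this parser; it simply ends the parse and lands in the remainder.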

SergeyKanzhelev · 2018-01-23T16:47:40Z

@vancem I'm closing this PR, as the standard defined the encoding of vendor-specific properties as a name/value collection instead of a single header with field prefixes. Feel free to re-open this discussion if the approach with Trace-Context-Ext doesn't seem OK.

Suggest Prefix encoding instead of a version number

5d760c7

jacpull reviewed Sep 21, 2017

yurishkuro reviewed Sep 22, 2017

Move the hierarchical (structured) IDs to their own document.

793c401

bogdandrutu mentioned this pull request Oct 4, 2017

Make TraceOptions optional. #12

Merged

SergeyKanzhelev closed this Jan 23, 2018
