Skip to content

subscriber: add nested spans in json formatter #741

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 9, 2020
Merged

subscriber: add nested spans in json formatter #741

merged 10 commits into from
Jul 9, 2020

Conversation

mockersf
Copy link
Contributor

Fixes: #704

Motivation

Nested spans are not available in json formatter

Solution

I added a field spans in the json, with an array of the nested spans. I reused the span representation from the existing span, and pushed all the span representation in a vec using ctx.visit_spans. I didn't remove the span field as it can be useful to have direct access to the current span

@mockersf mockersf requested review from hawkw and a team as code owners May 30, 2020 02:56
@hawkw
Copy link
Member

hawkw commented Jun 1, 2020

The CI failure is unrelated (it's due to a flaky test in tracing-appender, which will be fixed by #737). I've gone ahead and re-run the failed jobs.

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me overall! I had some suggestions on potential improvements we might want to make, but they are not blockers for merging this. Let me know if you're interested in making any of those changes — if not, I'll merge this as-is.

Comment on lines 129 to 136
let mut spans = vec![];
ctx.visit_spans(|span| {
spans.push(visit_span(span));
Ok(())
})?;
if !spans.is_empty() {
serializer.serialize_entry("spans", &spans)?;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than collecting to a Vec, which allocates, it should be possible to serialize the spans without an intermediate allocation, using serde's SerializeSeq. This might offer a small performance improvement.

I think we could do this by adding a newtype which wraps the Context::spans iterator and implements serde::Serialize like this:

    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
         let mut seq = s.serialize_seq();
         for span in self.0 {
             // ...
         }
    }

and passing that struct to serialize_entry as the value.

Of course, this would involve a little more complexity than the current approach, so I'm fine with merging what we have now and doing this optimization as a follow-up. We might want to do some benchmarking to determine how much of a performance improvement we would get from it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented Serialize for newtypes for the context and the span, to keep it all managed by serde.
Please let me know if you see any issue in how I did it

@hawkw
Copy link
Member

hawkw commented Jun 1, 2020

Oh, a couple of API notes:

  • Do you think we might want to add the option to turn off the span field that contains the current span? We probably want to continue generating it by default to avoid breaking code that parses the current output format, but because we are now also generating the spans list, some users might not want to duplicate the current span.
  • Do we also want to provide the option to disable the spans list for users who only want the immediate current span? Disabling the spans list would be a bit more efficient in cases where it isn't wanted.

What do you think?

@mockersf
Copy link
Contributor Author

mockersf commented Jun 3, 2020

Thank you for your review !
adding options to turn off current span or nested spans logging sounds good, I'll do it soon (hopefully sometime in the next 2 days)

@mockersf
Copy link
Contributor Author

Added two options to the json formatter:

  • disable_current_span to not log the current span
  • disable_span_list to not log the span list

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some suggestions for making the API method naming more consistent with the rest of the crate, and for improving the documentation. Other than that, the implementation looks right.

It might also be good to add examples to the documentation to make it clearer what the new configuration options do. What do you think?

Comment on lines 309 to 379
pub fn disable_current_span(
self,
disable_current_span: bool,
) -> Layer<S, format::JsonFields, format::Format<format::Json, T>, W> {
Layer {
fmt_event: self.fmt_event.disable_current_span(disable_current_span),
fmt_fields: format::JsonFields::new(),
make_writer: self.make_writer,
_inner: self._inner,
}
}

/// Sets the JSON layer being built to not contain the span list.
///
/// See [`format::Json`](../fmt/format/struct.Json.html)
pub fn disable_span_list(
self,
disable_span_list: bool,
) -> Layer<S, format::JsonFields, format::Format<format::Json, T>, W> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the other builder methods that enable or disable some option have names starting with with_. Can we make these consistent with the others? I'd expect with_current_span(false) to enable the current span and with_span_list(false) to disable the span list.

}
}

/// Sets the JSON layer being built to not contain the span list.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

Suggested change
/// Sets the JSON layer being built to not contain the span list.
/// Sets whether or not the JSON layer being built will include a list
/// of all the spans in the current trace.

@@ -302,6 +302,36 @@ impl<S, T, W> Layer<S, format::JsonFields, format::Format<format::Json, T>, W> {
_inner: self._inner,
}
}

/// Sets the JSON layer being built to not contain the current span.
///
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also be worth explaining here that this controls the span field in the JSON object, not the spans list, which is controlled by the other method. The difference between these two methods may not be super obvious to everyone, so we should try to make that clearer.

Comment on lines 57 to 78
/// If set to `true` event won't contain current span.
pub fn disable_current_span(&mut self, disable_current_span: bool) {
self.disable_current_span = disable_current_span;
}

/// If set to `true` event won't contain spans.
pub fn disable_span_list(&mut self, disable_span_list: bool) {
self.disable_span_list = disable_span_list;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, can we give these names in the form with_$THING for consistency, and make false disable the option?

Comment on lines 191 to 192
let disable_current_span = self.format.disable_current_span;
let disable_span_list = self.format.disable_span_list;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these let bindings needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 311 to 321
pub fn disable_current_span(mut self, disable_current_span: bool) -> Format<Json, T> {
self.format.disable_current_span(disable_current_span);
self
}

/// Use the full JSON format without the span list.
///
/// See [`format::Json`](../fmt/format/struct.Json.html)
#[cfg(feature = "json")]
#[cfg_attr(docsrs, doc(cfg(feature = "json")))]
pub fn disable_span_list(mut self, disable_span_list: bool) -> Format<Json, T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same naming suggestions as above.

@hawkw
Copy link
Member

hawkw commented Jun 19, 2020

@mockersf Is there anything I can do to help get this branch ready to merge? I think it's very close and just needs the handful of API tweaks I suggested in #741 (review).

@mockersf
Copy link
Contributor Author

Sorry for taking so long, I changed the naming according to your review.

I also fixed the fact that when there is no span, "spans":[] was added to the log but not span (now they are both not present)

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we can merge this, I think we need to fix the clippy warning I commented on, so that CI passes.

I also fixed the fact that when there is no span, "spans":[] was added to the log but not span (now they are both not present)

I'm not totally convinced about this --- I think it might be better to always include those fields but emit an empty list or null when there is no current span, so that code consuming the JSON can expect those fields to always be present when they are enabled. This isn't a strongly held opinion, though, so I'm also fine with merging it as is --- @LucioFranco, what do you think about this?

I also had a few minor docs suggestions, if you don't mind making those changes as well.

Comment on lines 207 to 225
if self.format.display_span_list {
if current_span.is_some() {
serializer.serialize_entry(
"spans",
&SerializableContext(&ctx.ctx, format_field_marker),
)?;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like clippy does not like the nested if here and would prefer

Suggested change
if self.format.display_span_list {
if current_span.is_some() {
serializer.serialize_entry(
"spans",
&SerializableContext(&ctx.ctx, format_field_marker),
)?;
}
}
if self.format.display_span_list && current_span.is_some() {
serializer.serialize_entry(
"spans",
&SerializableContext(&ctx.ctx, format_field_marker),
)?;
}

}

impl Json {
/// If set to `true` event metadata will be flattened into the root object.
pub fn flatten_event(&mut self, flatten_event: bool) {
self.flatten_event = flatten_event;
}

/// If set to `false` event won't contain current span.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If set to `false` event won't contain current span.
/// If set to `false`, formatted events won't contain a field for the current span.

self.display_current_span = display_current_span;
}

/// If set to `false` event won't contain spans.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If set to `false` event won't contain spans.
/// If set to `false`, formatted events won't contain a list of all currently entered spans.

}
}

/// Sets whether or not the JSON layer being built will include a list.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Sets whether or not the JSON layer being built will include a list.
/// Sets whether or not the JSON layer being built will include a list of all
/// currently entered spans in formatted events.

@@ -555,6 +555,32 @@ impl<T, F, W> SubscriberBuilder<format::JsonFields, format::Format<format::Json,
inner: self.inner.flatten_event(flatten_event),
}
}

/// Sets whether or not the JSON layer being built will include the current span.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Sets whether or not the JSON layer being built will include the current span.
/// Sets whether or not the JSON layer being built will include the current span
/// in formatted events.

@@ -302,6 +302,26 @@ impl<T> Format<Json, T> {
self.format.flatten_event(flatten_event);
self
}

/// Use the full JSON format and sets whether or not it will include the current span.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Use the full JSON format and sets whether or not it will include the current span.
/// Sets whether or not the formatter will include the current span in
/// formatted events.

self
}

/// Use the full JSON format sets whether or not it will include the span list.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Use the full JSON format sets whether or not it will include the span list.
/// Sets whether or not the formatter will include a list of all currently
/// entered spans in formatted events.

@mockersf
Copy link
Contributor Author

Lint and doc should be good now 👍

I also fixed the fact that when there is no span, "spans":[] was added to the log but not span (now they are both not present)

I'm not totally convinced about this --- I think it might be better to always include those fields but emit an empty list or null when there is no current span, so that code consuming the JSON can expect those fields to always be present when they are enabled. This isn't a strongly held opinion, though, so I'm also fine with merging it as is --- @LucioFranco, what do you think about this?

I did this to match the current behaviour, but it can be done either way. I have the change locally if you want, I won't take days this time 😄

@hawkw
Copy link
Member

hawkw commented Jun 25, 2020

@mockersf it looks like rustfmt needs to be run again. Once we take care of that and decide what the most correct way to handle no current span is, I think this is good to merge.

@LucioFranco
Copy link
Member

So chatting with @hawkw and @davidbarsky on discord led to a few ideas:

one is to go with the span as a list as you suggested @mockersf

"span": []

To me this seems reasonable but I think lists in json are a bit funky and all over the place for such a structured item like spans. What I was suggesting was using objects instead:

"span": {
  "name": "MySpan",
  "parent": {}
}

This might be a bit more intuitive but is more verbose. So I wanted to hear from actual users what they prefer 😄

@hawkw
Copy link
Member

hawkw commented Jul 1, 2020

It would be nice to make a decision as to how we want to output the span hierarchy (as nested objects or as a list) so that we can move forward with merging this. Personally, I don't really care --- I would just like to ship support for nested spans in the JSON formatter in some form.

I'm tagging @sazzer — since you opened #704, and are actually using the JSON formatter in production, I'm guessing you might have an opinion about this. Also @danburkert. If anyone else is using the JSON formatter and wants to weigh in, it would be great to get your thoughts as well!

@sazzer
Copy link

sazzer commented Jul 1, 2020

@hawkw I'm not in Production yet, but I am using it.

Personally speaking I'm easy on how it works as long as it's consistent. I'd expect that a list would make more sense than an object. If it was an object I'm not sure exactly how it would be structured? The comment above from @LucioFranco seems to imply that each span would potentialy have a parent element, but that then means that if you get quite deep nesting you've got multiple levels to traverse. Essentially it would be:

{
  "name": "span_1",
  "parent": {
    "name": "span_2",
    "parent": {
      "name": "span_3",
      "parent": {
        "name": "span_4"
      }
    }
  }
}

whereas the same thing as a list would be:

[
  { "name": "span_1" },
  { "name": "span_2" },
  { "name": "span_3" },
  { "name": "span_4" }
]

Cheers

@hawkw
Copy link
Member

hawkw commented Jul 1, 2020

@sazzer yup, the examples you posted are more or less what it would look like. My two cents is that the nested object graph theoretically conveys the semantics more correctly, but that the list is probably easier to traverse and definitely easier to read. The list is also theoretically smaller on the wire, but if you care about sending the JSON output over the network, you're probably compressing it, and the nested hierarchy is compression-friendly enough that there's probably not a meaningful difference.

I'm willing to merge this either way.

@hawkw
Copy link
Member

hawkw commented Jul 1, 2020

I guess another potential advantage of the nested form is that it offers less configuration surface area: rather than switching between a "span" field with a single object and a "spans" field that's a list, we always emit a "span" field, and the only switch to flip is whether or not that field contains the span's parent(s)...

@sazzer
Copy link

sazzer commented Jul 1, 2020

One thing to consider is how they work with JSON friendly log tools.
We use ELK at work, where Logstash consumes JSON log files into Elastic search (from Java there - sorry!) And that's fantastic. There are plenty of other log management tools that natively consume JSON too, and I'm not sure how they would handle lists vs objects in this case.

I mention this because the name of the span seems something you'd want to filter on, and the other properties are important to easily pick out when diagnosing issues.

Cheers

@neoeinstein
Copy link

Based on my understanding of how to go about querying in elastisearch and CloudWatch Insights, having a list is going to be a lot easier to deal with when looking for a particular span. I can just filter on spans.name rather than needing to look in span.name and span.parent.name and span.parent.parent.name or know its precise location in the hierarchy (which might be dependent on the configured logging level). This becomes more of a problem with arbitrary depth recursive structures.

Because the spans form a path, the parent-child relationship can be unambiguously determined from an array, and the searchability is less hindered. For that reason, I'd prefer to have spans be an array. We should just determine whether the root comes first or last.

@hawkw
Copy link
Member

hawkw commented Jul 1, 2020

Okay, it's sounding like the list is probably better for most of the tools that will be ingesting these logs, and we should move forward with this PR as-is --- unless anyone else has a strong reason why the list is not correct.

Because the spans form a path, the parent-child relationship can be unambiguously determined from an array, and the searchability is less hindered. For that reason, I'd prefer to have spans be an array. We should just determine whether the root comes first or last.

Currently, it looks like the array is ordered so that the root span is the first element, and the current leaf span is the last. I'm fine with that, although I think ordering it in the other direction (leaf -> root) feels slightly more intuitive to me personally --- I don't really care which we choose, as long as it's consistent. However, it looks like this isn't currently documented. @mockersf, can we add the ordering to the documentation for the span list, please?

@mockersf
Copy link
Contributor Author

mockersf commented Jul 2, 2020

I added the ordering and updated the example with how it will look with default options

@hawkw
Copy link
Member

hawkw commented Jul 2, 2020

It looks like this needs to be updated to compile with the changes from 50e8574. Once that's taken care of, I'm happy to merge it, unless anyone else has any last issues.

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me!

@mockersf thanks for all your hard work on this, and for sticking with it through the bikeshedding about the correct output format.

Unless @LucioFranco or @davidbarsky want to take a last look, I think this is ready to merge!

@hawkw
Copy link
Member

hawkw commented Jul 7, 2020

Oh, it looks like rustfmt needs to be run again after the rebase --- mind doing that real quick? Thanks!

@LucioFranco
Copy link
Member

:shipit:

@davidbarsky
Copy link
Member

happy with this.

@hawkw
Copy link
Member

hawkw commented Jul 8, 2020

@mockersf sorry to bother you, but I think running rustfmt again is the only remaining blocker to merging this!

@mockersf
Copy link
Contributor Author

mockersf commented Jul 9, 2020

sorry about the format, I have another project where I have to disable format on save and I keep forgetting to reenable it back after...

fixed now, thanks !

@hawkw
Copy link
Member

hawkw commented Jul 9, 2020

Great, thanks! I'm going to merge this branch now, thanks for all your hard work on this!

@hawkw hawkw merged commit edbf22b into tokio-rs:master Jul 9, 2020
hawkw added a commit that referenced this pull request Jul 17, 2020
Changed

- **fmt**: When the `chrono` dependency is enabled, the `SystemTime`
  timestamp now emits human-readable timestamps rather than using
  `SystemTime`'s fmt::Debug`implementation (`chrono` is still required
  for customized timestamp formatting) (#807)
- **ansi**: Updated `ansi_term` dependency to 0.12 (#816)

Added

- **json**: `with_span_list` method to configure the JSON formatter to
  include a list of all spans in the current trace in formatting events
  (similarly to the text formatter) (#741)
- **json**: `with_current_span` method to configure the JSON formatter
  to include a field for the _current_ span (the leaf of the trace) in
  formatted events (#741)
- **fmt**: `with_thread_names` and `with_thread_ids` methods to
  configure `fmt::Subscriber`s and `fmt::Layer`s to include the thread
  name and/or thread ID of the current thread when formatting events
  (#818)

Thanks to new contributors @mockersf, @keetonian, and @Pothulapati for
contributing to this release!
hawkw added a commit that referenced this pull request Jul 17, 2020
### Changed

- **fmt**: When the `chrono` dependency is enabled, the `SystemTime`
  timestamp now emits human-readable timestamps rather than using
  `SystemTime`'s fmt::Debug`implementation (`chrono` is still required
  for customized timestamp formatting) (#807)
- **ansi**: Updated `ansi_term` dependency to 0.12 (#816)

### Added

- **json**: `with_span_list` method to configure the JSON formatter to
  include a list of all spans in the current trace in formatting events
  (similarly to the text formatter) (#741)
- **json**: `with_current_span` method to configure the JSON formatter
  to include a field for the _current_ span (the leaf of the trace) in
  formatted events (#741)
- **fmt**: `with_thread_names` and `with_thread_ids` methods to
  configure `fmt::Subscriber`s and `fmt::Layer`s to include the thread
  name and/or thread ID of the current thread when formatting events
  (#818)

Thanks to new contributors @mockersf, @keetonian, and @Pothulapati for
contributing to this release!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nested Spans don't work correctly in JSON logs
6 participants