
Message Protocol

Robert Ferris edited this page Apr 18, 2014 · 4 revisions

Bytefrog collects tracing data by passing messages over TCP between the tracer agent and the data collector/processor (HQ). To keep messages simple and compact, messages are packed binary data (rather than textual). This helps keep bandwidth and memory usage low. Each message will have a one byte message type header, followed by message contents as defined for each message type. There is no delimiter between messages.

Any string values sent (such as method signatures) shall be sent in a consistent encoding. Encoding shall be Java modified UTF-8, as written by DataOutputStream. Byte sizes sent shall be actual sizes (determined after encoding the string), and not just the string length.
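As a small illustration of why the encoded byte size can differ from the string length, the sketch below encodes a few strings with DataOutputStream.writeUTF and reports the size of the result. The class and method names (ModifiedUtf8Sizes, encode, encodedSize) are hypothetical and not part of Bytefrog.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ModifiedUtf8Sizes {
    /** Encodes s as writeUTF does: a 2-byte length, then the modified UTF-8 bytes. */
    static byte[] encode(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Also reached when the encoding exceeds 65535 bytes (UTFDataFormatException).
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Number of encoded bytes, excluding the 2-byte length prefix. */
    static int encodedSize(String s) {
        return encode(s).length - 2;
    }
}
```

With this sketch, encodedSize("main") is 4, while a four-character string containing one accented letter such as U+00E9 encodes to 5 bytes, and a single NUL character encodes to 2 bytes (a quirk of modified UTF-8).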

Multi-byte values shall be represented in big-endian order (traditional network byte order). Keep this in mind, as x86 and x86-64 are little-endian. Java's DataOutputStream always writes multi-byte values high byte first, so the agent does not need to convert values itself.
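To make the byte ordering concrete, here is a minimal sketch that lays out the same 32-bit value in both orders using java.nio.ByteBuffer. The class name ByteOrderExample is hypothetical; only the big-endian form is valid on the wire.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderExample {
    /** Big-endian (network order) bytes of a 32-bit value, as the protocol requires. */
    static byte[] bigEndian(int value) {
        // ByteBuffer is big-endian by default; the order is stated here for clarity.
        return ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
    }

    /** Little-endian bytes of the same value, shown only for contrast. */
    static byte[] littleEndian(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }
}
```

For the value 0x01020304, bigEndian yields the bytes 01 02 03 04 and littleEndian yields 04 03 02 01.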

Per the message queueing system spec, a DataOutputStream will be provided for writing the message to. DataOutputStream handles all endianness, string encoding, and length prefixing concerns for us. This makes encoding messages per this protocol easy and with relatively little overhead.

Message Quick Reference

| ID | Message | Size | Response |
|---|---|---|---|
| 0 | hello | 2 bytes | configuration |
| 1 | configuration | 5 bytes + n | none |
| 2 | start | 1 byte | none |
| 3 | stop | 1 byte | none |
| 4 | pause | 1 byte | none |
| 5 | unpause | 1 byte | none |
| 6 | suspend | 1 byte | none |
| 7 | unsuspend | 1 byte | none |
| 8 | heartbeat | 4 bytes | none |
| 9 | data break | 5 bytes | none |
| 10 | map thread name | 9 bytes + n | none |
| 11 | map method signature | 7 bytes + n | none |
| 12 | map exception | 7 bytes + n | none |
| 20 | method entry | 15 bytes | none |
| 21 | method exit | 17 bytes | none |
| 22 | exception | 21 bytes | none |
| 23 | exception bubble | 19 bytes | none |
| 30 | data hello | 2 bytes | data hello reply |
| 31 | data hello reply | 1 byte | none |
| 40 | class transformed | 3 bytes + n | none |
| 41 | class ignored | 3 bytes + n | none |
| 42 | class transform failed | 3 bytes + n | none |
| 50 | marker | 13 bytes + K + V | none |
| 99 | error | 3 bytes + n | none |
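Because there is no delimiter between messages, a receiver has to derive each message's boundary from its type ID. The following is a rough sketch of that idea, using the sizes from the quick reference above. MessageFraming, skipPayload, and typeIds are hypothetical names, and rejecting configuration lengths above 2^31 - 1 is a simplification of this sketch, not a protocol rule.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class MessageFraming {
    /** Consumes the rest of one message; the type byte has already been read. */
    static void skipPayload(int type, DataInputStream in) throws IOException {
        switch (type) {
            case 2: case 3: case 4: case 5: case 6: case 7: case 31:
                break;                                        // type byte only
            case 0: case 30: skip(in, 1, 0); break;           // hello, data hello
            case 8:  skip(in, 3, 0); break;                   // heartbeat
            case 9:  skip(in, 4, 0); break;                   // data break
            case 20: skip(in, 14, 0); break;                  // method entry
            case 21: skip(in, 16, 0); break;                  // method exit
            case 22: skip(in, 20, 0); break;                  // exception
            case 23: skip(in, 18, 0); break;                  // exception bubble
            case 10: skip(in, 6, 1); break;                   // thread ID, timestamp, name
            case 11: case 12: skip(in, 4, 1); break;          // assigned ID, name
            case 40: case 41: case 42: case 99: skip(in, 0, 1); break;
            case 50: skip(in, 8, 2); break;                   // timestamp, sequence, key, value
            case 1: {                                         // configuration: 32-bit length
                int length = in.readInt();
                if (length < 0) throw new IOException("configuration too large for this sketch");
                in.readFully(new byte[length]);
                break;
            }
            default:
                throw new IOException("unknown message type " + type);
        }
    }

    /** Skips fixedBytes, then the given number of 16-bit length-prefixed strings. */
    private static void skip(DataInputStream in, int fixedBytes, int strings) throws IOException {
        in.readFully(new byte[fixedBytes]);
        for (int i = 0; i < strings; i++) {
            in.readFully(new byte[in.readUnsignedShort()]);
        }
    }

    /** Splits a captured byte stream into messages and returns their type IDs in order. */
    static List<Integer> typeIds(byte[] stream) {
        List<Integer> ids = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            int type;
            while ((type = in.read()) != -1) {
                skipPayload(type, in);
                ids.add(type);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }
}
```

A real receiver would dispatch to a decoder per type instead of skipping the payload, but the boundary logic would be the same.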

Control Messages

Control messages provide status and control communication between HQ and the agent. These are small, time-sensitive messages that shall be sent in a prioritized fashion (i.e., bypassing any sort of send queue containing data messages).

Control messages are sent over the first connection (the one you send hello on).

Hello Message

Description

The hello control message is sent from agent to HQ immediately upon connection. No other communications are valid until after this message is sent.

This message is only sent from the first connection. If multiple streams are used for sending data, use the data hello message for the other streams.

This message carries a base timestamp. Offset timestamps are used for all events, so this message specifies the base time.

This message carries a version number as well, to allow changes or extensions to be made to the protocol in the future. Currently, the version number is 1. This number is incremented whenever breaking changes are made, so HQ can make sure to handle messages properly.

Format
[1 byte: type ID (0)][1 byte: protocol version]

The message type ID for this message is 0. The message contains an 8-bit unsigned value specifying the protocol version. Message size is fixed at 2 bytes.

Data (Concurrent) Hello Message

Description

The data hello control message is sent from agent to HQ on data connections. This message replaces the hello message on all but the first connection when multiple streams are used.

Additional connections and the sending of data hello should not occur until after configuration is received on the first connection. Any configuration relating to data streams should be taken from that configuration.

Format
[1 byte: type ID (30)][1 byte: run ID]

The message type ID for this message is 30. The message contains an 8-bit unsigned value specifying the run ID this connection should be associated with. Message size is fixed at 2 bytes.

Error Message

Description

The error message is sent by either end to report a fatal error. This message is provided as a best-effort error reporting service so there's a chance of logging a failure. Obviously, if the control connection is compromised, this message may or may not make it through.

The expectation is that the trace dies sometime afterward, since a fatal error has occurred.

Format
[1 byte: type ID (99)][2 bytes: length of error message][n bytes: error message]

The message type ID for this message is 99. The message consists of the provided error, prefixed by the length as a 16-bit unsigned value. Message size is dynamic, 3 bytes plus the size of the error message.
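The format above matches what DataOutputStream.writeUTF produces (a 16-bit length followed by the modified UTF-8 bytes), so an encoder can be quite short. This is a minimal sketch; the ErrorMessage class and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ErrorMessage {
    static final int TYPE_ID = 99;

    /** Builds [type ID][2-byte length][error text]. */
    static byte[] encode(String errorText) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(TYPE_ID);
            out.writeUTF(errorText); // writes the 16-bit length prefix itself
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the error text back out of a complete error message. */
    static String decode(byte[] message) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            int type = in.readUnsignedByte();
            if (type != TYPE_ID) {
                throw new IllegalArgumentException("not an error message: type " + type);
            }
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same shape (type ID followed by one writeUTF call) would also fit the class transformed, class ignored, and class transform failed messages, which differ only in their type IDs.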

Configuration Message

Description

The configuration control message is sent from HQ to the agent. It contains the configuration we need to pass to the agent. This will occur after the hello message is received, but before any other control messages (i.e., start or suspend) are sent. Agent must not send any message (other than hello) before this message is received, and HQ must send configuration immediately after receiving the hello message if the connection is to continue.

This message will not be sent in reply to data hello messages.

Configuration will contain the run ID, queue configuration information, data socket information, how often to send a heartbeat, chunk size for sending messages (argument to drainTo()), and whatever other information is necessary for the agent to run.

Format
[1 byte: type ID (1)][4 bytes: length, in bytes, of configuration data][n bytes: encoded configuration data]

The message type ID for this message is 1. The message contents consist of a configuration object encoded by binary serialization, prefixed by the length of the configuration data, in bytes (32-bit unsigned). Message size is dynamic, 5 bytes plus the size of the configuration data.

Implementation note: configuration uses a 32-bit length prefix, to allow for configurations greater than 64KB in size. Encoding and length-prefixing will need to be done manually, as DataOutputStream uses 16-bit length prefixes.
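A minimal sketch of the manual length-prefixing described in the note follows. How the configuration object itself is serialized is not specified here, so the sketch treats it as an opaque byte array; ConfigurationMessage and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

final class ConfigurationMessage {
    static final byte TYPE_ID = 1;

    /** Builds [type ID][4-byte big-endian length][configuration data]. */
    static byte[] encode(byte[] configData) {
        // ByteBuffer writes big-endian by default, matching the protocol.
        return ByteBuffer.allocate(5 + configData.length)
                .put(TYPE_ID)
                .putInt(configData.length)
                .put(configData)
                .array();
    }

    /** Extracts the configuration data from a complete configuration message. */
    static byte[] decode(byte[] message) {
        ByteBuffer in = ByteBuffer.wrap(message);
        if (in.get() != TYPE_ID) {
            throw new IllegalArgumentException("not a configuration message");
        }
        // The length is unsigned on the wire, so widen it before comparing.
        long length = Integer.toUnsignedLong(in.getInt());
        if (length > in.remaining()) {
            throw new IllegalArgumentException("truncated configuration data");
        }
        byte[] data = new byte[(int) length];
        in.get(data);
        return data;
    }
}
```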

Data (Concurrent) Hello Reply Message

Description

The data hello reply control message is sent from HQ to the agent, in reply to the data hello message. This message simply signals that HQ is acknowledging and ready to begin receiving data on the stream.

Format
[1 byte: type ID (31)]

The message type ID for this message is 31. There are no message contents. Message size is fixed at 1 byte.

Start Message

Description

The start control message is sent from HQ to the agent to signal that we are ready to begin tracing. Upon receipt of this message, the agent will stop waiting (presumably in premain) and allow the program to start running.

The start message is here to allow HQ the chance to send configuration data, suspend tracing, and perform whatever other initialization necessary before the program actually begins.

Two modes of operation were discussed at one point: collecting from the start of the program, or waiting for further word before collecting traces. This is implemented as follows: if you don't want to collect from the start of the program, HQ shall send a suspend message before it sends a start message. Unless told otherwise by HQ (by means of suspend or pause messages), the agent's default state upon receiving the start message shall be running as normal (not suspended, not paused).

Format
[1 byte: type ID (2)]

The message type ID for this message is 2. There are no message contents. Message size is fixed at 1 byte.

Stop Message

Description

The stop control message is sent from HQ to the agent to signal that we are done tracing. Upon receipt of this message, the agent should stop execution of the program.

Format
[1 byte: type ID (3)]

The message type ID for this message is 3. There are no message contents. Message size is fixed at 1 byte.

Pause Message

Description

The pause control message is sent from HQ to the agent to signal that program execution should be paused. Upon receipt of this message, agent shall pause tracee program execution as quickly as possible.

Data loss shall not occur as a result of a pause message.

Format
[1 byte: type ID (4)]

The message type ID for this message is 4. There are no message contents. Message size is fixed at 1 byte.

Unpause Message

Description

The unpause control message is sent from HQ to the agent to signal program execution shall resume.

Format
[1 byte: type ID (5)]

The message type ID for this message is 5. There are no message contents. Message size is fixed at 1 byte.

Suspend Message

Description

The suspend control message is sent from HQ to the agent to signal that HQ is not interested in receiving trace data at the moment. Upon receipt of this message, the agent shall stop collecting trace data as quickly as it can.

Data loss is expected as a result of a suspend message.

Format
[1 byte: type ID (6)]

The message type ID for this message is 6. There are no message contents. Message size is fixed at 1 byte.

Unsuspend Message

Description

The unsuspend control message is sent from HQ to the agent to signal that HQ is interested in now receiving trace data. Upon receipt of this message, the agent shall start collecting and sending trace data as quickly as it can.

Format
[1 byte: type ID (7)]

The message type ID for this message is 7. There are no message contents. Message size is fixed at 1 byte.

Heartbeat Message

Description

The heartbeat control message is sent from the agent to HQ to report that the agent is alive and kicking. The message reports the current mode of operation as well as the current send buffer size. This is an informational message. HQ will specify how often this message is to be sent, and may take action or assume failure after some number of heartbeat messages are missed.

Format
[1 byte: type ID (8)][1 byte: mode of operation][2 bytes: send buffer size]

The message type ID for this message is 8. The message contents consist of the mode of operation (8-bit unsigned, see list below) followed by the current send buffer size (16-bit unsigned). Message size is fixed at 4 bytes.

Modes of Operation
  • Initializing = 'I' (73)
  • Tracing = 'T' (84)
  • Paused = 'P' (80)
  • Suspended = 'S' (83)
  • Shutting down = 'X' (88)
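The mode codes and heartbeat layout above could be modeled roughly as follows. AgentMode and HeartbeatMessage are hypothetical names, and rejecting a send buffer size that does not fit in 16 bits (rather than clamping it) is an assumption of this sketch.

```java
import java.nio.ByteBuffer;

enum AgentMode {
    INITIALIZING('I'), TRACING('T'), PAUSED('P'), SUSPENDED('S'), SHUTTING_DOWN('X');

    final byte code;

    AgentMode(char code) {
        this.code = (byte) code;
    }

    /** Looks up the mode for a code read from the wire. */
    static AgentMode fromCode(int code) {
        for (AgentMode mode : values()) {
            if (mode.code == code) {
                return mode;
            }
        }
        throw new IllegalArgumentException("unknown mode of operation: " + code);
    }
}

final class HeartbeatMessage {
    /** Builds [type ID (8)][mode][2-byte send buffer size]. */
    static byte[] encode(AgentMode mode, int sendBufferSize) {
        if (sendBufferSize < 0 || sendBufferSize > 0xFFFF) {
            throw new IllegalArgumentException("send buffer size must fit in 16 bits");
        }
        return ByteBuffer.allocate(4)
                .put((byte) 8)
                .put(mode.code)
                .putShort((short) sendBufferSize)
                .array();
    }

    /** Reads the unsigned 16-bit send buffer size from a heartbeat message. */
    static int sendBufferSize(byte[] message) {
        return ByteBuffer.wrap(message).getShort(2) & 0xFFFF;
    }
}
```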

Class Transformed Message

Description

The class transformed message is sent from the agent to HQ to report that the agent has transformed a class in the target application. This is an informational message, allowing HQ's UI to display a list of transformed classes on the fly.

Format
[1 byte: type ID (40)][2 bytes: length of qualified class name][n bytes: qualified class name]

The message type ID for this message is 40. The message contents consist of the fully-qualified name of the transformed class (prefixed by the name's length, as a 16-bit unsigned value). Message size is dynamic, 3 bytes plus the size of the class name.

Class Ignored Message

Description

The class ignored message is sent from the agent to HQ to report that the agent has gone through the motions of transforming a class in the target application, but chose to leave that class intact and unchanged. This is an informational message, allowing HQ's UI to display a list of non-transformed classes on the fly.

Format
[1 byte: type ID (41)][2 bytes: length of qualified class name][n bytes: qualified class name]

The message type ID for this message is 41. The message contents consist of the fully-qualified name of the ignored class (prefixed by the name's length, as a 16-bit unsigned value). Message size is dynamic, 3 bytes plus the size of the class name.

Class Transform Failed Message

Description

The class transform failed message is sent from the agent to HQ to report that the agent has tried and failed to transform a class in the target application. This is an informational message, allowing HQ's UI to display a list of classes that could not be transformed.

Format
[1 byte: type ID (42)][2 bytes: length of qualified class name][n bytes: qualified class name]

The message type ID for this message is 42. The message contents consist of the fully-qualified name of the class that could not be transformed, prefixed by the name's length as a 16-bit unsigned value. Message size is dynamic, 3 bytes plus the size of the class name.

Data Break Message

Description

The data break message is sent from the agent to HQ to report that there is a break in the data flow before the reported sequence ID. This typically will mean that tracing was suspended, and that any inferred call stack state/etc should be discarded when data reaches that sequence ID.

Format
[1 byte: type ID (9)][4 bytes: sequence ID for first message following the data break]

The message type ID for this message is 9. The message contents consist solely of the next sequence ID after the data break. Message size is fixed at 5 bytes.

Data Messages

Data messages provide the actual trace data. These are always generated by the agent and sent to HQ. Data messages are lower priority than control messages, so if any control messages are queued, they must all be sent before any further data messages.

The trace data sent consists of method entries, method exits, and exceptions. For all events, information about the current thread (thread name, and a 16-bit unsigned unique thread ID) and source location (line number, if available) is sent. For method events, the method signature is sent, and for exception events, the exception type name is sent. Thread names are mapped to the unique thread ID, and method signatures are mapped to a unique 32-bit unsigned value.

Data messages are sent over data connections (ones you send data hello on).

Map Thread Name Message

Description

The map thread name data message tells HQ the name for a thread. Thread names may change during execution, so new map thread name messages will need to be generated if the thread name has changed.

To track changing thread names over time, this message is timestamped.

A simple implementation would be to keep track of the last sent thread name, as well as the unique thread ID, in thread local storage (our AspectJ tracer implementation already handles unique thread IDs stored thread local). On every data event, check if the cached thread name matches the actual, and when it doesn't, emit a map thread name message and update the thread local value.
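The caching approach just described might look roughly like this. It is a sketch only: ThreadNameTracker and nameToReport are hypothetical names, and the unique thread ID handling that the AspectJ tracer already provides is left out.

```java
final class ThreadNameTracker {
    // Last thread name reported for the current thread; null until the first event.
    private static final ThreadLocal<String> LAST_SENT_NAME = new ThreadLocal<>();

    /**
     * Call on every data event. Returns the thread's current name when a map
     * thread name message needs to be emitted, or null when the cached name
     * is still accurate.
     */
    static String nameToReport() {
        String current = Thread.currentThread().getName();
        if (current.equals(LAST_SENT_NAME.get())) {
            return null;
        }
        LAST_SENT_NAME.set(current);
        return current;
    }
}
```

The first call on a thread always returns its name, which covers the case where no name has been sent yet.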

Format
[1 byte: type ID (10)][2 bytes: thread ID][4 bytes: relative timestamp][2 bytes: length of encoded thread name][n bytes: encoded thread name]

The message type ID for this message is 10. Message content will be the 16-bit thread ID, followed by a timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the thread name (prefixed by the name's length, in bytes, as a 16-bit unsigned value). Thread name will be truncated to (2^16 - 1) bytes (~64KB). Message size is dynamic, 9 bytes plus the size of the thread name.
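One wrinkle worth illustrating: DataOutputStream.writeUTF throws rather than truncates when a string encodes to more than 65535 bytes, so truncation has to happen before writing. The sketch below shows one way to do that. MapThreadNameMessage, truncate, and encode are hypothetical names, and cutting at a character boundary is an assumption about how truncation should behave.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class MapThreadNameMessage {
    static final int MAX_ENCODED_BYTES = 0xFFFF; // largest length writeUTF accepts

    /** Shortens name so that its modified UTF-8 encoding fits in MAX_ENCODED_BYTES. */
    static String truncate(String name) {
        int bytes = 0;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // Modified UTF-8: U+0001..U+007F take 1 byte; U+0000 and
            // U+0080..U+07FF take 2 bytes; everything else takes 3 bytes.
            int size = (c >= 0x0001 && c <= 0x007F) ? 1 : (c <= 0x07FF ? 2 : 3);
            if (bytes + size > MAX_ENCODED_BYTES) {
                // Do not leave half of a surrogate pair at the end.
                boolean splitsPair = i > 0 && Character.isHighSurrogate(name.charAt(i - 1));
                return name.substring(0, splitsPair ? i - 1 : i);
            }
            bytes += size;
        }
        return name;
    }

    /** Builds [type ID (10)][thread ID][relative timestamp][2-byte length][name]. */
    static byte[] encode(int threadId, long timestampMillis, String threadName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(10);
            out.writeShort(threadId);            // low 16 bits
            out.writeInt((int) timestampMillis); // low 32 bits
            out.writeUTF(truncate(threadName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```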

Map Method Signature Message

Description

The map method signature data message tells HQ the ID given for a particular method signature. IDs must be unique; however, more than one ID may be mapped to the same signature. Multiple IDs per signature are allowed for the sake of memory management, so that old mappings may be flushed from memory in the agent under memory pressure.

Format
[1 byte: type ID (11)][4 bytes: assigned signature ID][2 bytes: length of encoded signature][n bytes: encoded signature]

The message type ID for this message is 11. Message content will be the 32-bit ID assigned to the signature, followed by the signature itself (prefixed by the signature's length, in bytes, as a 16-bit unsigned value). Signature will be truncated to (2^16 - 1) bytes (~64KB). Message size is dynamic, 7 bytes plus the size of the signature.
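The ID rules in the description (unique IDs, but possibly several IDs per signature after a flush) could be implemented along these lines. This is a sketch under assumptions: SignatureIds, idFor, and flush are hypothetical names, and the callback stands in for whatever code actually emits the map method signature message.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

final class SignatureIds {
    private final ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    /**
     * Returns the cached ID for the signature. If none is cached, assigns a
     * fresh ID and invokes onNewMapping so the caller can emit a map method
     * signature message before using the ID in an event.
     */
    int idFor(String signature, BiConsumer<Integer, String> onNewMapping) {
        return cache.computeIfAbsent(signature, sig -> {
            int id = nextId.getAndIncrement();
            onNewMapping.accept(id, sig);
            return id;
        });
    }

    /** Drops cached mappings, e.g. under memory pressure. Issued IDs are never reused. */
    void flush() {
        cache.clear();
    }
}
```

Because the counter keeps running across flushes, a signature seen again after a flush gets a new ID, which is the multiple-IDs-per-signature case the description allows.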

Map Exception Message

Description

The map exception data message tells HQ the ID given for a particular exception type. IDs must be unique; however, more than one ID may be mapped to the same type. Multiple IDs per type are allowed for the sake of memory management, so that old mappings may be flushed from memory in the agent under memory pressure.

Format
[1 byte: type ID (12)][4 bytes: assigned exception ID][2 bytes: length of encoded exception][n bytes: encoded exception]

The message type ID for this message is 12. Message content will be the 32-bit ID assigned to the exception, followed by the exception name itself (prefixed by the name's length, in bytes, as a 16-bit unsigned value). Exception name will be truncated to (2^16 - 1) bytes (~64KB). Message size is dynamic, 7 bytes plus the size of the exception name.

Method Entry Message

Description

The method entry data message tells HQ a method entry has occurred.

Format
[1 byte: type ID (20)][4 bytes: relative timestamp][4 bytes: current sequence][4 bytes: method signature ID][2 bytes: thread ID]

The message type ID for this message is 20. Message content will be the timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the current sequence ID (32-bit), followed by the method signature ID, followed by the thread ID. Message size is fixed at 15 bytes.
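Since this message has a fixed size, it can be packed into a preallocated buffer. A minimal sketch, with MethodEntryMessage as a hypothetical name; the long parameters are narrowed to their low 32 bits, which is how unsigned 32-bit values are usually carried in Java.

```java
import java.nio.ByteBuffer;

final class MethodEntryMessage {
    /** Builds the 15-byte method entry message in big-endian order. */
    static byte[] encode(long timestampMillis, long sequenceId, long signatureId, int threadId) {
        return ByteBuffer.allocate(15)        // big-endian by default
                .put((byte) 20)               // type ID
                .putInt((int) timestampMillis)
                .putInt((int) sequenceId)
                .putInt((int) signatureId)
                .putShort((short) threadId)
                .array();
    }
}
```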

Method Exit Message

Description

The method exit data message tells HQ a method exit has occurred.

Format
[1 byte: type ID (21)][4 bytes: relative timestamp][4 bytes: current sequence][4 bytes: method signature ID][2 bytes: line number][2 bytes: thread ID]

The message type ID for this message is 21. Message content will be the timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the current sequence ID (32-bit), followed by the method signature ID, followed by the source line number (16-bit unsigned), followed by the thread ID. Message size is fixed at 17 bytes.
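On the HQ side, the unsigned fields need widening when read, because Java's int and short are signed. The sketch below decodes a method exit message. MethodExitMessage is a hypothetical name, and treating the sequence ID and signature ID as unsigned is an assumption, since the text only calls them 32-bit.

```java
import java.nio.ByteBuffer;

final class MethodExitMessage {
    final long timestampMillis;
    final long sequenceId;
    final long signatureId;
    final int lineNumber;
    final int threadId;

    private MethodExitMessage(long timestampMillis, long sequenceId, long signatureId,
                              int lineNumber, int threadId) {
        this.timestampMillis = timestampMillis;
        this.sequenceId = sequenceId;
        this.signatureId = signatureId;
        this.lineNumber = lineNumber;
        this.threadId = threadId;
    }

    /** Decodes a complete 17-byte method exit message. */
    static MethodExitMessage decode(byte[] message) {
        if (message.length != 17 || message[0] != 21) {
            throw new IllegalArgumentException("not a 17-byte method exit message");
        }
        ByteBuffer in = ByteBuffer.wrap(message); // big-endian by default
        in.get();                                 // skip the type ID
        // Arguments are evaluated left to right, matching the wire order.
        return new MethodExitMessage(
                Integer.toUnsignedLong(in.getInt()),
                Integer.toUnsignedLong(in.getInt()),
                Integer.toUnsignedLong(in.getInt()),
                Short.toUnsignedInt(in.getShort()),
                Short.toUnsignedInt(in.getShort()));
    }
}
```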

Exception Message

Description

The exception data message tells HQ an exception has occurred.

Format
[1 byte: type ID (22)][4 bytes: relative timestamp][4 bytes: current sequence][4 bytes: method signature ID][4 bytes: exception ID][2 bytes: line number][2 bytes: thread ID]

The message type ID for this message is 22. Message content will be the timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the current sequence ID (32-bit), followed by the method signature ID, followed by the exception ID, followed by the source line number (16-bit unsigned), followed by the thread ID. Message size is fixed at 21 bytes.

Exception Bubble Message

Description

The exception bubble message tells HQ that a method has exited by way of bubbling an exception. The exception bubbled would be the last one thrown on the current thread.

Format
[1 byte: type ID (23)][4 bytes: relative timestamp][4 bytes: current sequence][4 bytes: method signature ID][4 bytes: exception ID][2 bytes: thread ID]

The message type ID for this message is 23. Message content will be the timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the current sequence ID (32-bit), followed by the method signature ID, followed by the exception ID, followed by the thread ID. Message size is fixed at 19 bytes.

Marker Message

Description

A marker is sent to notify HQ of some custom event that happened in the traced application.

Format
[1 byte: type ID (50)][4 bytes: relative timestamp][4 bytes: current sequence][2 bytes: key string length = K][K bytes: key string utf8][2 bytes: value string length = V][V bytes: value string utf8]

The message type ID for this message is 50. Message content will be the timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the current sequence ID (32-bit), followed by the length of the "key" string (16-bit unsigned) and the utf8 encoded bytes of the "key" string itself, followed by the "value" string in the same format as the "key" string. Message length will be 13 bytes plus K plus V, where K is the length of the "key" string, and V is the length of the "value" string.
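A marker could be encoded with two writeUTF calls, one per string, as in this sketch. MarkerMessage is a hypothetical name, and using writeUTF assumes the key and value follow the same modified UTF-8 rule as the other strings in this protocol.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class MarkerMessage {
    /** Builds [type ID (50)][timestamp][sequence][K][key][V][value]. */
    static byte[] encode(long timestampMillis, long sequenceId, String key, String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(50);
            out.writeInt((int) timestampMillis); // low 32 bits
            out.writeInt((int) sequenceId);      // low 32 bits
            out.writeUTF(key);                   // 2-byte length K, then K bytes
            out.writeUTF(value);                 // 2-byte length V, then V bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

For a key of "k" and a value of "val", the result is 13 + 1 + 3 = 17 bytes long.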

Expected Message Responses / Side Effects

Messages to HQ

| Message | Expected Result |
|---|---|
| hello | configuration response or dropped connection |
| data hello | data hello reply message in response |
| heartbeat | HQ knows of agent's status (missed heartbeats may be treated as a failure by HQ); no response expected |
| data break | informational event; no response expected |
| error | HQ is alerted that an error occurred; all sockets shut down and agent dies |
| class transformed | informational event; no response expected |
| class ignored | informational event; no response expected |
| class transform failed | informational event; no response expected |
| map thread name | data event; no response expected |
| map method signature | data event; no response expected |
| map exception | data event; no response expected |
| method entry | data event; no response expected |
| method exit | data event; no response expected |
| exception | data event; no response expected |
| exception bubble | data event; no response expected |
| marker | data event; no response expected |

Messages from HQ

| Message | Expected Result |
|---|---|
| configuration | configuration applied - okay to open data sockets and send data; no response expected |
| data hello reply | data connection is recognized - okay to send data; no response expected |
| start | tracee program begins execution, trace data begins to flow to HQ if not suspended |
| stop | tracee program ends execution, no new trace data collected; any buffered data continues to be sent |
| pause | execution of tracee program is paused ASAP |
| unpause | execution of tracee program resumes |
| suspend | agent stops collecting trace data ASAP; any buffered data continues to be sent |
| unsuspend | agent begins collecting trace data ASAP |
| error | agent is alerted that an error occurred; all sockets shut down and agent dies |

Chain of Events

Chain of events for control connection
  1. TCP connection is established.
  2. Agent sends hello message to HQ, specifying version of protocol it speaks.
  3. If HQ understands the version specified, and chooses to continue, it sends a configuration message in return containing whatever configuration values are to be used for this run. Otherwise, it sends an error message and drops the connection.
  4. HQ sends suspend message if tracing should not begin immediately upon program start.
  5. HQ sends start message, and the agent allows execution to begin.
  6. Agent sends trace messages as appropriate, and at some interval, a heartbeat message, to HQ.
  7. At or near the end of execution, agent starts reporting state as shutting down in heartbeats to HQ.
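Steps 2 and 3 of the control connection can be sketched from HQ's point of view as follows. ControlHandshake and replyTo are hypothetical names, the configuration is treated as an opaque byte array, and the wording of the error text is invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ControlHandshake {
    static final int SUPPORTED_VERSION = 1;

    /**
     * Given the two bytes of a hello message, returns the bytes HQ would send
     * back: a configuration message if the version is understood, otherwise
     * an error message (after which HQ would drop the connection).
     */
    static byte[] replyTo(byte[] hello, byte[] encodedConfig) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(hello));
             DataOutputStream out = new DataOutputStream(buffer)) {
            int type = in.readUnsignedByte();
            int version = in.readUnsignedByte();
            if (type == 0 && version == SUPPORTED_VERSION) {
                out.writeByte(1);                    // configuration
                out.writeInt(encodedConfig.length);  // 32-bit length prefix
                out.write(encodedConfig);
            } else {
                out.writeByte(99);                   // error
                out.writeUTF("unsupported hello: type=" + type + ", version=" + version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```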

Chain of events for a data connection
  1. TCP connection is established.
  2. Agent sends data hello message to HQ, specifying the run ID the connection is providing data for.
  3. If HQ recognizes the run ID, it sends a data hello reply message in return. Otherwise, it sends an error message and drops the connection.
  4. Agent begins sending data messages over the connection.

Chain of events for a data event

These steps are bypassed if agent is suspended. These steps block until an unpause message is received, if agent is paused. All events happen agent-side.

  1. If thread name has changed (or hasn't been sent yet), send a map thread name message.
  2. If there is no cached ID for the method signature, generate a new ID and send a map method signature message.
  3. Send the appropriate method entry/exit or exception message.
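The suspended and paused conditions that guard these steps might be implemented with a small gate like the one below. This is only a sketch: TraceGate and awaitPermission are hypothetical names, and it follows the wording above literally, so a suspended agent bypasses the steps even while paused.

```java
final class TraceGate {
    private boolean paused;
    private boolean suspended;

    synchronized void pause()     { paused = true; }
    synchronized void unpause()   { paused = false; notifyAll(); }
    synchronized void suspend()   { suspended = true; notifyAll(); }
    synchronized void unsuspend() { suspended = false; }

    /**
     * Call before the data event steps. Returns false when the steps should
     * be bypassed (agent suspended, or the waiting thread was interrupted).
     * Blocks while the agent is paused and not suspended.
     */
    synchronized boolean awaitPermission() {
        try {
            while (paused && !suspended) {
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
        return !suspended;
    }
}
```

An instrumented method would call awaitPermission() first and run the three steps only when it returns true.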