Message Protocol
Bytefrog collects tracing data by passing messages over TCP between the tracer agent and the data collector/processor (HQ). To keep messages simple and compact, messages are packed binary data (rather than textual). This helps keep bandwidth and memory usage low. Each message will have a one byte message type header, followed by message contents as defined for each message type. There is no delimiter between messages.
Any string values sent (such as method signatures) shall be sent in a consistent encoding. Encoding shall be Java modified UTF-8, as written by `DataOutputStream`. Byte sizes sent shall be actual sizes (determined after encoding the string), and not just the string length.
Multi-byte values shall be represented using big-endian (traditional network byte ordering). Keep this in mind, as x86 and x86-64 are little-endian. Java's `DataOutputStream` always writes multi-byte values in big-endian order, so agent code written in Java does not need to convert values itself.
Per the message queueing system spec, a `DataOutputStream` will be provided for writing the message to. `DataOutputStream` handles all endianness, string encoding, and length prefixing concerns for us. This makes encoding messages per this protocol easy, with relatively little overhead.
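As a rough illustration of the encoding rules above, the sketch below shows what `DataOutputStream` emits for a string and for a 32-bit value. The class and method names are made up for this example and are not part of Bytefrog; only `writeUTF` and `writeInt` are real `DataOutputStream` methods.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Illustrative only: class and method names are not part of Bytefrog.
class EncodingSketch {
    /** Returns the bytes writeUTF emits: a 16-bit big-endian byte count, then modified UTF-8. */
    static byte[] encodeString(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Returns the four bytes writeInt emits, most significant byte first. */
    static byte[] encodeInt(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // "\u00e9" is one char but two encoded bytes, so the length prefix is 2, not 1.
        System.out.println(Arrays.toString(encodeString("\u00e9")));
        System.out.println(Arrays.toString(encodeInt(258)));
    }
}
```

The length prefix counts encoded bytes, which is why the protocol asks for actual sizes rather than `String.length()`.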
ID | Message | Size | Response |
---|---|---|---|
0 | hello | 2 bytes | configuration |
1 | configuration | 5 bytes + n | none |
2 | start | 1 byte | none |
3 | stop | 1 byte | none |
4 | pause | 1 byte | none |
5 | unpause | 1 byte | none |
6 | suspend | 1 byte | none |
7 | unsuspend | 1 byte | none |
8 | heartbeat | 4 bytes | none |
9 | data break | 5 bytes | none |
10 | map thread name | 9 bytes + n | none |
11 | map method signature | 7 bytes + n | none |
12 | map exception | 7 bytes + n | none |
20 | method entry | 15 bytes | none |
21 | method exit | 17 bytes | none |
22 | exception | 21 bytes | none |
23 | exception bubble | 19 bytes | none |
30 | data hello | 2 bytes | data hello reply |
31 | data hello reply | 1 byte | none |
40 | class transformed | 3 bytes + n | none |
41 | class ignored | 3 bytes + n | none |
42 | class transform failed | 3 bytes + n | none |
50 | marker | 13 bytes + K + V | none |
99 | error | 3 bytes + n | none |
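
Because there is no delimiter between messages, a reader has to derive each message's length from its type ID and, for dynamic messages, from the embedded length prefixes. The following is a minimal sketch of that framing logic based on the sizes in the table above. The class and method names are hypothetical, and the sketch assumes configuration payloads stay below 2 GiB so the 32-bit length fits in a Java `int`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: class and method names are not part of Bytefrog.
class FramingSketch {
    /** Walks a byte stream and returns the type ID of every message found in it. */
    static List<Integer> typeIds(byte[] stream) {
        List<Integer> ids = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            int type;
            while ((type = in.read()) >= 0) {   // read() returns -1 at end of stream
                skipBody(in, type);
                ids.add(type);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }

    /** Consumes everything after the type byte, using the sizes from the table. */
    private static void skipBody(DataInputStream in, int type) throws IOException {
        switch (type) {
            case 2: case 3: case 4: case 5: case 6: case 7: case 31:
                break;                                        // header only
            case 0: case 30: skip(in, 1); break;              // version or run ID
            case 8: skip(in, 3); break;                       // heartbeat
            case 9: skip(in, 4); break;                       // data break
            case 20: skip(in, 14); break;                     // method entry
            case 21: skip(in, 16); break;                     // method exit
            case 22: skip(in, 20); break;                     // exception
            case 23: skip(in, 18); break;                     // exception bubble
            case 1: skip(in, in.readInt()); break;            // 32-bit length prefix
            case 10: skip(in, 6); skipString(in); break;      // thread ID + timestamp + name
            case 11: case 12: skip(in, 4); skipString(in); break;
            case 40: case 41: case 42: case 99: skipString(in); break;
            case 50: skip(in, 8); skipString(in); skipString(in); break;
            default: throw new IOException("unknown message type " + type);
        }
    }

    /** Skips a string that is prefixed by its 16-bit unsigned byte length. */
    private static void skipString(DataInputStream in) throws IOException {
        skip(in, in.readUnsignedShort());
    }

    private static void skip(DataInputStream in, int count) throws IOException {
        in.readFully(new byte[count]);   // throws EOFException on a truncated message
    }
}
```

A real reader would decode the fields instead of skipping them, but the dispatch on the type byte would look much the same.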
Control messages provide status and control communication between HQ and the agent. These are small, time-sensitive messages that shall be sent in a prioritized fashion (i.e., bypassing any sort of send queue containing data messages).
Control messages are sent over the first connection (the one you send `hello` on).
The `hello` control message is sent from agent to HQ immediately upon connection. No other communications are valid until after this message is sent.
This message is only sent from the first connection. If multiple streams are used for sending data, use the `data hello` message for the other streams.
This message carries a base timestamp. Offset timestamps are used for all events, so this message specifies the base time.
This message carries a version number as well, to allow changes or extensions to be made to the protocol in the future. Currently, the version number is 1. This number is incremented whenever breaking changes are made, so HQ can make sure to handle messages properly.
[1 byte: type ID (0)][1 byte: protocol version]
The message type ID for this message is `0`. The message contains an 8-bit unsigned value specifying the protocol version. Message size is fixed at 2 bytes.
The `data hello` control message is sent from agent to HQ on data connections. This message replaces the `hello` message on all but the first connection when multiple streams are used.
Additional connections and the sending of `data hello` should not occur until after `configuration` is received on the first connection. Any configuration relating to data streams should be taken from that configuration.
[1 byte: type ID (30)][1 byte: run ID]
The message type ID for this message is `30`. The message contains an 8-bit unsigned value specifying the run ID this connection should be associated with. Message size is fixed at 2 bytes.
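The two connection-opening messages share the same two-byte shape, so a single sketch covers both. The class, method, and constant names below are assumptions made for illustration; the byte layout follows the formats given above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: class, method, and constant names are not part of Bytefrog.
class HandshakeMessages {
    static final int HELLO_TYPE_ID = 0;
    static final int DATA_HELLO_TYPE_ID = 30;
    static final int PROTOCOL_VERSION = 1;

    /** hello: sent once, on the first (control) connection. */
    static byte[] hello(int protocolVersion) {
        return twoBytes(HELLO_TYPE_ID, protocolVersion);
    }

    /** data hello: sent on each additional data connection, after configuration arrives. */
    static byte[] dataHello(int runId) {
        return twoBytes(DATA_HELLO_TYPE_ID, runId);
    }

    private static byte[] twoBytes(int typeId, int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(typeId);
            out.writeByte(value);   // writeByte keeps only the low 8 bits
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

In the agent, the run ID passed to `dataHello` would come from the configuration received on the first connection.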
The `error` message is sent by either end to report a fatal error. This message is provided as a best-effort error reporting service so there's a chance of logging a failure. Obviously, if the control connection is compromised, this message may or may not make it through.
The expectation is that the trace dies sometime afterward, since a fatal error has occurred.
[1 byte: type ID (99)][2 bytes: length of error message][n bytes: error message]
The message type ID for this message is `99`. The message consists of the provided error, prefixed by the length as a 16-bit unsigned value. Message size is dynamic, 3 bytes plus the size of the error message.
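Since the error text uses the standard 16-bit length prefix, `writeUTF` and `readUTF` can do all the work. The following is a small round-trip sketch; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: class and method names are not part of Bytefrog.
class ErrorMessage {
    static final int TYPE_ID = 99;

    /** Encodes the type byte followed by the length-prefixed error text. */
    static byte[] encode(String errorText) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(TYPE_ID);
            out.writeUTF(errorText);   // 2-byte length + modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the error text back out of a complete error message. */
    static String decode(byte[] message) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            int type = in.readUnsignedByte();
            if (type != TYPE_ID) {
                throw new IllegalArgumentException("not an error message: " + type);
            }
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```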
The `configuration` control message is sent from HQ to the agent. It contains the configuration we need to pass to the agent. This will occur after the `hello` message is received, but before any other control messages (i.e., `start` or `suspend`) are sent. Agent must not send any message (other than `hello`) before this message is received, and HQ must send `configuration` immediately after receiving the `hello` message if the connection is to continue.
This message will not be sent in reply to `data hello` messages.
Configuration will contain the run ID, queue configuration information, data socket information, how often to send a heartbeat, chunk size for sending messages (argument to `drainTo()`), and whatever other information is necessary for the agent to run.
[1 byte: type ID (1)][4 bytes: length, in bytes, of configuration data][n bytes: encoded configuration data]
The message type ID for this message is `1`. The message contents consist of a configuration object encoded by binary serialization, prefixed by the length of the configuration data, in bytes (32-bit unsigned). Message size is dynamic, 5 bytes plus the size of the configuration data.
Implementation note: configuration uses a 32-bit length prefix, to allow for configurations greater than 64KB in size. Encoding and length-prefixing will need to be done manually, as `DataOutputStream` uses 16-bit length prefixes.
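A minimal sketch of the manual length-prefixing described in the implementation note follows. The configuration payload is treated as an opaque byte array here, since the serialization format is not specified in this section. The class and method names are assumptions, and the decoder assumes payloads below 2 GiB so the length fits in a Java `int`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: class and method names are not part of Bytefrog.
class ConfigurationMessage {
    static final int TYPE_ID = 1;

    /** Writes the type byte, a 32-bit big-endian length, then the raw payload. */
    static byte[] encode(byte[] configurationData) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(TYPE_ID);
            out.writeInt(configurationData.length);   // manual prefix instead of writeUTF
            out.write(configurationData);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Returns the raw payload of a complete configuration message. */
    static byte[] decodePayload(byte[] message) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            int type = in.readUnsignedByte();
            if (type != TYPE_ID) {
                throw new IllegalArgumentException("not a configuration message: " + type);
            }
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);   // blocks until the whole payload has arrived
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`readFully` matters on the receiving side: a plain `read` on a socket stream may return fewer bytes than requested.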
The `data hello reply` control message is sent from HQ to the agent, in reply to the `data hello` message. This message simply signals that HQ is acknowledging and ready to begin receiving data on the stream.
[1 byte: type ID (31)]
The message type ID for this message is `31`. There are no message contents. Message size is fixed at 1 byte.
The `start` control message is sent from HQ to the agent to signal that we are ready to begin tracing. Upon receipt of this message, the agent will stop waiting (presumably in premain) and allow the program to start running.
The `start` message is here to allow HQ the chance to send configuration data, suspend tracing, and perform whatever other initialization is necessary before the program actually begins.
Two modes of operation were discussed at one point: either collecting from the start, or waiting for further word before traces are collected. This is implemented as follows: if you don't want to collect from the start of the program, HQ shall send a `suspend` message before it sends a `start` message. Until/unless told otherwise by HQ (by means of `suspend` or `pause` messages), the default state of the agent upon receiving the `start` message shall be running as normal (not suspended, not paused).
[1 byte: type ID (2)]
The message type ID for this message is `2`. There are no message contents. Message size is fixed at 1 byte.
The `stop` control message is sent from HQ to the agent to signal that we are done tracing. Upon receipt of this message, the agent should stop execution of the program.
[1 byte: type ID (3)]
The message type ID for this message is `3`. There are no message contents. Message size is fixed at 1 byte.
The `pause` control message is sent from HQ to the agent to signal that program execution should be paused. Upon receipt of this message, agent shall pause tracee program execution as quickly as possible.
Data loss shall not occur as a result of a `pause` message.
[1 byte: type ID (4)]
The message type ID for this message is `4`. There are no message contents. Message size is fixed at 1 byte.
The `unpause` control message is sent from HQ to the agent to signal program execution shall resume.
[1 byte: type ID (5)]
The message type ID for this message is `5`. There are no message contents. Message size is fixed at 1 byte.
The `suspend` control message is sent from HQ to the agent to signal that HQ is not interested in receiving trace data at the moment. Upon receipt of this message, the agent shall stop collecting trace data as quickly as it can.
Data loss is expected as a result of a `suspend` message.
[1 byte: type ID (6)]
The message type ID for this message is `6`. There are no message contents. Message size is fixed at 1 byte.
The `unsuspend` control message is sent from HQ to the agent to signal that HQ is now interested in receiving trace data. Upon receipt of this message, the agent shall start collecting and sending trace data as quickly as it can.
[1 byte: type ID (7)]
The message type ID for this message is `7`. There are no message contents. Message size is fixed at 1 byte.
The `heartbeat` control message is sent from the agent to HQ to report that the agent is alive and kicking. The message will carry the current mode of operation as well as the current send buffer size. This is an informational message. HQ will specify how often this message is to be sent, and may take action or assume failure after some number of heartbeat messages are missed.
[1 byte: type ID (8)][1 byte: mode of operation][2 bytes: send buffer size]
The message type ID for this message is `8`. The message contents consist of the mode of operation (8-bit unsigned, see list below) followed by the current send buffer size (16-bit unsigned). Message size is fixed at 4 bytes.
- Initializing = 'I' (73)
- Tracing = 'T' (84)
- Paused = 'P' (80)
- Suspended = 'S' (83)
- Shutting down = 'X' (88)
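
The mode codes above map naturally onto a Java enum. The sketch below encodes a heartbeat under that assumption. The class and enum names are hypothetical, and clamping an oversized buffer count to 65535 is a choice made for this example rather than something the protocol specifies.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: class, enum, and method names are not part of Bytefrog.
class HeartbeatMessage {
    static final int TYPE_ID = 8;

    /** Modes of operation, each carrying the ASCII code sent on the wire. */
    enum Mode {
        INITIALIZING('I'), TRACING('T'), PAUSED('P'), SUSPENDED('S'), SHUTTING_DOWN('X');

        final int code;

        Mode(char code) {
            this.code = code;
        }
    }

    /** Encodes the 4-byte heartbeat: type, mode, then the 16-bit send buffer size. */
    static byte[] encode(Mode mode, int sendBufferSize) {
        // Assumption: values that do not fit in 16 bits are clamped rather than wrapped.
        int clamped = Math.max(0, Math.min(sendBufferSize, 0xFFFF));
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(TYPE_ID);
            out.writeByte(mode.code);
            out.writeShort(clamped);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```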
The `class transformed` message is sent from the agent to HQ to report that the agent has transformed a class in the target application. This is an informational message, allowing HQ's UI to display a list of transformed classes on the fly.
[1 byte: type ID (40)][2 bytes: length of qualified class name][n bytes: qualified class name]
The message type ID for this message is `40`. The message contents consist of the fully-qualified name of the transformed class (prefixed by the name's length, as a 16-bit unsigned value). Message size is dynamic, 3 bytes plus the size of the class name.
The `class ignored` message is sent from the agent to HQ to report that the agent has gone through the motions of transforming a class in the target application, but chose to leave that class intact and unchanged. This is an informational message, allowing HQ's UI to display a list of non-transformed classes on the fly.
[1 byte: type ID (41)][2 bytes: length of qualified class name][n bytes: qualified class name]
The message type ID for this message is `41`. The message contents consist of the fully-qualified name of the ignored class (prefixed by the name's length, as a 16-bit unsigned value). Message size is dynamic, 3 bytes plus the size of the class name.
The `class transform failed` message is sent from the agent to HQ to report that the agent has tried and failed to transform a class in the target application. This is an informational message, allowing HQ's UI to display a list of classes that could not be transformed.
[1 byte: type ID (42)][2 bytes: length of qualified class name][n bytes: qualified class name]
The message type ID for this message is `42`. The message contents consist of the fully-qualified name of the class that failed to transform, prefixed by the name's length as a 16-bit unsigned value. Message size is dynamic, 3 bytes plus the size of the class name.
The `data break` message is sent from the agent to HQ to report that there is a break in the data flow before the reported sequence ID. This typically means that tracing was suspended, and that any inferred state (such as call stacks) should be discarded when data reaches that sequence ID.
[1 byte: type ID (9)][4 bytes: sequence ID for first message following the data break]
The message type ID for this message is `9`. The message contents consist solely of the next sequence ID after the data break. Message size is fixed at 5 bytes.
Data messages provide the actual trace data. These are always generated by the agent and sent to HQ. Data messages are lower priority than control messages, so if any control messages are queued, they must all be sent before any further data messages.
The trace data sent consists of method entries, method exits, and exceptions. For all events, information about the current thread (thread name, and a 16-bit unsigned unique thread ID) and source location (line number, if available) is sent. For method events, the method signature is sent, and for exception events, the exception type name is sent. Thread names are mapped to the unique thread ID, and method signatures are mapped to a unique 32-bit unsigned value.
Data messages are sent over data connections (ones you send `data hello` on).
The `map thread name` data message tells HQ the name for a thread. Thread names may change during execution, so new `map thread name` messages will need to be generated if the thread name has changed.
To track changing thread names over time, this message is timestamped.
A simple implementation would be to keep track of the last sent thread name, as well as the unique thread ID, in thread local storage (our AspectJ tracer implementation already handles unique thread IDs stored thread local). On every data event, check if the cached thread name matches the actual, and when it doesn't, emit a `map thread name` message and update the thread local value.
[1 byte: type ID (10)][2 bytes: thread ID][4 bytes: relative timestamp][2 bytes: length of encoded thread name][n bytes: encoded thread name]
The message type ID for this message is `10`. Message content will be the 16-bit thread ID, followed by a timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the thread name (prefixed by the name's length, in bytes, as a 16-bit unsigned value). Thread name will be truncated to (2^16 - 1) bytes (~64KB). Message size is dynamic, 9 bytes plus the size of the thread name.
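The layout above can be sketched as follows. The class and method names are assumptions made for illustration. Note that `writeUTF` throws `UTFDataFormatException` when the encoded name exceeds 65535 bytes, so the truncation the protocol calls for would have to happen before this call; that step is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: class and method names are not part of Bytefrog.
class MapThreadNameMessage {
    static final int TYPE_ID = 10;

    /**
     * Encodes type, 16-bit thread ID, 32-bit timestamp offset, then the
     * length-prefixed thread name.
     */
    static byte[] encode(int threadId, long millisSinceStart, String threadName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(TYPE_ID);
            out.writeShort(threadId);               // low 16 bits only
            out.writeInt((int) millisSinceStart);   // low 32 bits, read back as unsigned
            out.writeUTF(threadName);               // fails if longer than 65535 encoded bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```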
The `map method signature` data message tells HQ the ID given for a particular method signature. IDs must be unique; however, more than one ID may be mapped to the same signature. Multiple IDs per signature are allowed for the sake of memory management, so that old mappings may be flushed from memory in the agent under memory pressure.
[1 byte: type ID (11)][4 bytes: assigned signature ID][2 bytes: length of encoded signature][n bytes: encoded signature]
The message type ID for this message is `11`. Message content will be the 32-bit ID assigned to the signature, followed by the signature itself (prefixed by the signature's length, in bytes, as a 16-bit unsigned value). Signature will be truncated to (2^16 - 1) bytes (~64KB). Message size is dynamic, 7 bytes plus the size of the signature.
The `map exception` data message tells HQ the ID given for a particular exception type. IDs must be unique; however, more than one ID may be mapped to the same type. Multiple IDs per type are allowed for the sake of memory management, so that old mappings may be flushed from memory in the agent under memory pressure.
[1 byte: type ID (12)][4 bytes: assigned exception ID][2 bytes: length of encoded exception][n bytes: encoded exception]
The message type ID for this message is `12`. Message content will be the 32-bit ID assigned to the exception, followed by the exception type name (prefixed by the name's length, in bytes, as a 16-bit unsigned value). Exception name will be truncated to (2^16 - 1) bytes (~64KB). Message size is dynamic, 7 bytes plus the size of the exception name.
The `method entry` data message tells HQ a method entry has occurred.
[1 byte: type ID (20)][4 bytes: relative timestamp][4 bytes: current sequence][4 bytes: method signature ID][2 bytes: thread ID]
The message type ID for this message is `20`. Message content will be the timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the current sequence ID (32-bit), followed by the method signature ID, followed by the thread ID. Message size is fixed at 15 bytes.
The `method exit` data message tells HQ a method exit has occurred.
[1 byte: type ID (21)][4 bytes: relative timestamp][4 bytes: current sequence][4 bytes: method signature ID][2 bytes: line number][2 bytes: thread ID]
The message type ID for this message is `21`. Message content will be the timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the current sequence ID (32-bit), followed by the method signature ID, followed by the source line number (16-bit unsigned), followed by the thread ID. Message size is fixed at 17 bytes.
The `exception` data message tells HQ an exception has occurred.
[1 byte: type ID (22)][4 bytes: relative timestamp][4 bytes: current sequence][4 bytes: method signature ID][4 bytes: exception ID][2 bytes: line number][2 bytes: thread ID]
The message type ID for this message is `22`. Message content will be the timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the current sequence ID (32-bit), followed by the method signature ID, followed by the exception ID, followed by the source line number (16-bit unsigned), followed by the thread ID. Message size is fixed at 21 bytes.
The `exception bubble` message tells HQ that a method has exited by way of bubbling an exception. The exception bubbled would be the last one thrown on the current thread.
[1 byte: type ID (23)][4 bytes: relative timestamp][4 bytes: current sequence][4 bytes: method signature ID][4 bytes: exception ID][2 bytes: thread ID]
The message type ID for this message is `23`. Message content will be the timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the current sequence ID (32-bit), followed by the method signature ID, followed by the exception ID, followed by the thread ID. Message size is fixed at 19 bytes (1 + 4 + 4 + 4 + 4 + 2).
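The four fixed-size trace events differ only in which fields they carry, so one sketch can cover them. The class, interface, and method names here are hypothetical. The resulting lengths follow directly from the field layouts: 15, 17, 21, and 19 bytes, matching the summary table.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: class, interface, and method names are not part of Bytefrog.
class TraceEventMessages {
    /** Writes the fields of one message; lets each encoder below stay a one-liner. */
    private interface Body {
        void writeTo(DataOutputStream out) throws IOException;
    }

    private static byte[] build(int typeId, long millisSinceStart, int sequenceId, Body rest) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(typeId);
            out.writeInt((int) millisSinceStart);   // low 32 bits of the offset
            out.writeInt(sequenceId);
            rest.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] methodEntry(long ts, int seq, int signatureId, int threadId) {
        return build(20, ts, seq, out -> {
            out.writeInt(signatureId);
            out.writeShort(threadId);
        });
    }

    static byte[] methodExit(long ts, int seq, int signatureId, int line, int threadId) {
        return build(21, ts, seq, out -> {
            out.writeInt(signatureId);
            out.writeShort(line);
            out.writeShort(threadId);
        });
    }

    static byte[] exception(long ts, int seq, int signatureId, int exceptionId,
                            int line, int threadId) {
        return build(22, ts, seq, out -> {
            out.writeInt(signatureId);
            out.writeInt(exceptionId);
            out.writeShort(line);
            out.writeShort(threadId);
        });
    }

    static byte[] exceptionBubble(long ts, int seq, int signatureId, int exceptionId,
                                  int threadId) {
        return build(23, ts, seq, out -> {
            out.writeInt(signatureId);
            out.writeInt(exceptionId);
            out.writeShort(threadId);
        });
    }
}
```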
A `marker` is sent to notify HQ of some custom event that happened in the traced application.
[1 byte: type ID (50)][4 bytes: relative timestamp][4 bytes: current sequence][2 bytes: key string length = K][K bytes: key string utf8][2 bytes: value string length = V][V bytes: value string utf8]
The message type ID for this message is `50`. Message content will be the timestamp offset (32-bit unsigned, milliseconds since start of tracing), followed by the current sequence ID (32-bit), followed by the length of the "key" string (16-bit unsigned) and the UTF-8 encoded bytes of the "key" string itself, followed by the "value" string in the same format as the "key" string. Message length will be 13 bytes plus `K` plus `V`, where `K` is the encoded length, in bytes, of the "key" string, and `V` is the encoded length, in bytes, of the "value" string.
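A marker is simply the common timestamp and sequence header followed by two length-prefixed strings. A minimal sketch, with hypothetical class and method names, assuming the strings are written with `writeUTF` like every other string in the protocol:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: class and method names are not part of Bytefrog.
class MarkerMessage {
    static final int TYPE_ID = 50;

    /** Encodes type, timestamp offset, sequence ID, then the key and value strings. */
    static byte[] encode(long millisSinceStart, int sequenceId, String key, String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(TYPE_ID);
            out.writeInt((int) millisSinceStart);
            out.writeInt(sequenceId);
            out.writeUTF(key);     // 2-byte K, then K bytes
            out.writeUTF(value);   // 2-byte V, then V bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

For a key of `"k"` and a value of `"vv"`, the message is 13 + 1 + 2 = 16 bytes long.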
Message | Expected Result |
---|---|
hello | configuration response or dropped connection |
data hello | data hello reply message in response |
heartbeat | HQ knows of agent's status (missed heartbeats may be treated as a failure by HQ); no response expected |
data break | informational event; no response expected |
error | HQ is alerted that an error occurred; all sockets shut down and agent dies |
class transformed | informational event; no response expected |
class ignored | informational event; no response expected |
class transform failed | informational event; no response expected |
map thread name | data event; no response expected |
map method signature | data event; no response expected |
map exception | data event; no response expected |
method entry | data event; no response expected |
method exit | data event; no response expected |
exception | data event; no response expected |
exception bubble | data event; no response expected |
marker | data event; no response expected |
Message | Expected Result |
---|---|
configuration | configuration applied - okay to open data sockets and send data; no response expected |
data hello reply | data connection is recognized - okay to send data; no response expected |
start | tracee program begins execution, trace data begins to flow to HQ if not suspended |
stop | tracee program ends execution, no new trace data collected; any buffered data continues to be sent |
pause | execution of tracee program is paused ASAP |
unpause | execution of tracee program resumes |
suspend | agent stops collecting trace data ASAP; any buffered data continues to be sent |
unsuspend | agent begins collecting trace data ASAP |
error | agent is alerted that an error occurred; all sockets shut down and agent dies |
- TCP connection is established.
- Agent sends `hello` message to HQ, specifying version of protocol it speaks.
- If HQ understands the version specified, and chooses to continue, it sends a `configuration` message in return containing whatever configuration values are to be used for this run. Otherwise, it sends an `error` message and drops the connection.
- HQ sends `suspend` message if tracing should not begin immediately upon program start.
- HQ sends `start` message, and the agent allows execution to begin.
- Agent sends trace messages as appropriate, and at some interval, a `heartbeat` message, to HQ.
- At or near the end of execution, agent starts reporting state as shutting down in heartbeats to HQ.
- TCP connection is established.
- Agent sends `data hello` message to HQ, specifying the run ID the connection is providing data for.
- If HQ recognizes the run ID, it sends a `data hello reply` message in return. Otherwise, it sends an `error` message and drops the connection.
- Agent begins sending data messages over the connection.
These steps are bypassed if agent is suspended. These steps block until an `unpause` message is received, if agent is paused. All events happen agent-side.
- If thread name has changed (or hasn't been sent yet), send a `map thread name` message.
- If there is no cached ID for the method signature, generate a new ID and send a `map method signature` message.
- Send the appropriate method entry/exit or exception message.
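
The agent-side steps above can be sketched roughly as follows. This is an illustration of the ordering only. The class and its members are hypothetical, messages are recorded as strings instead of being encoded, a single lock stands in for whatever synchronization the real agent uses, and blocking while paused is left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these names come from the Bytefrog agent.
class DataEventSketch {
    /** Stand-in for the send queue: records the messages that would be sent, in order. */
    final List<String> sent = new ArrayList<>();

    private final Map<String, Integer> signatureIds = new HashMap<>();
    private final ThreadLocal<String> lastSentThreadName = new ThreadLocal<>();
    private int nextSignatureId = 0;
    private boolean suspended = false;

    synchronized void setSuspended(boolean value) {
        suspended = value;
    }

    synchronized void onMethodEntry(String signature) {
        if (suspended) {
            return;   // all steps are bypassed while suspended
        }

        // Step 1: (re)send the thread name if it is new or has changed.
        String threadName = Thread.currentThread().getName();
        if (!threadName.equals(lastSentThreadName.get())) {
            sent.add("map thread name " + threadName);
            lastSentThreadName.set(threadName);
        }

        // Step 2: assign and announce an ID the first time a signature is seen.
        Integer id = signatureIds.get(signature);
        if (id == null) {
            id = nextSignatureId++;
            signatureIds.put(signature, id);
            sent.add("map method signature " + id + " " + signature);
        }

        // Step 3: send the event itself, referring to the signature by ID.
        sent.add("method entry " + id);
    }
}
```

Calling `onMethodEntry` twice with the same signature on one thread records four messages: the two mapping messages once, then one `method entry` per call.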