Commit 10301be
packrat ereport storage and snitch implementation (#2126)
Things fail.
Not finding out about it sucks.
This branch implements the Hubris side of the ereport ingestion system,
as described in [RFD 545]. Work on this was started by @cbiffle in #2002,
which implemented the core ring-buffer data structure used to store
ereports. Meanwhile, oxidecomputer/management-gateway-service#370,
oxidecomputer/omicron#7803, and oxidecomputer/omicron#8296 added
the MGS and Omicron components of this system.
This branch picks up where Cliff left off, and "draws the rest of the
owl" by implementing the aggregation of ereports in the `packrat` task
using this data structure, and adding a new `snitch` task, which acts as
a proxy to allow ereports stored by `packrat` to be read over the
management network.
## Architecture
Ereports are stored by `packrat` because we would like as many tasks as
possible to be able to report errors by making an IPC call to the task
responsible for ereport storage. This means that the task aggregating
ereports must be a high-priority task, so that as many other tasks as
possible may be its clients. Additionally, we would like to include the
system's VPD identity as metadata for ereports, and this data is already
stored by packrat. Finally, we would like to minimize the likelihood of
the task that stores ereports crashing, as this would result in data
loss, and packrat already is expected not to crash.
On the other hand, the task that actually evacuates these ereports over
the management network must run at a priority lower than that of the
`net` task, of which it is a client. Thus the separation of
responsibilities between `packrat` and the `snitch`. The snitch task is
fairly simple. It receives packets sent to the ereport socket,
interprets the request message, and forwards the request to packrat. Any
ereports sent back by packrat are sent in response to the request. The
snitch ends up being a pretty dumb, stateless proxy: because the response
packet is encoded by packrat, all we end up doing is taking the bytes
received from packrat and stuffing them into the socket's send queue.
The real purpose of this thing is just to serve as a trampoline between
the high priority level of packrat and a priority level lower than that
of the net task.
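As a rough illustration of the proxy role described above, here is a minimal sketch of a single request/response round trip. It is not the actual `snitch` task: the `Packrat` and `EreportSocket` traits, `serve_one`, the fake implementations, and the 128-byte buffers are all hypothetical stand-ins, request decoding is elided, and the sketch uses `std` for brevity even though Hubris tasks are `no_std`.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the IPC client used to reach `packrat`.
pub trait Packrat {
    /// Encodes a response to `request` into `out` and returns its length.
    fn read_ereports(&mut self, request: &[u8], out: &mut [u8]) -> usize;
}

/// Hypothetical stand-in for the ereport socket provided by the `net` task.
pub trait EreportSocket {
    /// Copies the next pending request into `buf`, if there is one.
    fn recv(&mut self, buf: &mut [u8]) -> Option<usize>;
    /// Queues `data` as the response packet.
    fn send(&mut self, data: &[u8]);
}

/// Handles at most one request; returns `false` if none was pending.
pub fn serve_one(sock: &mut impl EreportSocket, packrat: &mut impl Packrat) -> bool {
    let mut rx = [0u8; 128];
    let mut tx = [0u8; 128];
    let Some(len) = sock.recv(&mut rx) else {
        return false;
    };
    // Request decoding is elided here; the bytes are passed through as-is.
    let resp_len = packrat.read_ereports(&rx[..len], &mut tx);
    // The response is already encoded by the store, so just enqueue it.
    sock.send(&tx[..resp_len]);
    true
}

/// In-memory socket used only to exercise the sketch.
#[derive(Default)]
pub struct FakeSocket {
    pub inbox: VecDeque<Vec<u8>>,
    pub outbox: Vec<Vec<u8>>,
}

impl EreportSocket for FakeSocket {
    fn recv(&mut self, buf: &mut [u8]) -> Option<usize> {
        let pkt = self.inbox.pop_front()?;
        let n = pkt.len().min(buf.len());
        buf[..n].copy_from_slice(&pkt[..n]);
        Some(n)
    }

    fn send(&mut self, data: &[u8]) {
        self.outbox.push(data.to_vec());
    }
}

/// Fake store that answers every request with a fixed payload.
pub struct FixedPackrat(pub &'static [u8]);

impl Packrat for FixedPackrat {
    fn read_ereports(&mut self, _request: &[u8], out: &mut [u8]) -> usize {
        let n = self.0.len().min(out.len());
        out[..n].copy_from_slice(&self.0[..n]);
        n
    }
}
```

The proxy holds no state between calls to `serve_one`; everything it sends comes straight from the store, which is the property that lets it sit at a lower priority than `net` without owning any data that could be lost.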
## `snitch-core` Fixes
While testing behavior when the ereport buffer is full, I found a
potential panic in the existing `snitch-core` code. Previously, every
time ereports are read from the buffer while it is in the `Losing` state
(i.e., ereports have been discarded because the buffer was full),
`snitch-core` attempts to insert a new loss record at the end of the
buffer (calling `recover_if_needed()`). This ensures that the data loss
is reported to the reader ASAP. The problem is that this code assumed
that there would always be space for an additional loss record, and
panicked if it didn't fit. I added a test reproducing this panic in
ff93754, and fixed it in 22044d1 by changing the calculation of
whether recovery is possible.
When `recover_if_needed` is called while in the `Losing` state, we call
the `free_space()` method to determine whether we can recover. In the
`Losing` state, [this method would calculate the free space by
subtracting the space required for the loss record][1] that must be
encoded to transition out of the `Losing` state. However, in the case
where `recover_if_required()` is called with `required_space: None`
(which indicates that we're not trying to recover because we want to
insert a new record, but just because we want to report ongoing data
loss to the caller), [we check that the free space is greater than or
equal to 0][2]. This means that we would still try to insert a loss
record even if the free space was 0, resulting in a panic. I've fixed
this by moving the check that there's space for a loss record out of the
calculation of `free_space()` and into the _required_ space, in addition
to the requested value (which is 0 in the "we are inserting the loss
record to report loss" case). This way, we only insert the loss record
if it fits, which is the correct behavior.
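The difference between the two checks can be modelled in a few lines. This is a toy model built from the description above, not the `snitch-core` code itself: `LosingStore`, `raw_free`, `old_can_recover`, `new_can_recover`, and the 4-byte `LOSS_RECORD_LEN` are hypothetical, and the old check is assumed to use a saturating subtraction.

```rust
/// Assumed size in bytes of an encoded loss record (illustrative value).
const LOSS_RECORD_LEN: usize = 4;

/// Toy model of a store in the `Losing` state; it tracks byte counts only.
pub struct LosingStore {
    pub capacity: usize,
    pub used: usize,
}

impl LosingStore {
    /// Bytes not currently occupied by records.
    pub fn raw_free(&self) -> usize {
        self.capacity - self.used
    }

    /// Model of the old check: the loss record's size is subtracted from
    /// the free space (assumed to saturate at zero), and the result is
    /// compared against the requested size, which is 0 for `None`.
    pub fn old_can_recover(&self, requested: Option<usize>) -> bool {
        let free = self.raw_free().saturating_sub(LOSS_RECORD_LEN);
        free >= requested.unwrap_or(0)
    }

    /// Model of the fixed check: the loss record's size is added to the
    /// required space, so recovery is attempted only if the record fits.
    pub fn new_can_recover(&self, requested: Option<usize>) -> bool {
        let required = requested.unwrap_or(0) + LOSS_RECORD_LEN;
        self.raw_free() >= required
    }
}
```

With a completely full buffer (`capacity == used`) and `requested == None`, the old model reports that recovery is possible even though no loss record fits, which corresponds to the panic; the new model reports that it is not.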
I've also changed the assignment of ENAs in `snitch-core` to start at 1,
rather than 0, since ENA 0 is reserved in the wire protocol to indicate
"no ENA". In the "committed ENA" request field this means "don't flush
any ereports", and in the "start ENA" response field, ENA 0 means "no
ereports in this packet". Thus, the ereport store must start assigning
ENAs at ENA 1 for the initial loss record.
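A small sketch of why allocation has to start at 1. The `Ena` newtype, its `u64` width, `EnaAllocator`, and `should_flush` are hypothetical names for illustration, and the rule that a committed ENA flushes every record up to and including it is an assumption; only the reserved meaning of 0 comes from the text above.

```rust
/// An ENA; the value 0 is reserved on the wire to mean "no ENA".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Ena(pub u64);

impl Ena {
    /// The reserved "no ENA" value.
    pub const NONE: Ena = Ena(0);
}

/// Hands out ENAs in increasing order, starting at 1.
pub struct EnaAllocator {
    next: u64,
}

impl EnaAllocator {
    pub fn new() -> Self {
        // Start at 1 so the first record never collides with `Ena::NONE`.
        Self { next: 1 }
    }

    pub fn allocate(&mut self) -> Ena {
        let ena = Ena(self.next);
        self.next += 1;
        ena
    }
}

/// Whether a record may be flushed given the request's "committed ENA".
/// Assumes a committed ENA covers every record up to and including it.
pub fn should_flush(committed: Ena, record: Ena) -> bool {
    committed != Ena::NONE && record.0 <= committed.0
}
```

If the first record were assigned ENA 0, a reader could never commit it: sending 0 as the committed ENA would be interpreted as "don't flush any ereports".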
## Testing
Currently, no tasks actually produce ereports. To test that everything
works correctly, it was necessary to add a source of ereports, so I've
added [a little task][3] that just generates test ereports when asked
via `hiffy`. I've included some of that in [this comment][4]. This was
also used for testing the data-loss behavior discussed above.
[RFD 545]: https://rfd.shared.oxide.computer/rfd/0545
[1]: https://github.com/oxidecomputer/hubris/blob/e846b9d2481b13cf2b18a2a073bb49eef5f654de/lib/snitch-core/src/lib.rs#L110-L121
[2]: https://github.com/oxidecomputer/hubris/blob/e846b9d2481b13cf2b18a2a073bb49eef5f654de/lib/snitch-core/src/lib.rs#L297-L300
[3]: https://github.com/oxidecomputer/hubris/blob/864fa57a7c34a6225deddcffa0c7d54c3063eab6/task/ereportulator/src/main.rs

1 parent b35abba, commit 10301be
31 files changed, +1843 -339 lines changed

File tree:
- app
  - cosmo
  - demo-stm32h7-nucleo
  - gimletlet
  - gimlet
  - grapefruit
  - psc
  - sidecar
- drv/stm32h7-rng
  - src
- idl
- lib/snitch-core
  - src
- sys/abi/src
- task
  - ereportulator
    - src
  - packrat-api
    - src
  - packrat
    - src
  - snitch
    - src