There are some basic availability-related settings we'll want to check everywhere (e.g., Nexus, Sled Agent, DNS servers):
- TCP keepalive: we want to enable this on all network connections (in both directions) to identify failed systems. External and internal connections should probably use different values.
- HTTP keep-alive: we probably want to just pick a timeout, like 60 seconds. Should clients make dummy requests to keep their connections open? That would avoid the race where a client picks a connection that's been idle for just under 60 seconds, sends a request on it, and has the server slam the door in its face. We ran into this with Manta, admittedly only at very large scale, since it's fairly improbable.
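To compare candidate keepalive values for internal and external connections, it can help to compute how long an idle connection takes to notice a dead peer. This is a minimal sketch, assuming Linux-style semantics: the kernel sends the first probe after `idle` without traffic, repeats every `interval`, and gives up after `probes` unanswered probes (the `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` socket options, which a Rust server could set via a crate such as socket2). The `KeepaliveConfig` type and the `INTERNAL`/`EXTERNAL` values are hypothetical, not recommendations.

```rust
use std::time::Duration;

/// Hypothetical keepalive settings. On Linux these correspond to the
/// TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT socket options.
#[derive(Debug, Clone, Copy)]
struct KeepaliveConfig {
    /// Time without traffic before the first probe is sent.
    idle: Duration,
    /// Time between successive unanswered probes.
    interval: Duration,
    /// Unanswered probes before the connection is declared dead.
    probes: u32,
}

impl KeepaliveConfig {
    /// Approximate worst-case time for an otherwise idle connection to
    /// notice that its peer has silently failed: idle + interval * probes.
    fn detection_time(&self) -> Duration {
        self.idle + self.interval * self.probes
    }
}

// Illustrative values only: aggressive on the internal network,
// more tolerant of slow or lossy links for external clients.
const INTERNAL: KeepaliveConfig = KeepaliveConfig {
    idle: Duration::from_secs(10),
    interval: Duration::from_secs(5),
    probes: 3,
};

const EXTERNAL: KeepaliveConfig = KeepaliveConfig {
    idle: Duration::from_secs(60),
    interval: Duration::from_secs(15),
    probes: 4,
};
```

With these hypothetical numbers, a vanished internal peer is noticed in about 25 seconds and an external one in about 2 minutes. Keepalive probes only apply while the connection is idle; a connection with unacknowledged data in flight is governed by retransmission timeouts instead.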
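A common alternative to dummy requests (a different technique from the one proposed above) is for the client to retire pooled connections before the server's idle timeout can fire, so it never sends a request on a connection the server is about to close. This is a minimal sketch, assuming the client knows the server's idle timeout (e.g., the 60 seconds suggested above) and tracks how long each pooled connection has been idle; `safe_to_reuse`, `pick_connection`, and the 5-second margin are hypothetical.

```rust
use std::time::Duration;

/// Returns true if a pooled connection idle for `idle_for` can be reused.
/// `margin` leaves headroom for network latency and timer imprecision, so
/// the request should reach the server well before its idle timer expires.
fn safe_to_reuse(idle_for: Duration, server_idle_timeout: Duration, margin: Duration) -> bool {
    idle_for.saturating_add(margin) < server_idle_timeout
}

/// Picks the first reusable connection from a pool, given each connection's
/// idle time. `None` means the caller should open a fresh connection (and
/// close any pooled connections that are no longer safe to reuse).
fn pick_connection(
    idle_times: &[Duration],
    server_idle_timeout: Duration,
    margin: Duration,
) -> Option<usize> {
    idle_times
        .iter()
        .position(|&idle| safe_to_reuse(idle, server_idle_timeout, margin))
}
```

Compared with dummy requests, this trades background traffic for an occasional extra connection setup. It still doesn't help when the server closes a connection early for other reasons (e.g., during shutdown), so clients would still need to be prepared to retry idempotent requests.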
We'll want to review these, too. They might be more security-related (see #2184):
- limits for bad client behavior:
  - maximum time to wait for a client to send request headers (whether on a new connection or between requests)
  - minimum flow rate for request bodies (this can be fairly low; we just want to keep clients from dribbling data in to hold connections open as a DoS vector)
- maximum number of open connections (ideally limited separately per API, e.g., external vs. internal)
- TCP listen socket backlog
- maximum rate of new connections (ideally per client)
- maximum rate of incoming requests, both overall and per client (per authenticated user? per IP address?)
- maximum number of connect-in-progress sockets
- maximum number of TLS-session-establishment-in-progress sockets
- size of the tokio worker thread pool and the blocking thread pool
- maximum length of time that graceful server shutdown can take
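The minimum flow rate for request bodies can be expressed as a check on the body's average throughput since it started, applied only after a short grace period so that slow starts aren't penalized. This is a minimal sketch; the `MinFlowRate` type and the example values (1 KiB/s, 5-second grace) are hypothetical, and a real server would run the check from its body-reading loop or a timer.

```rust
use std::time::Duration;

/// Hypothetical policy: once `grace` has passed, a request body must have
/// averaged at least `min_bytes_per_sec`, or the server may drop the connection.
struct MinFlowRate {
    min_bytes_per_sec: u64,
    grace: Duration,
}

impl MinFlowRate {
    /// Returns true if a body that has delivered `received` bytes over
    /// `elapsed` (measured from the start of the body) is too slow.
    fn too_slow(&self, received: u64, elapsed: Duration) -> bool {
        if elapsed <= self.grace {
            return false; // don't judge during the grace period
        }
        // received / elapsed_secs < min  <=>  received * 1000 < min * elapsed_ms
        // (integer arithmetic in u128 avoids float rounding and overflow)
        let elapsed_ms = elapsed.as_millis();
        (received as u128) * 1000 < (self.min_bytes_per_sec as u128) * elapsed_ms
    }
}
```

Averaging over the whole body lets a client that front-loads data and then stalls coast for a while; a sliding window over recent bytes would catch that sooner, at the cost of more bookkeeping.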
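For the limits on the rate of new connections and incoming requests, a token bucket is one standard technique (not necessarily what these servers use): each key gets a bucket holding up to `capacity` tokens that refills at a fixed rate, and each admitted request spends one token. This is a minimal sketch, assuming clients are keyed by IP address (per-user limits would key by user ID instead) and that time is passed in explicitly so the logic is deterministic; `TokenBucket`, `RateLimiter`, and all limits are hypothetical.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::Duration;

/// Holds up to `capacity` tokens, refilled continuously at `refill_per_sec`.
struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last: Duration, // time of the last refill, from an arbitrary epoch
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64, now: Duration) -> Self {
        TokenBucket { capacity, refill_per_sec, tokens: capacity, last: now }
    }

    /// Refills based on elapsed time, then spends one token if available.
    fn try_acquire(&mut self, now: Duration) -> bool {
        let elapsed = now.saturating_sub(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// A request must pass both its client's bucket and the overall bucket.
/// Note: the per-client map is never pruned in this sketch.
struct RateLimiter {
    per_client: HashMap<IpAddr, TokenBucket>,
    overall: TokenBucket,
    client_capacity: f64,
    client_rate: f64,
}

impl RateLimiter {
    fn new(client_capacity: f64, client_rate: f64, overall_capacity: f64, overall_rate: f64) -> Self {
        RateLimiter {
            per_client: HashMap::new(),
            overall: TokenBucket::new(overall_capacity, overall_rate, Duration::ZERO),
            client_capacity,
            client_rate,
        }
    }

    fn admit(&mut self, client: IpAddr, now: Duration) -> bool {
        let (cap, rate) = (self.client_capacity, self.client_rate);
        let bucket = self
            .per_client
            .entry(client)
            .or_insert_with(|| TokenBucket::new(cap, rate, now));
        // The per-client check runs first, so a noisy client is rejected
        // without draining the shared budget.
        bucket.try_acquire(now) && self.overall.try_acquire(now)
    }
}
```

Two simplifications to be aware of: a request rejected by the overall limit has already spent a per-client token, and a production limiter would need to evict idle per-client entries so that many distinct addresses can't exhaust memory.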
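For bounding graceful shutdown, one approach is to stop accepting new work, wait for in-flight requests to drain up to a deadline, and then forcibly close whatever remains. This is a minimal std-only sketch of the bounded wait, assuming requests are tracked with a shared counter and the listener has already stopped accepting; `InFlight` and its methods are hypothetical, and an async server would do the equivalent with its runtime's primitives.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Counts in-flight requests so shutdown can wait for them to finish.
struct InFlight {
    count: Mutex<usize>,
    drained: Condvar,
}

impl InFlight {
    fn new() -> Arc<Self> {
        Arc::new(InFlight { count: Mutex::new(0), drained: Condvar::new() })
    }

    /// Call when a request begins.
    fn start(&self) {
        *self.count.lock().unwrap() += 1;
    }

    /// Call when a request completes; wakes the shutdown waiter at zero.
    /// Must be paired with a prior `start` (underflow panics in debug builds).
    fn finish(&self) {
        let mut n = self.count.lock().unwrap();
        *n -= 1;
        if *n == 0 {
            self.drained.notify_all();
        }
    }

    /// Waits until no requests are in flight or `deadline` elapses.
    /// Returns how many requests are still running (0 means a clean drain).
    fn wait_for_drain(&self, deadline: Duration) -> usize {
        let guard = self.count.lock().unwrap();
        let (guard, _timed_out) = self
            .drained
            .wait_timeout_while(guard, deadline, |n| *n > 0)
            .unwrap();
        *guard
    }
}
```

A nonzero return value tells the caller how many requests it is about to cut off, which is worth logging when the deadline is hit.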