Reduce nonce redemption flake by giving WFE time to mark SubConns READY #8442
In noncebalancer, add documentation and change names to make the roles played by each type clearer. Unexport the pickerBuilder and picker types, since they aren't directly referenced anywhere outside of the package's init function.
In the WFE, move the nonceWellFormed error message upwards into validNonce, alongside the other errors returned by that function. Change that same error message to say "malformed" rather than "invalid", to differentiate it from redemption failures and to match the corresponding metric label. Replace the JWSInvalidNonce metric label with two more specific labels, JWSNoBackendNonce and JWSExpiredNonce, for better insight into whether nonce redemption failures are caused by backends shutting down or by backends expiring old nonces.
Finally, in the Python integration tests, increase the wait between retries from 10ms to (up to) 600ms. This gives the WFE's NonceRedeemer gRPC client enough time to move its SubConns from the CONNECTING state to the READY state, and in practice seems to eliminate flaky nonce redemption errors in CI.
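
For illustration only, the retry behaviour described above could look roughly like the sketch below. It is not the code from the integration tests: `BadNonceError`, `retry_on_bad_nonce`, the attempt count, and the choice of a uniformly random wait are all assumptions made for this example. Only the 600ms upper bound comes from the description.

```python
import random
import time


class BadNonceError(Exception):
    """Hypothetical stand-in for an ACME badNonce error response."""


def retry_on_bad_nonce(send, attempts=5, max_wait=0.6, sleep=time.sleep):
    """Call send() until it succeeds or the attempts run out.

    Between tries, wait a random interval of up to max_wait seconds
    (assumed here to be uniform), so a gRPC client has time to move
    its SubConns from CONNECTING to READY before the next request.
    """
    for attempt in range(attempts):
        try:
            return send()
        except BadNonceError:
            # Re-raise on the final attempt instead of sleeping again.
            if attempt == attempts - 1:
                raise
            sleep(random.uniform(0, max_wait))
```

The `sleep` parameter is injectable so that a caller (or a unit test) can observe the waits without actually pausing.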
Fixes #8385