
Commit 50554e9

acceptance: make docker more resilient to timeout in ContainerStart
Docker likes to never respond to us, and we do not usually have cancellations on the context (which would not help anyway: that would just fail the test right there). Instead, try a few times.

The problem looks similar to golang/go#16060 and golang/go#5103. Another possibility mentioned in user groups is that some file descriptor limit is being hit. Since I've never seen this locally, perhaps that's the case on our agent machines. Unfortunately, those are hard to SSH into.

This may not be a good idea (after all, perhaps `Start()` succeeded), and we would have to do something similar for `ContainerWait`. But at least it should give us an additional data point: do the retries also just block? Is the container actually started when we retry?
1 parent b085a91 commit 50554e9

1 file changed, +21 -0 lines changed

pkg/acceptance/cluster/docker.go

Lines changed: 21 additions & 0 deletions
@@ -320,6 +320,27 @@ type resilientDockerClient struct {
 	client.APIClient
 }
 
+func (cli resilientDockerClient) ContainerStart(
+	clientCtx context.Context, id string, opts types.ContainerStartOptions,
+) error {
+	for {
+		err := func() error {
+			ctx, cancel := context.WithTimeout(clientCtx, 20*time.Second)
+			defer cancel()
+
+			return cli.APIClient.ContainerStart(ctx, id, opts)
+		}()
+
+		// Keep going if ContainerStart timed out, but client's context is not
+		// expired.
+		if err == context.DeadlineExceeded && clientCtx.Err() == nil {
+			log.Warningf(clientCtx, "ContainerStart timed out, retrying")
+			continue
+		}
+		return err
+	}
+}
+
 func (cli resilientDockerClient) ContainerCreate(
 	ctx context.Context,
 	config *container.Config,
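
The commit message notes that something similar may be needed for ContainerWait. As a hedged illustration only (none of the following is part of this commit), here is a minimal, generic sketch of the retry-with-per-attempt-timeout pattern the diff applies to ContainerStart; the helper name retryWithAttemptTimeout and its placement in package cluster are assumptions made for the example.

// Hypothetical sketch, not part of commit 50554e9: the retry pattern from
// ContainerStart above, factored into a generic helper that could also be
// used to wrap a call such as ContainerWait.
package cluster

import (
	"context"
	"time"
)

// retryWithAttemptTimeout runs op with a fresh per-attempt deadline and
// retries only when that per-attempt deadline expired while the caller's
// context (parent) is still live; any other outcome is returned as-is.
func retryWithAttemptTimeout(
	parent context.Context, perAttempt time.Duration, op func(context.Context) error,
) error {
	for {
		err := func() error {
			ctx, cancel := context.WithTimeout(parent, perAttempt)
			defer cancel()
			return op(ctx)
		}()
		// Only the attempt's own deadline fired and the caller has not given
		// up, so go around again.
		if err == context.DeadlineExceeded && parent.Err() == nil {
			continue
		}
		return err
	}
}

With such a helper, the ContainerStart wrapper above would reduce to a single call passing a 20-second attempt duration, and a ContainerWait variant would differ only in how it forwards its return values. On newer Go versions, errors.Is(err, context.DeadlineExceeded) would be the more robust comparison.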
