Skip to content

pidfile.t: PreDelete kills container that may have already exited, causing false failure #805

@skeptomai

Description

@skeptomai

Summary

pidfile.t fails against any spec-compliant runtime because it calls Kill("KILL") on a container that may have already stopped — and the OCI spec requires that kill on a stopped container MUST return an error.

Reproduction

Run validation/pidfile/pidfile.t against any OCI-compliant runtime (runc, remora, youki). The test container runs true, which exits immediately after start. By the time PreDelete runs, the container is in stopped state. Kill("KILL") returns a non-zero exit code, and PreDelete returns that error, failing the TAP assertion.

Root cause

In validation/pidfile/pidfile.go, the PreDelete callback:

PreDelete: func(r *util.Runtime) error {
    util.WaitingForStatus(*r, util.LifecycleStatusRunning, time.Second*10, time.Second*1)
    err = r.Kill("KILL")
    util.WaitingForStatus(*r, util.LifecycleStatusStopped, time.Second*10, time.Second*1)
    return err  // Kill error propagated as test failure
},

WaitingForStatus(..., LifecycleStatusRunning, ...) polls for up to 10s but the container has already exited — the timeout error is silently discarded. Kill("KILL") is then called on a stopped container. Per OCI Runtime Spec:

If the state of the container is not created or running, the runtime MUST generate an error.

Compliant runtimes return an error. PreDelete returns that error and the test fails.

Impact

This is a test harness bug. The actual pid-file assertion is not even what's failing — it's the cleanup logic. Any runtime that correctly implements the kill error for stopped containers will fail this test.

Confirmed with: remora (Rust), reproduced logically against runc (runc also returns container not running on kill of stopped).

Fix

Ignore the Kill error when the container may already be stopped:

PreDelete: func(r *util.Runtime) error {
    // Container may have already exited (e.g. process is "true").
    // Kill on a stopped container is an error per spec; don't propagate it.
    r.Kill("KILL")
    util.WaitingForStatus(*r, util.LifecycleStatusStopped, time.Second*10, time.Second*1)
    return nil
},

Or: check whether WaitingForStatus(Running) succeeded before calling Kill.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions