Intermittent segmentation fault in terraform binary when running many UnitTest instances in parallel #659

@SebTardif

Module version

  • terraform-plugin-testing v1.16.0
  • terraform-exec v0.25.1 (transitive)
  • terraform-plugin-framework v1.19.0
  • Go 1.26.3 / darwin/arm64
  • Terraform CLI v1.15.4

Description

When running a large number of resource.UnitTest tests in parallel across multiple packages (e.g., go test -race ./...), the terraform binary invoked by terraform-exec intermittently crashes with a segmentation fault. The crash manifests at different points in the test lifecycle:

  1. During working directory setup:

    failed to create new working directory: unable to disable terraform-exec provider verification: signal: segmentation fault
    
  2. During post-apply plan reading:

    Step 1/1 error: Error reading saved post-apply non-refresh plan: signal: segmentation fault
    

The failure is non-deterministic and only occurs under heavy parallel load, typically when many packages with t.Parallel() tests all invoke the terraform binary concurrently. Tests always pass when run in isolation or with a single package.

Reproduction

Setup

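Create a Go module with 10 packages, each containing 10 parallel resource.UnitTest tests (up to 100 concurrent terraform processes; the actual number is capped by go test's -p and -parallel settings, which both default to GOMAXPROCS).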

go.mod:

module tf-segfault-repro

go 1.26.3

require (
	github.com/hashicorp/terraform-plugin-framework v1.19.0
	github.com/hashicorp/terraform-plugin-go v0.31.0
	github.com/hashicorp/terraform-plugin-testing v1.16.0
)

Create 10 identical packages (pkg01/ through pkg10/), each with this file as repro_test.go:

package pkg

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"testing"

	"github.com/hashicorp/terraform-plugin-framework/datasource"
	"github.com/hashicorp/terraform-plugin-framework/datasource/schema"
	"github.com/hashicorp/terraform-plugin-framework/provider"
	frameworkschema "github.com/hashicorp/terraform-plugin-framework/provider/schema"
	"github.com/hashicorp/terraform-plugin-framework/providerserver"
	frameworkresource "github.com/hashicorp/terraform-plugin-framework/resource"
	"github.com/hashicorp/terraform-plugin-go/tfprotov6"
	"github.com/hashicorp/terraform-plugin-testing/helper/resource"
)

type stubProvider struct{}

func (p *stubProvider) Metadata(_ context.Context, _ provider.MetadataRequest, resp *provider.MetadataResponse) {
	resp.TypeName = "stub"
}
func (p *stubProvider) Schema(_ context.Context, _ provider.SchemaRequest, resp *provider.SchemaResponse) {
	resp.Schema = frameworkschema.Schema{
		Attributes: map[string]frameworkschema.Attribute{
			"endpoint": frameworkschema.StringAttribute{Optional: true},
		},
	}
}
func (p *stubProvider) Configure(_ context.Context, _ provider.ConfigureRequest, _ *provider.ConfigureResponse) {}
func (p *stubProvider) Resources(_ context.Context) []func() frameworkresource.Resource { return nil }
func (p *stubProvider) DataSources(_ context.Context) []func() datasource.DataSource {
	return []func() datasource.DataSource{func() datasource.DataSource { return &stubDS{} }}
}

type stubDS struct{}

func (d *stubDS) Metadata(_ context.Context, _ datasource.MetadataRequest, resp *datasource.MetadataResponse) {
	resp.TypeName = "stub_thing"
}
func (d *stubDS) Schema(_ context.Context, _ datasource.SchemaRequest, resp *datasource.SchemaResponse) {
	resp.Schema = schema.Schema{
		Attributes: map[string]schema.Attribute{
			"id": schema.StringAttribute{Computed: true},
		},
	}
}
func (d *stubDS) Read(_ context.Context, _ datasource.ReadRequest, _ *datasource.ReadResponse) {}

func protoV6Factories() map[string]func() (tfprotov6.ProviderServer, error) {
	return map[string]func() (tfprotov6.ProviderServer, error){
		"stub": providerserver.NewProtocol6WithError(&stubProvider{}),
	}
}

func makeTest(i int) func(t *testing.T) {
	return func(t *testing.T) {
		t.Parallel()
		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Header().Set("Content-Type", "application/json")
			fmt.Fprintf(w, `{"id": "item-%d"}`, i)
		}))
		defer srv.Close()
		resource.UnitTest(t, resource.TestCase{
			ProtoV6ProviderFactories: protoV6Factories(),
			Steps: []resource.TestStep{
				{
					Config: fmt.Sprintf(`
						provider "stub" { endpoint = %q }
						data "stub_thing" "test" {}
					`, srv.URL),
				},
			},
		})
	}
}

func TestP00(t *testing.T) { makeTest(0)(t) }
func TestP01(t *testing.T) { makeTest(1)(t) }
func TestP02(t *testing.T) { makeTest(2)(t) }
func TestP03(t *testing.T) { makeTest(3)(t) }
func TestP04(t *testing.T) { makeTest(4)(t) }
func TestP05(t *testing.T) { makeTest(5)(t) }
func TestP06(t *testing.T) { makeTest(6)(t) }
func TestP07(t *testing.T) { makeTest(7)(t) }
func TestP08(t *testing.T) { makeTest(8)(t) }
func TestP09(t *testing.T) { makeTest(9)(t) }

Create the packages

for pkg in $(seq -w 1 10); do
  mkdir -p "pkg${pkg}"
  cp repro_test.go "pkg${pkg}/repro_test.go"
done
go mod tidy

Run the reproducer

# Run multiple times; crash is intermittent (~1 in 10 runs)
for i in $(seq 1 20); do
  go test -count=1 -race ./... 2>&1 | tee /tmp/run-$i.log
  if grep -q 'segmentation' /tmp/run-$i.log; then
    echo "CRASH on run $i"
    break
  fi
done

Observed output (crash)

--- FAIL: TestP01 (2.60s)
    repro_test.go:67: Step 1/1 error: Error reading saved post-apply non-refresh plan: signal: segmentation fault
FAIL

Or from a real provider test suite (900+ tests across 25 packages):

--- FAIL: TestScheduledTaskResource_ImportService (0.08s)
    resource_test.go:388: failed to create new working directory: unable to disable terraform-exec provider verification: signal: segmentation fault

Expected output

All tests pass (they do pass when run individually or with -p 1).

Impact

This potentially affects any Terraform provider whose large test suite is run with go test -race ./... (the race detector increases memory pressure and timing variability in the test processes, which appears to make the crash more likely).

Workarounds:

  • Run with -p 1 (sequential package execution), but this significantly slows large test suites
  • Retry failed test runs (the crash is intermittent)

The root cause appears to be in the terraform binary itself (SIGSEGV), not in terraform-exec or terraform-plugin-testing. However, since the testing framework is responsible for launching many concurrent terraform processes, it may be able to mitigate this (e.g., limiting concurrent terraform binary initializations).
