runtime: fatal error: acquirep: invalid p state (AMD Opteron 6172 with 48 cores) #10240

Closed
artjomsimon opened this issue Mar 24, 2015 · 6 comments

@artjomsimon

Hi everyone,

while experimenting with the language and trying to port a LU factorization benchmark from the Barcelona OpenMP Tasks Suite written in C, I got fatal error: acquirep: invalid p state on an AMD 48-core machine (Opteron 6172).

I've tried to strip the code down to the bare minimum that triggers the error. That's a bit difficult, because the problem is highly nondeterministic: whenever I remove code that I presume to be irrelevant to a race condition, the crash stops triggering. This version seems to trigger it quite reliably, in 50%-80% of the runs.
I compiled it with go build (go 1.4.2) and ran the executable in a for loop in bash:

package main

import (
        "flag"
        "fmt"
        "runtime"
        "sync"
)

var matrixSize, submatrixSize int

/***********************************************************************
 * genmat:
 **********************************************************************/
func genmat(M []*[][]float32) {
        var null_entry bool

        for ii := 0; ii < matrixSize; ii++ {
                for jj := 0; jj < matrixSize; jj++ {
                        null_entry = false

                        if (ii < jj) && (ii%3 != 0) {
                                null_entry = true
                        }
                        if (ii > jj) && (jj%3 != 0) {
                                null_entry = true
                        }
                        if ii%2 == 1 {
                                null_entry = true
                        }
                        if jj%2 == 1 {
                                null_entry = true
                        }
                        if ii == jj {
                                null_entry = false
                        }
                        if ii == jj-1 {
                                null_entry = false
                        }
                        if ii-1 == jj {
                                null_entry = false
                        }
                        if null_entry == false {

                                subMatrix := make([][]float32, submatrixSize)
                                for i := range subMatrix {
                                        subMatrix[i] = make([]float32, submatrixSize)
                                }

                                M[ii*matrixSize+jj] = &subMatrix

                        } else {
                                M[ii*matrixSize+jj] = nil
                        }
                }
        }
}

func sparselu_init(pBENCH *[]*[][]float32, pass string) {
        *pBENCH = make([]*[][]float32, matrixSize*matrixSize)
        genmat(*pBENCH)
}

func sparselu_par_call(BENCH []*[][]float32) {

        var wg sync.WaitGroup

        for kk := 0; kk < matrixSize; kk++ {
                for ii := kk + 1; ii < matrixSize; ii++ {
                        if BENCH[ii*matrixSize+kk] != nil {
                                for jj := kk + 1; jj < matrixSize; jj++ {
                                        if BENCH[kk*matrixSize+jj] != nil {
                                                //#pragma omp task untied firstprivate(kk, jj, ii) shared(BENCH)
                                                wg.Add(1)
                                                jj := jj

                                                go func(wg *sync.WaitGroup) {
                                                        defer (*wg).Done()
                                                        if BENCH[ii*matrixSize+jj] == nil {
                                                                subMatrix := make([][]float32, submatrixSize)
                                                                // go-style initializing 2d matrix in a loop
                                                                for i := range subMatrix {
                                                                        subMatrix[i] = make([]float32, submatrixSize)
                                                                }
                                                                BENCH[ii*matrixSize+jj] = &subMatrix
                                                        }
                                                        for i := 0; i < submatrixSize; i++ {
                                                                for j := 0; j < submatrixSize; j++ {
                                                                        for k := 0; k < submatrixSize; k++ {
                                                                        }
                                                                }
                                                        }
                                                }(&wg)
                                        }
                                }
                                wg.Wait()
                        }
                }
        }
}

func main() {

        runtime.GOMAXPROCS(47)

        flag.IntVar(&matrixSize, "n", 50, "Matrix size")
        flag.IntVar(&submatrixSize, "m", 100, "Submatrix size")
        flag.Parse()

        var matrixPar []*[][]float32
        sparselu_init(&matrixPar, "Parallel")
        sparselu_par_call(matrixPar)
        fmt.Println("Program ended")
}

I'm aware that this isn't idiomatic Go at all; it's a quite literal transliteration of the C version. Nevertheless, I guess the Go runtime shouldn't crash like this:

for i in {1..15}; do bin/sparselu-crash-mwe -n 201 -m 69; done

acquirep: p->m=0xc20db16380(38) p->status=1
fatal error: acquirep: invalid p state

runtime stack:
runtime.throw(0x562667)
        /usr/local/go/src/runtime/panic.go:491 +0xad
acquirep(0xc208062400)
        /usr/local/go/src/runtime/proc.c:2747 +0x10d
stopm()
        /usr/local/go/src/runtime/proc.c:1186 +0x1b2
findrunnable(0xc208062400)
        /usr/local/go/src/runtime/proc.c:1487 +0x562
schedule()
        /usr/local/go/src/runtime/proc.c:1575 +0x151
goexit0(0xc2119ab680)
        /usr/local/go/src/runtime/proc.c:1717 +0x16e
runtime.mcall(0x42e3c4)
        /usr/local/go/src/runtime/asm_amd64.s:186 +0x5a

goroutine 1 [semacquire]:
sync.(*WaitGroup).Wait(0xc20d1e38a0)
        /usr/local/go/src/sync/waitgroup.go:132 +0x169
main.sparselu_par_call(0xc2080ae000, 0x9dd1, 0x9dd1)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:96 +0x274
main.main()
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:112 +0x11f

goroutine 531203 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531207 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531185 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531210 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531224 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531186 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531195 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531213 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531219 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531191 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531187 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531194 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531220 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531225 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531184 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531176 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531200 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531208 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531182 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531214 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531189 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531221 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531174 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531211 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531205 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531222 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531178 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 530478 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531212 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531202 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531192 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531172 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 530480 [runnable]:
sync.(*WaitGroup).Done(0xc20d1e38a0)
        /usr/local/go/src/sync/waitgroup.go:84
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x2e5
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531169 [runnable]:
sync.(*WaitGroup).Done(0xc20d1e38a0)
        /usr/local/go/src/sync/waitgroup.go:84
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x2e5
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531188 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531171 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531206 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531216 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 530479 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531175 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531179 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 530477 [runnable]:
sync.(*WaitGroup).Done(0xc20d1e38a0)
        /usr/local/go/src/sync/waitgroup.go:84
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x2e5
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531197 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531223 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531183 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531181 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531209 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531217 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531180 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531190 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 530476 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531218 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531170 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531173 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531215 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531193 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531196 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531201 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531199 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531204 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531177 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

goroutine 531198 [runnable]:
main.func·001(0xc20d1e38a0)
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:77
created by main.sparselu_par_call
        /home/stsimon/golang-parallel/src/benchmarks/sparselu-crash-mwe.go:93 +0x247

Inlining or reducing the genmat() function to trivial cases seems to stop provoking the crash.
I can't reproduce this on an Intel 2-core CPU (i3/i5).

Am I doing something completely wrong here, or is this a legitimate runtime bug? How can I debug this further?

Thank you,
Artjom
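
For readers comparing against the reproducer above, here is a minimal sketch of how the task-spawning loop might look in more idiomatic Go. It is not the reporter's code and is not known to trigger the crash: the names `block`, `newBlock`, and `fillRow` are hypothetical, a nil slice replaces the `*[][]float32` pointer, and the outer `kk`/`ii` loops and the check on `BENCH[ii*matrixSize+kk]` are left out for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// block is one dense submatrix; a nil block marks an empty entry.
type block [][]float32

// newBlock allocates a zeroed size x size submatrix.
func newBlock(size int) block {
	b := make(block, size)
	for i := range b {
		b[i] = make([]float32, size)
	}
	return b
}

// fillRow allocates every missing block in row ii at columns > kk whose
// counterpart in row kk is present, one goroutine per block, then waits.
func fillRow(bench []block, n, size, kk, ii int) {
	var wg sync.WaitGroup
	for jj := kk + 1; jj < n; jj++ {
		if bench[kk*n+jj] == nil {
			continue
		}
		wg.Add(1)
		// Pass the index as an argument so each goroutine owns its copy.
		go func(idx int) {
			defer wg.Done()
			if bench[idx] == nil {
				bench[idx] = newBlock(size)
			}
		}(ii*n + jj)
	}
	wg.Wait()
}

func main() {
	const n, size = 4, 2
	bench := make([]block, n*n)
	// Populate row 0 so that every column of row 1 qualifies.
	for jj := 0; jj < n; jj++ {
		bench[jj] = newBlock(size)
	}
	fillRow(bench, n, size, 0, 1)
	for jj := 1; jj < n; jj++ {
		fmt.Println(jj, bench[1*n+jj] != nil)
	}
}
```

Each goroutine writes a different element of `bench`, so the WaitGroup is the only synchronization this sketch needs.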

@mikioh mikioh changed the title fatal error: acquirep: invalid p state (AMD Opteron 6172 with 48 cores) runtime: fatal error: acquirep: invalid p state (AMD Opteron 6172 with 48 cores) Mar 24, 2015
@ianlancetaylor ianlancetaylor added this to the Go1.5 milestone Mar 26, 2015
@dvyukov
Member

dvyukov commented Mar 27, 2015

Please run the program with the GOTRACEBACK=2 GODEBUG=scheddetail=1 environment variables set and attach the output. Or, better, several outputs.

@artjomsimon
Author

Hi Dmitry,

thanks for having a look at this.

I re-ran the code above again and couldn't reproduce it; I probably posted the wrong version, but here's the one I can reproduce it with in 50% of the runs:
https://raw.githubusercontent.com/artjomsimon/go-bots/5531c21c36f636f6f0a290abf62a25da9be6af32/sparselu-crash-mwe.go

I re-ran this 20 times after exporting the environment variables you've specified.

> export GOTRACEBACK=2
> export GODEBUG=scheddetail=1
> for i in {1..20}; do go run sparselu-crash-mwe.go -n 201 -m 69 &> bugs/invalid-p-state/invalid-p-state-$i.txt; done

Out of 20 runs: 3 ran ok, 7 segfaulted, and 10 crashed, generating the traces here:

https://raw.githubusercontent.com/artjomsimon/go-bots/5531c21c36f636f6f0a290abf62a25da9be6af32/bugs/invalid-p-state/invalid-p-state-2.txt
https://raw.githubusercontent.com/artjomsimon/go-bots/5531c21c36f636f6f0a290abf62a25da9be6af32/bugs/invalid-p-state/invalid-p-state-4.txt
https://raw.githubusercontent.com/artjomsimon/go-bots/5531c21c36f636f6f0a290abf62a25da9be6af32/bugs/invalid-p-state/invalid-p-state-7.txt
https://raw.githubusercontent.com/artjomsimon/go-bots/5531c21c36f636f6f0a290abf62a25da9be6af32/bugs/invalid-p-state/invalid-p-state-8.txt
https://raw.githubusercontent.com/artjomsimon/go-bots/5531c21c36f636f6f0a290abf62a25da9be6af32/bugs/invalid-p-state/invalid-p-state-9.txt
https://raw.githubusercontent.com/artjomsimon/go-bots/5531c21c36f636f6f0a290abf62a25da9be6af32/bugs/invalid-p-state/invalid-p-state-11.txt
https://raw.githubusercontent.com/artjomsimon/go-bots/5531c21c36f636f6f0a290abf62a25da9be6af32/bugs/invalid-p-state/invalid-p-state-15.txt
https://raw.githubusercontent.com/artjomsimon/go-bots/5531c21c36f636f6f0a290abf62a25da9be6af32/bugs/invalid-p-state/invalid-p-state-16.txt
https://raw.githubusercontent.com/artjomsimon/go-bots/5531c21c36f636f6f0a290abf62a25da9be6af32/bugs/invalid-p-state/invalid-p-state-18.txt
https://raw.githubusercontent.com/artjomsimon/go-bots/5531c21c36f636f6f0a290abf62a25da9be6af32/bugs/invalid-p-state/invalid-p-state-20.txt
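
A tally like the one above could also be produced by a small Go harness instead of the bash loop. The following is only a sketch under assumptions: the `classify` helper and its three buckets are hypothetical, and searching the output for the string "fatal error:" is a rough heuristic, not how the counts in this thread were obtained.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// classify sorts one run into "ok", "fatal" (a Go runtime fatal error appears
// in the output), or "other" (any other failure, e.g. killed by a signal).
func classify(failed bool, output []byte) string {
	switch {
	case bytes.Contains(output, []byte("fatal error:")):
		return "fatal"
	case failed:
		return "other"
	default:
		return "ok"
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: tally <binary> [args...]")
		return
	}
	counts := map[string]int{}
	for i := 1; i <= 20; i++ {
		cmd := exec.Command(os.Args[1], os.Args[2:]...)
		// Same environment variables as requested in the thread.
		cmd.Env = append(os.Environ(), "GOTRACEBACK=2", "GODEBUG=scheddetail=1")
		out, err := cmd.CombinedOutput()
		counts[classify(err != nil, out)]++
	}
	fmt.Println(counts)
}
```

Assuming the harness is built as `tally`, it would be invoked as `./tally bin/sparselu-crash-mwe -n 201 -m 69`.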

I have another version of the benchmark, using channels instead of sync/WaitGroup for synchronization. This version crashes too, but with the error message fatal error: bad g->status in ready, also in 50% of the cases.

This other version consists of the following two files:
https://raw.githubusercontent.com/artjomsimon/go-bots/8e237669f6f5285f8a703717a09e8500acf671cc/sparselu-crash-mwe-chan.go
https://raw.githubusercontent.com/artjomsimon/go-bots/b4c00a28d590bb1668a1ad523335a2bad3043827/taskpool-simple-queue-chan.go

I ran them like this, again:

> export GOTRACEBACK=2
> export GODEBUG=scheddetail=1
> for i in {1..20}; do
>   go run sparselu-crash-mwe-chan.go taskpool-simple-queue-chan.go -n 201 -m 69 &> bugs/bad-g-status-in-ready/bug-go-$i.txt;
> done

Out of 20 runs: 4 ran ok, 6 segfaulted, and 10 crashed, generating varying errors.

fatal error: bad g->status in ready:
https://github.com/artjomsimon/go-bots/blob/master/bugs/bad-g-status-in-ready/bug-go-3.txt
https://github.com/artjomsimon/go-bots/blob/master/bugs/bad-g-status-in-ready/bug-go-5.txt
https://github.com/artjomsimon/go-bots/blob/master/bugs/bad-g-status-in-ready/bug-go-10.txt
https://github.com/artjomsimon/go-bots/blob/master/bugs/bad-g-status-in-ready/bug-go-15.txt
https://github.com/artjomsimon/go-bots/blob/master/bugs/bad-g-status-in-ready/bug-go-18.txt

fatal error: acquirep: invalid p state:
https://github.com/artjomsimon/go-bots/blob/master/bugs/bad-g-status-in-ready/bug-go-4.txt
https://github.com/artjomsimon/go-bots/blob/master/bugs/bad-g-status-in-ready/bug-go-6.txt
https://github.com/artjomsimon/go-bots/blob/master/bugs/bad-g-status-in-ready/bug-go-7.txt
https://github.com/artjomsimon/go-bots/blob/master/bugs/bad-g-status-in-ready/bug-go-17.txt

fatal error: fault
https://github.com/artjomsimon/go-bots/blob/master/bugs/bad-g-status-in-ready/bug-go-14.txt


Here's more information about the CPU and memory on the machine:

stsimon@leo:~/golang-parallel/src/benchmarks> lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    1
Core(s) per socket:    12
Socket(s):             4
NUMA node(s):          8
Vendor ID:             AuthenticAMD
CPU family:            16
Model:                 9
Model name:            AMD Opteron(tm) Processor 6172
Stepping:              1
CPU MHz:               800.000
BogoMIPS:              4199.97
Virtualization:        AMD-V
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              5118K
NUMA node0 CPU(s):     0-5
NUMA node1 CPU(s):     6-11
NUMA node2 CPU(s):     12-17
NUMA node3 CPU(s):     18-23
NUMA node4 CPU(s):     24-29
NUMA node5 CPU(s):     30-35
NUMA node6 CPU(s):     36-41
NUMA node7 CPU(s):     42-47

> free -m
             total       used       free     shared    buffers     cached
Mem:        128562       6322     122239         46        204       4676
-/+ buffers/cache:       1441     127120
Swap:         8197          0       8197

Please tell me if I can do anything else.

@dvyukov
Member

dvyukov commented Mar 28, 2015

Thanks for the detailed info!
Since it crashes with different error messages (including plain faults), it looks like heap corruption rather than an issue with invalid p state in the scheduler.
One common cause of heap corruption is a data race. However, the race detector doesn't report anything for this program.
I still haven't gotten to my larger machine to try to reproduce it locally.
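
To illustrate what the race detector looks for, here is a minimal sketch (the function name `squareAll` is hypothetical). Goroutines that each write a different slice element do not race, which appears to be the pattern in the reproducer, while a shared variable needs synchronization such as a mutex. Built with `go run -race`, this version should produce no report; removing the lock around `done++` would be expected to produce one.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll has each goroutine write only its own slot, and counts the
// finished tasks in a shared counter guarded by a mutex.
func squareAll(n int) ([]int, int) {
	squares := make([]int, n)
	done := 0
	var (
		wg sync.WaitGroup
		mu sync.Mutex
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			squares[i] = i * i // distinct element per goroutine: no race
			mu.Lock()
			done++ // shared variable: would race without the lock
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return squares, done
}

func main() {
	squares, done := squareAll(5)
	fmt.Println(squares, done)
}
```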

@rsc @randall77 @RLH

@gopherbot
Contributor

CL https://golang.org/cl/10713 mentions this issue.

@gopherbot
Contributor

CL https://golang.org/cl/10791 mentions this issue.

methane pushed a commit to methane/go that referenced this issue Jun 6, 2015
Issues golang#10240, golang#10541, golang#10941, golang#11023, golang#11027 and possibly others are
indicating memory corruption in the runtime. One of the easiest places
to both get corruption and detect it is in the allocator's free lists
since they appear throughout memory and follow strict invariants. This
commit adds a check when sweeping a span that its free list is sane
and, if not, it prints the corrupted free list and panics. Hopefully
this will help us collect more information on these failures.

Change-Id: I6d417bcaeedf654943a5e068bd76b58bb02d4a64
aclements added a commit that referenced this issue Jun 7, 2015
Stack barriers assume that writes through pointers to frames above the
current frame will get write barriers, and hence these frames do not
need to be re-scanned to pick up these changes. For normal writes,
this is true. However, there are places in the runtime that use
typedmemmove to potentially write through pointers to higher frames
(such as mapassign1). Currently, typedmemmove does not execute write
barriers if the destination is on the stack. If there's a stack
barrier between the current frame and the frame being modified with
typedmemmove, and the stack barrier is not otherwise hit, it's
possible that the garbage collector will never see the updated pointer
and incorrectly reclaim the object.

Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove
and its variants) detect when the destination is in the stack and
unwind stack barriers up to the point, forcing mark termination to
later rescan the affected frame and collect these pointers.

Fixes #11084. Might be related to #10240, #10541, #10941, #11023,
 #11027 and possibly others.

Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd
Reviewed-on: https://go-review.googlesource.com/10791
Reviewed-by: Russ Cox <[email protected]>
aclements added a commit that referenced this issue Jun 16, 2015
Issues #10240, #10541, #10941, #11023, #11027 and possibly others are
indicating memory corruption in the runtime. One of the easiest places
to both get corruption and detect it is in the allocator's free lists
since they appear throughout memory and follow strict invariants. This
commit adds a check when sweeping a span that its free list is sane
and, if not, it prints the corrupted free list and panics. Hopefully
this will help us collect more information on these failures.

Change-Id: I6d417bcaeedf654943a5e068bd76b58bb02d4a64
Reviewed-on: https://go-review.googlesource.com/10713
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Russ Cox <[email protected]>
Run-TryBot: Austin Clements <[email protected]>
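
The kind of free-list sanity check described in the commit message above can be sketched on a toy model. This is not the runtime's actual code: the `span` type, its fields, and `checkFreeList` are hypothetical, and the invariants checked here (in bounds, on an object boundary, no longer than the span) are only assumed examples of what such a check might test.

```go
package main

import "fmt"

// span is a toy model of an allocator span: nelems objects of elemSize bytes
// starting at base. next maps a free object's address to the address of the
// following free object; 0 terminates the list.
type span struct {
	base, elemSize uintptr
	nelems         int
	head           uintptr
	next           map[uintptr]uintptr
}

// checkFreeList walks the free list and reports the first violated invariant.
func (s *span) checkFreeList() error {
	limit := s.base + s.elemSize*uintptr(s.nelems)
	seen := 0
	for p := s.head; p != 0; p = s.next[p] {
		if p < s.base || p >= limit {
			return fmt.Errorf("free object %#x outside span [%#x, %#x)", p, s.base, limit)
		}
		if (p-s.base)%s.elemSize != 0 {
			return fmt.Errorf("free object %#x is not on an object boundary", p)
		}
		seen++
		if seen > s.nelems {
			return fmt.Errorf("free list longer than %d objects (cycle?)", s.nelems)
		}
	}
	return nil
}

func main() {
	s := &span{base: 0x1000, elemSize: 16, nelems: 4, head: 0x1000,
		next: map[uintptr]uintptr{0x1000: 0x1020, 0x1020: 0}}
	fmt.Println(s.checkFreeList()) // intact list

	s.next[0x1020] = 0x1028 // simulate a stray write into a free object
	fmt.Println(s.checkFreeList())
}
```

Because free objects are spread throughout the heap, a stray write has a fair chance of landing in one, which is why a check of this shape can surface corruption that would otherwise show up later as unrelated crashes.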
@aclements
Member

Hi @artjomsimon. We've fixed several memory corruption and lost write barrier issues in the runtime over the past few weeks. Please try to reproduce the problem with current master and reopen this issue if it's still happening. Thanks!

@golang golang locked and limited conversation to collaborators Jun 25, 2016