Description
We keep seeing test flakes in Travis, attributed to #3543. They are kind of hard to recognize, since the failures just have the form "Expected: ; Actual: (empty)".
We suspect that the subprocesses created by the integration tests (the python*eval tests) are running out of memory because there are too many of them, and are being killed, but whatever is managing the subprocesses doesn't notice the exit status. IIRC, a process killed by the Linux OOM killer shows up as terminated by signal 9 (SIGKILL), but if you only read its output pipe you'll never know.
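A minimal sketch of the kind of status check that would expose such a kill, assuming the harness uses Python's subprocess module on a POSIX system. The helper run_checked and the exception SubprocessKilled are hypothetical names, not mypy's actual test code. It relies on the documented subprocess behavior that a negative return code -N means the child was terminated by signal N. Ordinary non-zero exit codes are returned rather than raised, on the assumption that some tests may legitimately expect a failing exit status.

```python
import signal
import subprocess
import sys
from typing import List, Tuple


class SubprocessKilled(RuntimeError):
    """Hypothetical error: the child was terminated by a signal (e.g. an OOM kill)."""

    def __init__(self, cmd: List[str], sig: signal.Signals, stdout: str) -> None:
        super().__init__(f"{cmd[0]} was killed by {sig.name}; partial stdout: {stdout!r}")
        self.sig = sig
        self.stdout = stdout


def run_checked(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd and return (exit status, stdout), raising if a signal killed it."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    if proc.returncode < 0:
        # POSIX only: returncode == -N means the child died from signal N,
        # e.g. -9 for SIGKILL. Reading stdout alone would just show empty output.
        raise SubprocessKilled(cmd, signal.Signals(-proc.returncode), proc.stdout)
    # A non-negative status is passed through unchanged; a non-zero exit may be expected.
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Simulate an OOM kill by having the child send SIGKILL to itself.
    victim = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    try:
        run_checked(victim)
    except SubprocessKilled as exc:
        print(exc)
```

Without the returncode check, the simulated kill above would look exactly like a test that printed nothing.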
We need some investigation into this theory and improved status checking, so that at least the crashes are easier to diagnose. @ethanhs, @elazarg, is either of you interested?
My theory could be patently false, e.g. if some of the observed failures don't come from "python*eval" tests. It could also be that somehow runtests.py -j12 spins up pytest -n12 plus 11 other tasks, which would mean we're putting double the load on Travis that we intended.