Travis CI tests failing for 3.4 on master #3543
See e.g.

These are commits from PRs that passed all tests. The failure is always only on the Python 3.4 build, in different eval-test-* subtests. The "Actual" test output is always empty, which makes me wonder if the processes just die for some Travis-CI-specific reason. I see nothing at https://www.traviscistatus.com/.

Comments
I also saw a 3.6 failure: https://travis-ci.org/python/mypy/builds/242766320?utm_source=github_status&utm_medium=notification
Previously I had similar issues, apparently caused by Travis CI killing processes when we had too many of them running in parallel. Restricting the maximum level of parallelism in Travis CI could help.
I also noticed that the parallelization is usually at 32 workers, but occasionally switches to 2 workers without any obvious reason (for just one or two builds). Perhaps the Travis VM reports a different number of cores to our test runner? I believe we don't have access to sudo; otherwise we could run …
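(Not from the thread: a minimal diagnostic sketch of how that core-count mismatch could be checked. On Linux, multiprocessing.cpu_count() reports the host machine's cores, while the scheduler affinity mask shows what this process may actually use, so the two can disagree on container-based CI.)

```python
# Sketch: why a runner might detect 32 "cores" on a 2-core Travis container.
import multiprocessing
import os

print("cpu_count:", multiprocessing.cpu_count())    # host cores, e.g. 32
print("affinity:", len(os.sched_getaffinity(0)))    # usable cores (Linux-only)
```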
We could try restricting the maximum level of parallelism in Travis CI to, say, 16.
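(A hedged sketch of that cap; the names here are illustrative, not mypy's actual test-runner code.)

```python
# Illustrative: clamp the auto-detected parallelism to a fixed ceiling.
import multiprocessing

MAX_WORKERS = 16  # the cap proposed above
num_workers = min(multiprocessing.cpu_count(), MAX_WORKERS)
```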
According to the Travis docs, we get 2 CPU cores per container (see here). I'm not sure we should go over that, at least not by too much. EDIT: I just tested, and using only 2 workers leads to about a 40% increase in time spent per run. I'm pretty sure we don't want that.
@ethanhs Yeah, I noticed the same. Even a reduction from 32 to 16 resulted in a slight increase in runtime. Why would that happen, given that we only have 2 cores? I guess our tests have a decent amount of I/O wait (presumably disk?), and we unintentionally use our (very expensive, process-based) workers to deal with blocking I/O. Obviously, the ideal solution would be to just use 2 processes instead of 32, but within them create either threads or (better) an …
So that ideal solution is no good. Practically, I think we can just keep the number of workers low enough that the memory problems don't happen, and high enough that blocking on I/O is not a big performance hit.
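(For illustration only: the rejected "ideal" shape described above, one process per real core with threads inside to absorb blocking I/O, might look like the hypothetical sketch below; none of these names are mypy's actual test-runner code.)

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_test_case(case):
    """Hypothetical per-test entry point (run mypy, compare output, ...)."""

def run_chunk(cases):
    # Threads are cheap and absorb blocking I/O waits; the GIL only
    # serializes CPU-bound work, which the per-core processes handle.
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(run_test_case, cases))

def run_all(chunks):
    # One worker process per real core on the Travis container.
    with ProcessPoolExecutor(max_workers=2) as pool:
        list(pool.map(run_chunk, chunks))
```

How much this would help depends on how much of each test really is I/O wait; the thread layer buys nothing for CPU-bound tests.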
Also, I just ran … I think investigating whether your suspicion is correct would be very useful. One interesting thing I've noticed is that all of the failures are on the longest-running container. If you think it is switching to 2 workers randomly, perhaps we can use …
Let's just decrease the maximum parallelism from 32 to 16 and see if that fixes the problem, instead of doing anything more involved. It isn't very valuable to understand the root cause if it's specific to Travis CI and we can find a simple workaround. Slower tests are generally preferable to unreliable tests, in my opinion.
This is an attempt to fix the spurious errors that have happened in Travis (see #3543). Hopefully we won't need to reduce this further to avoid errors.
It doesn't seem like the decrease helped. Tests are still failing in typeshed CI, pretty consistently now.
I believe the issue there is that mypy is run here with the default number of concurrent processes, which on a Travis worker is 32. If we lower that to 12, it should improve things (but probably won't solve them entirely).
Thanks for finding that! Would you mind submitting a PR to typeshed to fix it?
Mypy has issues with running its test suite with many processes concurrently. This should reduce Travis test failures, if not completely resolve them. See issue python/mypy#3543.
Are we still seeing flakes on Travis CI? Jukka and I discussed this offline, and the best approach we can think of is to increase the test timeout from 30s to 5min. If the flakes then go away, we can assume that was the issue. The timeouts rarely if ever caught anything real (most tests don't even run with a timeout).
Yes, this was hit in #4041, I believe. I'll make a PR to up the timeout to 5 minutes.
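(A minimal sketch of the kind of timeout change being discussed, assuming the harness runs each test in a subprocess under a wall-clock limit; the constant and function names are illustrative, not mypy's actual code.)

```python
import subprocess

TEST_TIMEOUT = 5 * 60  # seconds; raised from the old 30s limit

def run_test_cmd(cmd):
    # Raises subprocess.TimeoutExpired if the child exceeds the limit,
    # letting the harness report a timeout instead of hanging forever.
    return subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, timeout=TEST_TIMEOUT)
```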