--dist=load runs at most half your tests at once #12


Closed
sah opened this issue Nov 14, 2015 · 11 comments · Fixed by #13

Comments

@sah
Contributor

sah commented Nov 14, 2015

Because of this line, which distributes a minimum of two tests to each node: https://github.com/pytest-dev/pytest-xdist/blob/master/xdist/dsession.py#L369

This came as a surprise to me because this behavior is undocumented, and py.test's output doesn't make it clear that it is happening. For users running slower tests, such as Selenium functional tests, full parallelization is often desirable. Could the minimum be changed to 1, or made configurable?

@RonnyPfannschmidt
Member

A minimum of 1 pending item per node is needed due to the way setupstate scope caching works.

Currently it's not clear how to change that in a backward-compatible way.

@sah
Contributor Author

sah commented Nov 15, 2015

Can you point me to the caching system you're referring to, and/or explain what goes wrong with this minimum at 1 instead of 2? I may be interested in devoting some time to finding a solution.

@nicoddemus
Member

Actually, I think sending a single test item works without a problem. IIRC I tested this in the past because I faced the same use case, where we had a few slow tests.

What @RonnyPfannschmidt is mentioning is actually on the slave side: each slave will always keep at least one test "on queue" until it receives more tests from the master node or a "shutdown" signal. This happens because the slave must comply with the pytest_runtest_teardown(item, nextitem) hook: before calling it, the slave must know whether there is a nextitem, or whether the last test on its queue is the last test in the session, in which case it should use nextitem=None. Until the slave receives more tests or a "shutdown" signal, it has no way of knowing which is which; that's why it keeps at least one test waiting (here's the relevant code).
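
The hold-one-back constraint can be sketched in isolation. This is a minimal illustration with made-up names, not the actual xdist worker code:

```python
def run_queue(queue, shutdown_received):
    """Run queued tests, holding one back until we know what follows it.

    Mirrors the nextitem rule described above: teardown of a test needs
    to know whether another test comes after it (nextitem) or whether it
    is the last test of the session (nextitem=None).
    """
    ran = []
    while queue:
        item = queue.pop(0)
        if queue:
            nextitem = queue[0]        # another test is already queued
        elif shutdown_received:
            nextitem = None            # session is over; safe to finish
        else:
            queue.insert(0, item)      # can't decide yet: hold the test
            break
        ran.append((item, nextitem))
    return ran, queue
```

With three tests and no shutdown signal, the worker runs only the first two and keeps the third waiting; once shutdown arrives, the held test runs with nextitem=None.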

@sah
Contributor Author

sah commented Nov 16, 2015

I see, this makes sense.

In my own tests, changing the minimum to 1 (in the line I quoted in my original comment) works fine and gets all tests running simultaneously. I was unsure whether there might be some edge-case feature I'm not using that this would break.

@nicoddemus
Member

I think changing the minimum chunk size to 1 seems pretty harmless. If a user has only 4 slow integration tests, it makes perfect sense to use -n 4 and obtain parallel execution on multi-core CPUs. As it stands, two tests each will be sent to slaves 1 and 2, while slaves 3 and 4 will have no tests to run and will shut down immediately.

@hpk42 @RonnyPfannschmidt @flub, do you remember why that minimum is set to 2?

@sah
Contributor Author

sah commented Nov 16, 2015

I see the problem: if I change this minimum to 1 and run with fewer processes than tests, py.test hangs and no tests are ever run. Looking into why.

@sah
Contributor Author

sah commented Nov 17, 2015

Figured this out. It does interact with the slave code @nicoddemus linked: if we choose a node_chunksize of 1, we have to distribute all tests, or DSession won't call triggershutdown() and the shutdown signal that lets the slaves run their held tests will never be sent. So, e.g., with 10 nodes and 11 tests, we'll pick node_chunksize == 1 but distribute only 10 tests and never shut down, and each slave will sit with 1 test that never runs.
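
The arithmetic of that hang can be reproduced with a tiny standalone sketch (illustrative names, not the real DSession code):

```python
def initial_distribution(num_tests, num_nodes, minimum):
    """Model the initial chunked distribution described above."""
    items_per_node = num_tests // num_nodes
    node_chunksize = max(items_per_node // 4, minimum)
    sent = min(node_chunksize * num_nodes, num_tests)
    # shutdown is only triggered once everything has been distributed,
    # so a nonzero pending count here means the workers hang
    pending = num_tests - sent
    return sent, pending
```

With 11 tests, 10 nodes, and a minimum of 1, this sends 10 tests and leaves 1 pending, so triggershutdown() never fires and each slave holds its single test forever; with the current minimum of 2, all 11 tests go out and pending is 0.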

So a better solution than the current minimum of 2 would be to explicitly handle the case where the number of tests is less than twice the number of nodes, and always distribute all tests in that case. Here's an example of how, which works in my tests:

        # how many items per node do we have about?
        items_per_node = len(self.collection) // len(self.node2pending)
        if items_per_node < 2:
            # distribute all tests, round-robin
            while self.pending:
                for node in self.nodes:
                    self._send_tests(node, 1)
            # and tell nodes to shut down when they're done
            for node in self.nodes:
                node.shutdown()
        else:
            # take a fraction of tests for initial distribution
            node_chunksize = max(items_per_node // 4, 2)
            # and initialize each node with a chunk of tests
            for node in self.nodes:
                self._send_tests(node, node_chunksize)

@nicoddemus
Member

@sah thanks for looking into this.

I think you can simplify it, though, by checking whether any tests are still pending and sending a "shutdown" signal to the nodes right after the initial test distribution. For example:

        # how many items per node do we have about?
        items_per_node = len(self.collection) // len(self.node2pending)
        # take a fraction of tests for initial distribution
-       node_chunksize = max(items_per_node // 4, 2)
+       node_chunksize = max(items_per_node // 4, 1)
        # and initialize each node with a chunk of tests
        for node in self.nodes:
            self._send_tests(node, node_chunksize)
+       # initial distribution sent all tests, start node shutdown
+       if not self.pending:
+           for node in self.nodes:
+               node.shutdown()

I haven't had time to test this, though; it might have side effects in other parts of the code. I will try to take a deeper look at this tomorrow, but feel free to test it in the meantime if possible.

@sah
Contributor Author

sah commented Nov 18, 2015

I tried something like this. The problem is that because the code picks a quarter of the tests for initial distribution and rounds down, it can end up distributing 1 test to every node while still leaving some tests in pending. Then the shutdown signal is never sent and the tests never run.

Here's another shot at simplifying (works in my tests):

        initial_batch = len(self.pending) // 4
        if initial_batch < 2 * len(self.nodes):
            # if we don't have at least two tests per node, we have to
            # send them all so that we can send shutdown signals and
            # get all nodes working
            initial_batch = len(self.pending)

        # distribute tests round-robin up to the batch size (or until we run out)
        nodes = itertools.cycle(self.nodes)
        for _ in range(initial_batch):
            self._send_tests(next(nodes), 1)

        if not self.pending:
            # initial distribution sent all tests, start node shutdown
            for node in self.nodes:
                node.shutdown()
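
For what it's worth, the batch-size rule above can be checked on its own. Here is a hypothetical standalone version of just that computation:

```python
def initial_batch_size(num_pending, num_nodes):
    """Size of the initial round-robin batch, per the rule above."""
    # start with a quarter of the suite
    initial_batch = num_pending // 4
    # with fewer than two tests per node, send everything so the
    # shutdown signal can be issued right after distribution
    if initial_batch < 2 * num_nodes:
        initial_batch = num_pending
    return initial_batch
```

So the 11-tests/10-nodes case from earlier sends all 11 up front, while a large suite still keeps three quarters of its tests back for load balancing.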

@nicoddemus
Member

Oh I see, thanks. 😄

Would you mind opening a PR with that? If possible, with tests too.

@sah
Contributor Author

sah commented Nov 18, 2015

Tests kicked up a couple of cases I needed to handle, so the final change is a little different from the above. I think the test coverage of the behavior is pretty good now.
