--dist=load runs at most half your tests at once #12


Closed
sah opened this issue Nov 14, 2015 · 11 comments · Fixed by #13

Comments

@sah
Contributor

sah commented Nov 14, 2015

Because of this line, which distributes a minimum of two tests to each node: https://github.com/pytest-dev/pytest-xdist/blob/master/xdist/dsession.py#L369

This came as a surprise to me because this behavior is undocumented, and py.test's output doesn't make it clear that it is happening. For users running slower tests, such as Selenium functional tests, full parallelization is often desirable. Could the minimum be changed to 1, or made configurable?

@RonnyPfannschmidt
Member

A minimum of 1 pending item per node is needed due to the way setupstate scope caching works.

Currently it's not clear how to change that in a backward-compatible way.

@sah
Contributor Author

sah commented Nov 15, 2015

Can you point me to the caching system you're referring to, and/or explain what goes wrong with this minimum at 1 instead of 2? I may be interested in devoting some time to finding a solution.

@nicoddemus
Member

Actually, I think sending a single test item works without a problem. IIRC I tested this in the past because I faced the same use case, where we had a few slow tests.

What @RonnyPfannschmidt is mentioning is actually on the slave side: each slave will always keep at least one test "on queue" until it receives more tests from the master node or a "shutdown" signal. This happens because the slave must comply with the pytest_runtest_teardown(item, nextitem) hook: before calling it, the slave must know whether there is a nextitem, or whether the last test on its queue is the last test in the session, in which case it should use nextitem=None. Until the slave receives more tests or a "shutdown" signal, it has no way of knowing which is which; that's why it keeps at least one test waiting (here's the relevant code).
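
The hold-one-back constraint can be sketched in isolation. This is a minimal illustration with made-up names, not the actual xdist worker code:

```python
def run_queue(queue, shutdown_received):
    """Run queued tests, holding one back until we know what follows it.

    Mirrors the nextitem rule described above: teardown of a test needs
    to know whether another test comes after it (nextitem) or whether it
    is the last test of the session (nextitem=None).
    """
    ran = []
    while queue:
        item = queue.pop(0)
        if queue:
            nextitem = queue[0]        # another test is already queued
        elif shutdown_received:
            nextitem = None            # session is over; safe to finish
        else:
            queue.insert(0, item)      # can't decide yet: hold the test
            break
        ran.append((item, nextitem))
    return ran, queue
```

With three tests and no shutdown signal, the worker runs only the first two and keeps the third waiting; once shutdown arrives, the held test runs with nextitem=None.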

@sah
Contributor Author

sah commented Nov 16, 2015

I see, this makes sense.

In my own tests, changing the minimum to 1 (in the line I quoted in my original comment) works fine and gets all tests running simultaneously. I was unsure whether there might be some edge-case feature I'm not using that this would break.

@nicoddemus
Member

I think changing the minimum chunk size to 1 seems pretty harmless. If a user has only 4 slow integration tests, it makes perfect sense to use -n 4 and obtain parallel execution on multi-core CPUs. As it stands, two tests each will be sent to slaves 1 and 2, while slaves 3 and 4 will have no tests to run and will shut down immediately.

@hpk42 @RonnyPfannschmidt @flub, do you remember why that minimum is set to 2?

@sah
Contributor Author

sah commented Nov 16, 2015

I see the problem: if I change this minimum to 1 and run with fewer processes than tests, py.test hangs and no tests are ever run. Looking into why.

@sah
Contributor Author

sah commented Nov 17, 2015

Figured this out. It does interact with the slave code @nicoddemus linked: if we choose a node_chunksize of 1, we have to distribute all tests, or DSession won't call triggershutdown() and the shutdown signal that lets the slaves run their held tests will never be sent. So, e.g., with 10 nodes and 11 tests, we'll pick node_chunksize == 1 but distribute only 10 tests and never shut down, and each slave will sit with 1 test that never runs.
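
The arithmetic of that hang can be reproduced with a tiny standalone sketch (illustrative names, not the real DSession code):

```python
def initial_distribution(num_tests, num_nodes, minimum):
    """Model the initial chunked distribution described above."""
    items_per_node = num_tests // num_nodes
    node_chunksize = max(items_per_node // 4, minimum)
    sent = min(node_chunksize * num_nodes, num_tests)
    # shutdown is only triggered once everything has been distributed,
    # so a nonzero pending count here means the workers hang
    pending = num_tests - sent
    return sent, pending
```

With 11 tests, 10 nodes, and a minimum of 1, this sends 10 tests and leaves 1 pending, so triggershutdown() never fires and each slave holds its single test forever; with the current minimum of 2, all 11 tests go out and pending is 0.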

So a better solution than the current minimum of 2 would be to explicitly handle the case where the number of tests is less than twice the number of nodes, and always distribute all tests in that case. Here's an example of how, which works in my tests:

        # how many items per node do we have about?
        items_per_node = len(self.collection) // len(self.node2pending)
        if items_per_node < 2:
            # distribute all tests, round-robin
            while self.pending:
                for node in self.nodes:
                    self._send_tests(node, 1)
            # and tell nodes to shut down when they're done
            for node in self.nodes:
                node.shutdown()
        else:
            # take a fraction of tests for initial distribution
            node_chunksize = max(items_per_node // 4, 2)
            # and initialize each node with a chunk of tests
            for node in self.nodes:
                self._send_tests(node, node_chunksize)

@nicoddemus
Member

@sah thanks for looking into this.

I think you can simplify it, though, by checking whether any tests are still pending and sending a "shutdown" signal to the nodes right after the initial test distribution. For example:

        # how many items per node do we have about?
        items_per_node = len(self.collection) // len(self.node2pending)
        # take a fraction of tests for initial distribution
-       node_chunksize = max(items_per_node // 4, 2)
+       node_chunksize = max(items_per_node // 4, 1)
        # and initialize each node with a chunk of tests
        for node in self.nodes:
            self._send_tests(node, node_chunksize)
+       # initial distribution sent all tests, start node shutdown
+       if not self.pending:
+           for node in self.nodes:
+               node.shutdown()

I haven't had time to test this, though; it might have side effects in other parts of the code. I will try to take a deeper look at this tomorrow, but feel free to test it in the meantime if possible.

@sah
Contributor Author

sah commented Nov 18, 2015

I tried something like this. The problem is that because the code picks a quarter of the tests for initial distribution and rounds down, it can end up distributing 1 test to every node while still leaving some tests in pending. Then the shutdown signal is never sent and the tests never run.

Here's another shot at simplifying (works in my tests):

        initial_batch = len(self.pending) // 4
        if initial_batch < 2 * len(self.nodes):
            # if we don't have at least two tests per node, we have to
            # send them all so that we can send shutdown signals and
            # get all nodes working
            initial_batch = len(self.pending)

        # distribute tests round-robin up to the batch size (or until we run out)
        nodes = itertools.cycle(self.nodes)
        for _ in range(initial_batch):
            self._send_tests(next(nodes), 1)

        if not self.pending:
            # initial distribution sent all tests, start node shutdown
            for node in self.nodes:
                node.shutdown()
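
For what it's worth, the batch-size rule above can be checked on its own. Here is a hypothetical standalone version of just that computation:

```python
def initial_batch_size(num_pending, num_nodes):
    """Size of the initial round-robin batch, per the rule above."""
    # start with a quarter of the suite
    initial_batch = num_pending // 4
    # with fewer than two tests per node, send everything so the
    # shutdown signal can be issued right after distribution
    if initial_batch < 2 * num_nodes:
        initial_batch = num_pending
    return initial_batch
```

So the 11-tests/10-nodes case from earlier sends all 11 up front, while a large suite still keeps three quarters of its tests back for load balancing.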

@nicoddemus
Member

Oh I see, thanks. 😄

Would you mind opening a PR with that? If possible, with tests too.

@sah
Contributor Author

sah commented Nov 18, 2015

Tests kicked up a couple of cases I needed to handle, so the final change is a little different from the above. I think the test coverage of the behavior is pretty good now.
