
Don't use numpy broadcasting in guvectorize inner loop #348


Merged 3 commits into sgkit-dev:master on Oct 26, 2020

Conversation

tomwhite
Collaborator

Small but critical change to count_cohort_alleles that takes the runtime down from ~100 min (projected - not run to completion) on MalariaGEN data to ~8 min. Having explicit loops rather than relying on broadcasting in the inner loop seems to be the key here.
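
As a minimal sketch of the pattern (not the actual count_cohort_alleles kernel - the names, dtypes, and per-sample allele-count layout below are illustrative assumptions):

from numba import guvectorize

# The dummy third argument only carries the number of cohorts so that
# guvectorize can size the output; its values are never read.
@guvectorize(
    ["void(int32[:, :], int32[:], int8[:], int32[:, :])"],
    "(n, k),(n),(c)->(c, k)",
    nopython=True,
)
def cohort_sum_broadcast(sample_ac, cohorts, _, out):
    out[:, :] = 0
    for i in range(sample_ac.shape[0]):
        # Array broadcasting buried in the hot loop: concise, but slow.
        out[cohorts[i]] += sample_ac[i]

@guvectorize(
    ["void(int32[:, :], int32[:], int8[:], int32[:, :])"],
    "(n, k),(n),(c)->(c, k)",
    nopython=True,
)
def cohort_sum_loops(sample_ac, cohorts, _, out):
    out[:, :] = 0
    for i in range(sample_ac.shape[0]):
        for j in range(sample_ac.shape[1]):
            # Explicit scalar arithmetic in the hot loop - the kind of
            # change this PR makes.
            out[cohorts[i], j] += sample_ac[i, j]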

@eric-czech
Collaborator

👍 good to know.

@hammer
Contributor

hammer commented Oct 23, 2020

🤔 that's a big enough jump for a basic enough pattern that it may be worth adding to our documentation some place?

@jeromekelleher
Collaborator

Wow, yeah, that's really good to know.

@eric-czech
Collaborator

How did you find that btw @tomwhite? Did something tip you off to that specific code being a performance problem?

@tomwhite
Collaborator Author

that's a big enough jump for a basic enough pattern that it may be worth adding to our documentation some place?

Added to the contributors guide.

Did something tip you off to that specific code being a performance problem?

It wasn't obvious to me why that code should be an order of magnitude slower than count_alleles, so I compared the two to see if there was anything obviously different. I noticed that _count_cohort_alleles had a single loop, rather than a double loop as might be expected. I found this post, which has some good advice: "Instead of array operations, we are very explicit within the function and do everything with loops." So I tried that, and saw the speed increase.

Note that this doesn't mean we should never use array operations within guvectorize - indeed, these functions fill the output array at the beginning with e.g. out[:, :] = 0 - but array operations are generally not a good idea when buried in the inner loop.
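
For anyone curious, a rough way to reproduce the effect with the illustrative kernels sketched above (the sizes here are made up, and absolute timings will vary with machine and numba version):

from timeit import timeit

import numpy as np

rng = np.random.default_rng(0)
n_variants, n_samples, n_alleles, n_cohorts = 10_000, 1_000, 4, 3
ac = rng.integers(0, 3, size=(n_variants, n_samples, n_alleles), dtype=np.int32)
cohorts = rng.integers(0, n_cohorts, size=n_samples, dtype=np.int32)
dummy = np.empty(n_cohorts, dtype=np.int8)

# Call each kernel once outside the timed region so JIT compilation
# is not included in the measurement.
cohort_sum_broadcast(ac, cohorts, dummy)
cohort_sum_loops(ac, cohorts, dummy)

print("broadcast:", timeit(lambda: cohort_sum_broadcast(ac, cohorts, dummy), number=10))
print("loops:    ", timeit(lambda: cohort_sum_loops(ac, cohorts, dummy), number=10))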

@tomwhite added the auto-merge label (Auto merge label for mergify test flight) on Oct 26, 2020
@mergify bot merged commit 9427171 into sgkit-dev:master on Oct 26, 2020
@timothymillar
Collaborator

@tomwhite possibly related: I've always found a significant speedup when using numba for integer arithmetic by iterating over a range rather than over the elements of an array.
For example:

from numba import njit

@njit
def sum1(array):
    total = 0
    for i in array:  # iterate directly over the array's elements
        total += i
    return total

is approximately 4-5x slower for an integer array than the following:

@njit
def sum2(array):
    total = 0
    for i in range(len(array)):  # iterate over indices instead
        total += array[i]
    return total

My assumption here was that sum1 has more runtime overhead because i could be an integer or an ndarray, whereas in sum2 i is always an integer.
However, the speedup seems to disappear for float addition, so I'm not sure:

(numba 0.51.2)

>>> import numpy as np
>>> ints = np.random.randint(0, 100, 10_000)
>>> floats = ints.astype(np.float64)
>>>
>>> %%timeit
... sum1(ints) 
4.95 µs ± 90.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %%timeit
... sum2(ints) 
1.11 µs ± 4.08 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %%timeit
... sum1(floats) 
11.5 µs ± 8.07 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %%timeit
... sum2(floats) 
11.6 µs ± 15.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
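
If you want to check that typing assumption, numba can show the types it inferred for each line of the compiled functions (assuming the sum1/sum2 definitions above have already been called, so they are compiled):

# Prints the annotated source of each compiled specialization, including
# the type inferred for the loop variable i.
sum1.inspect_types()
sum2.inspect_types()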
