CUDA test regression scheduling.parallelizeAtomicReduction (and one other) #454

Closed
Infinoid opened this issue May 6, 2021 · 2 comments

Infinoid commented May 6, 2021

The test applies this schedule:

    stmt = stmt.split(i, i0, i1, 2)
            .split(i0, block, thread, 2)
            .parallelize(block, ParallelUnit::GPUBlock, OutputRaceStrategy::Temporary)
            .parallelize(thread, ParallelUnit::GPUThread, OutputRaceStrategy::Temporary);

Apparently the second parallelize call is ignored. Later on, the CUDA codegen fails because it sees block parallelism but no thread parallelism.

git bisect says this was caused by c004447. This commit is in the middle of the assembly-v2 effort, and some debug messages are present.

In c004447:

$ bin/taco-test --gtest_filter=scheduling.parallelizeAtomicReduction
[…]
temps:tthreadC : double
results:C : double
checking: where(C += tthreadC, forall(thread, forall(i1, tthreadC += A(i) * B(i))))
checking: where(C += tthreadC, forall(thread, forall(i1, tthreadC += A(i) * B(i))))
temps:tthreadC : double
results:C : double
checking: where(C += tthreadC, forall(thread, forall(i1, tthreadC += A(i) * B(i))))
checking: where(C += tthreadC, forall(thread, forall(i1, tthreadC += A(i) * B(i))))
Compiler bug at /home/infinoid/workspace/taco/git/src/codegen/codegen_cuda.cpp:373 in visit
Please report it to developers
 Condition failed: blockIDVars.size() == threadIDVars.size()
 No matching GPUThread parallelize for GPUBlock
[  FAILED  ] scheduling.parallelizeAtomicReduction (3736 ms)
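
For readers unfamiliar with the failed condition, here is a minimal sketch of the invariant it expresses: the number of loops parallelized over GPUBlock must equal the number parallelized over GPUThread. This is not taco's actual code; `Loop`, the trimmed-down `ParallelUnit` enum, and `blockThreadCountsMatch` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for taco's ParallelUnit; only the values relevant here.
enum class ParallelUnit { NotParallel, GPUBlock, GPUThread };

// Hypothetical, simplified loop record: the index variable's name and how
// that loop is parallelized.
struct Loop {
  std::string var;
  ParallelUnit unit;
};

// Returns true when the number of GPUBlock loops equals the number of
// GPUThread loops, which is the shape of the condition reported in the log
// (blockIDVars.size() == threadIDVars.size()).
bool blockThreadCountsMatch(const std::vector<Loop>& loops) {
  std::size_t blocks = 0;
  std::size_t threads = 0;
  for (const Loop& loop : loops) {
    if (loop.unit == ParallelUnit::GPUBlock) ++blocks;
    if (loop.unit == ParallelUnit::GPUThread) ++threads;
  }
  return blocks == threads;
}
```

If the parallelize call on `thread` is dropped, a loop nest like `{block: GPUBlock, thread: NotParallel, i1: NotParallel}` has one block loop and zero thread loops, so the check fails. That is consistent with the output above, where `forall(thread, ...)` is printed without a GPUThread annotation.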

In c004447^ (the commit preceding it):

$ bin/taco-test --gtest_filter=scheduling.parallelizeAtomicReduction
[…]
temps:tthreadC : double, ti1tthreadC : double
results:C : double
checking: where(C += tthreadC, forall(thread, where(tthreadC += ti1tthreadC, forall(i1, ti1tthreadC += A(i) * B(i))), GPUThread, Atomics))
checking: where(C += tthreadC, forall(thread, where(tthreadC += ti1tthreadC, forall(i1, ti1tthreadC += A(i) * B(i))), GPUThread, Atomics))
checking: where(tthreadC += ti1tthreadC, forall(i1, ti1tthreadC += A(i) * B(i)))
checking: where(tthreadC += ti1tthreadC, forall(i1, ti1tthreadC += A(i) * B(i)))
temps:tthreadC : double, ti1tthreadC : double
results:C : double
checking: where(C += tthreadC, forall(thread, where(tthreadC += ti1tthreadC, forall(i1, ti1tthreadC += A(i) * B(i))), GPUThread, Atomics))
checking: where(C += tthreadC, forall(thread, where(tthreadC += ti1tthreadC, forall(i1, ti1tthreadC += A(i) * B(i))), GPUThread, Atomics))
checking: where(tthreadC += ti1tthreadC, forall(i1, ti1tthreadC += A(i) * B(i)))
checking: where(tthreadC += ti1tthreadC, forall(i1, ti1tthreadC += A(i) * B(i)))
temps:
results:expected : double
temps:
results:expected : double
temps:
results:A1337 : double
temps:
results:
[       OK ] scheduling.parallelizeAtomicReduction (9290 ms)

A second test, scheduling.parallelizeTemporaryReduction, fails in the same way, though its debug output is longer. It also passes in the preceding commit.

I am building on Linux with -DCMAKE_BUILD_TYPE=Debug -DCUDA=ON -DOPENMP=ON.

@stephenchouca stephenchouca self-assigned this May 6, 2021
@stephenchouca stephenchouca added the bug Indicates an unexpected problem or unintended behavior label May 6, 2021

stephenchouca commented May 7, 2021

I believe this should be fixed on master now. @Infinoid It'd be great if you could check whether that's the case.


Infinoid commented May 7, 2021

e94e68e fixed these errors, but introduced some new failures. I will log a separate issue for those.

We really need CI testing for CUDA...
