Skip to content

Exception throwed in "precompute" schedule #347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
roastduck opened this issue Dec 12, 2020 · 5 comments
Open

Exception throwed in "precompute" schedule #347

roastduck opened this issue Dec 12, 2020 · 5 comments
Assignees
Labels
bug Indicates an unexpected problem or unintended behavior

Comments

@roastduck
Copy link
Contributor

I was trying to run the following schedule:

taco "C(i0,i1,j)=A(i0,i1,p0,p1)*B(p0,p1,j)" -f="C:ddd:0,1,2" -f="A:dsdd:0,2,1,3" -f="B:ddd:0,1,2" -s="precompute(A(i0,i1,p0,p1)*B(p0,p1,j),p0,p0_cache)"

It ended up with an error:

terminate called after throwing an instance of 'taco::TacoException'
  what():  Compiler bug at /home/rd/src/taco/src/lower/lower.cpp:49 in lower
Please report it to developers
 Condition failed: isLowerable(stmt, &reason)
 Not lowerable, because the index statement is not in concrete index notation, because all variables in concrete notation must be bound by a forall statement: suchthat(forall(i0, where(forall(p0, forall(i1, forall(p1, forall(j, C(i0,i1,j) += workspace(p0))))), forall(p0_cache, workspace(p0_cache) = A(i0,i1,p0_cache,p1) * B(p0_cache,p1,j)))), precompute(p0, p0_cache))

I was running the latest commit (dd62216). However, the online tool at http://tensor-compiler.org/codegen.html works fine with the same command. Maybe one of the latest commit introduced the bug.

@roastduck
Copy link
Contributor Author

PS. I think it would be better to display the actual deployed commit ID on the online tool site, so I can fall back to that version if the master branch goes wrong.

@stephenchouca
Copy link
Contributor

It seems like there's a bug with precompute. Actually, the code generated by the online tool looks to be incorrect as well:

  #pragma omp parallel for schedule(runtime)
  for (int32_t i0 = 0; i0 < A1_dimension; i0++) {
    double* restrict workspace = 0;
    workspace = (double*)malloc(sizeof(double) * 42);
    for (int32_t pworkspace = 0; pworkspace < 42; pworkspace++) {
      workspace[pworkspace] = 0.0;
    }
    int32_t p0_cacheA = A2_pos[i0];
    int32_t pA2_end = A2_pos[(i0 + 1)];
    int32_t p0_cacheA0 = A2_crd[p0_cacheA];
    int32_t p0 = A2_crd[p0_cacheA];
    int32_t p0_cache = p0;
    int32_t p0_cache_end = C4_dimension;

    while (p0_cacheA < pA2_end && p0 < p0_cache_end) {
      p0_cacheA0 = A2_crd[p0_cacheA];
      p0 = A2_crd[p0_cacheA];
      workspace[p0_cache] = A_vals[p1A] * B_vals[jB];

      p0_cacheA += (int32_t)(p0_cacheA0 == p0);
      p0_cacheA0 = A2_crd[p0_cacheA];
      p0 = A2_crd[p0_cacheA];
    }
    for (int32_t p0 = 0; p0 < C4_dimension; p0++) {
      for (int32_t i1 = 0; i1 < A2_dimension; i1++) {
        int32_t i1C = i0 * C2_dimension + i1;
        for (int32_t p1 = 0; p1 < B2_dimension; p1++) {
          for (int32_t j = 0; j < B3_dimension; j++) {
            int32_t jC = i1C * C3_dimension + j;
            C_vals[jC] = C_vals[jC] + workspace[p0workspace];
          }
        }
      }
    }
    free(workspace);
  }

For instance, the line C_vals[jC] = C_vals[jC] + workspace[p0workspace]; for instances references a p0workspace that doesn't exist. The online tool simply wasn't catching this bug because it was built as a release build (which disables some internal assertions) rather than a debug build.

@stephenchouca stephenchouca added the bug Indicates an unexpected problem or unintended behavior label Dec 14, 2020
@Infinoid
Copy link
Contributor

I've been seeing missing local variables with precompute too. I ran into it with MTTKRP, but managed to boil it down to this:

taco 'A(i,j) = B(i,j) * 5' -f=A:dd -f=B:dd -s='precompute(B(i,j)*5,j,k)'

Which generates this loop:

  for (int32_t i = 0; i < B1_dimension; i++) {
    double* restrict workspace = 0;
    workspace = (double*)malloc(sizeof(double) * 42);
    for (int32_t pworkspace = 0; pworkspace < 42; pworkspace++) {
      workspace[pworkspace] = 0.0;
    }
    for (int32_t k = 0; k < A2_dimension; k++) {
      int32_t j = k;
      if (j >= A2_dimension)
        continue;

      int32_t kB = i * B2_dimension + k;
      workspace[k] = B_vals[kB] * 5;
    }
    for (int32_t j = 0; j < A2_dimension; j++) {
      A_vals[jA] = workspace[jworkspace];
    }
    free(workspace);
  }

And references the variable jA without ever declaring it.

@Infinoid
Copy link
Contributor

The missing variable thing is fixed. The original "Not lowerable" bug described by @roastduck is still present.

I tried to minimize that reproducer, here's the result:

$ bin/taco "C(i0)=A(i0,p0)*B(p0)" -s="precompute(A(i0,p0)*B(p0),i0,i0_cache)" 
terminate called after throwing an instance of 'taco::TacoException'
  what():  Compiler bug at .../src/lower/lower.cpp:50 in lower
Please report it to developers
 Condition failed: isLowerable(stmt, &reason)
 Not lowerable, because the index statement is not in concrete index notation, because all variables in concrete notation must be bound by a forall statement: where(forall(i0, where(C(i0) = tp0C, forall(p0, tp0C += workspace(i0)))), forall(i0_cache, workspace(i0_cache) = A(i0_cache,p0) * B(p0)))

Precomputing at i0 breaks, precomputing at p0 seems to work fine.

@weiya711
Copy link
Contributor

The branch taco/multidim-workspace (392fc66) seems to fix the following two commands from @Infinoid:

./bin/taco "C(i0)=A(i0,p0)*B(p0)" -s="precompute(A(i0,p0)*B(p0),p0,p0_cache)" 
./bin/taco "C(i0)=A(i0,p0)*B(p0)" -s="precompute(A(i0,p0)*B(p0),i0,i0_cache)" 

But it doesn't fix the command from @roastduck

taco "C(i0,i1,j)=A(i0,i1,p0,p1)*B(p0,p1,j)" -f="C:ddd:0,1,2" -f="A:dsdd:0,2,1,3" -f="B:ddd:0,1,2" -s="precompute(A(i0,i1,p0,p1)*B(p0,p1,j),p0,p0_cache)"

I'm a little confused about this larger command because currently when you precompute across p0 based on the precompute algorithm you must first reorder the index variables to {i0, i1, j, p0, p1} because the shared index variables {i0, i1, j} across the RHS and LHS of the assignment must be in the outer-loops of the precompute where statement.

In order to fix this the command should now have an added reorder(i0, i1, j, p0, p1) scheduling command shown below:

./bin/taco "C(i0,i1,j)=A(i0,i1,p0,p1)*B(p0,p1,j)" -f="C:ddd:0,1,2" -f="A:dsdd:0,2,1,3" -f="B:ddd:0,1,2" -s="reorder(i0, i1, j, p0, p1), precompute(A(i0,i1,p0,p1)*B(p0,p1,j),p0,p0_cache)"

This command also seems to generate correct code on the taco/multidim-workspace branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

4 participants