Schedule help #354

Merged 1 commit on Jan 20, 2021

@Infinoid Infinoid commented Jan 5, 2021

Adds the obvious -help option, which prints the same usage information that you get when running taco with no options or unrecognized options.

It also adds a -help=scheduling option, which prints a list of scheduling commands and descriptions. The descriptions are mostly borrowed from the website.

Here's what the output looks like:

% bin/taco -help=scheduling
Scheduling commands modify the execution of the index expression.
The '-s' parameter specifies one or more scheduling commands.
Schedules are additive; more commands can be passed by separating
them with commas, or passing multiple '-s' parameters.

Examples:
  -s="precompute(A(i,j)*x(j),i,i)"
  -s="split(i,i0,i1,32),parallelize(i0,CPUThread,NoRaces)"

See http://tensor-compiler.org/docs/scheduling/index.html for more examples.

Commands:
  -s=pos(i, ipos, tensor)     Takes in an index variable `i` that iterates 
                              over the coordinate space of `tensor` and 
                              replaces it with a derived index variable `ipos` 
                              that iterates over the same iteration range, but 
                              with respect to the position space. The
                              `pos` transformation is not valid for dense 
                              level formats. 

  -s=fuse(i, j, f)            Takes in two index variables `i` and `j`, where 
                              `j` is directly nested under `i`, and collapses 
                              them into a fused index variable `f` that 
                              iterates over the product of the coordinates `i` 
                              and `j`. 

  -s=split(i, i0, i1, factor) Splits (strip-mines) an index variable `i` into 
                              two nested index variables `i0` and `i1`. The 
                              size of the inner index variable `i1` is then 
                              held constant at `factor`, which must be a 
                              positive integer. 

  -s=precompute(expr, i, iw)  Leverages scratchpad memories and reorders 
                              computations to increase locality. Given a 
                              subexpression `expr` to precompute, an index 
                              variable `i` to precompute over, and an index 
                              variable `iw` (which can be the same as or
                              different from `i`) to precompute with, the
                              precomputed results are stored in a temporary 
                              tensor variable. 

  -s=reorder(i1, i2, ...)     Takes in a new ordering for a set of index 
                              variables in the expression that are directly 
                              nested in the iteration order. The indexes are 
                              ordered from outermost to innermost. 

  -s=bound(i, ib, b, type)    Replaces an index variable `i` with an index 
                              variable `ib` that obeys a compile-time 
                              constraint on its iteration space, incorporating 
                              knowledge about the size or structured sparsity 
                              pattern of the corresponding input. The meaning 
                              of `b` depends on the `type`. Possible bound 
                              types are: MinExact, MinConstraint, MaxExact, 
                              MaxConstraint. 

  -s=unroll(i, factor)        Unrolls the loop corresponding to an index
                              variable `i` by `factor` number of iterations, 
                              where `factor` is a positive integer. 

  -s=parallelize(i, u, strat) Tags an index variable `i` for parallel
                              execution on hardware type `u`. Data races are 
                              handled by an output race strategy `strat`. 
                              Since the other transformations expect serial 
                              code, parallelize must come last in a series of 
                              transformations. Possible parallel hardware 
                              units are: NotParallel, GPUBlock, GPUWarp, 
                              GPUThread, CPUThread, CPUVector. Possible output 
                              race strategies are: IgnoreRaces, NoRaces, 
                              Atomics, Temporary, ParallelReduction. 
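
For readers new to these transformations, here is a rough sketch of what `split` and `fuse` do to the iteration space. It is plain Python over dense loops, not taco code: the function names, the tail guard for a `factor` that does not divide the extent, and the row-major decoding of the fused index are all assumptions made for illustration.

```python
def split(n, factor):
    """Yield (i0, i1, i) for a loop of extent n strip-mined by `factor`."""
    if factor <= 0:
        raise ValueError("factor must be a positive integer")
    outer = (n + factor - 1) // factor  # ceil(n / factor) outer iterations
    for i0 in range(outer):
        for i1 in range(factor):        # inner extent is held constant at factor
            i = i0 * factor + i1
            if i < n:                   # assumed guard for the last, partial strip
                yield i0, i1, i


def fuse(n_i, n_j):
    """Yield (f, i, j) for two directly nested dense loops collapsed into one."""
    for f in range(n_i * n_j):
        yield f, f // n_j, f % n_j      # assumed row-major decoding of f


# Both rewrites visit the same iterations, in the same order, as the plain loops.
assert [i for _, _, i in split(10, 4)] == list(range(10))
assert [(i, j) for _, i, j in fuse(2, 3)] == [(i, j) for i in range(2) for j in range(3)]
```

The point of the sketch is only that neither command changes which iterations run; they change the loop structure that later commands such as `parallelize` or `unroll` can act on.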

I added a mention to the -help text for -s that more info is available with -help=scheduling.
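
Since the help text says schedules are additive, here is a small sketch of how comma-separated `-s` values could be combined into one ordered list of commands. This is not taco's actual parser; `split_commands` and `collect_schedule` are hypothetical helpers, and the only rule assumed is that commas inside parentheses belong to a command's arguments.

```python
def split_commands(arg):
    """Split one -s value into top-level commands, ignoring commas inside parentheses."""
    commands, depth, start = [], 0, 0
    for pos, ch in enumerate(arg):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:   # a top-level comma separates two commands
            commands.append(arg[start:pos].strip())
            start = pos + 1
    tail = arg[start:].strip()
    if tail:
        commands.append(tail)
    return commands


def collect_schedule(s_values):
    """Concatenate the commands from several -s values, preserving their order."""
    schedule = []
    for value in s_values:
        schedule.extend(split_commands(value))
    return schedule


schedule = collect_schedule([
    "precompute(A(i,j)*x(j),i,i)",
    "split(i,i0,i1,32),parallelize(i0,CPUThread,NoRaces)",
])
for command in schedule:
    print(command)
```

With the two example values from the help output, this yields three commands, with `parallelize` last, which matches the ordering requirement described above.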

@Infinoid Infinoid marked this pull request as ready for review January 14, 2021 12:21
@stephenchouca stephenchouca merged commit dafe2ba into tensor-compiler:master Jan 20, 2021
@Infinoid Infinoid deleted the schedule-help branch January 21, 2021 12:47