Schedule help #354

Infinoid · 2021-01-05T15:50:13Z

Adds the obvious -help option, which prints the same usage information that you get when running taco with no options or unrecognized options.

It also adds a -help=scheduling option, which prints a list of scheduling commands and descriptions. The descriptions are mostly borrowed from the website.

Here's what the output looks like:

% bin/taco -help=scheduling
Scheduling commands modify the execution of the index expression.
The '-s' parameter specifies one or more scheduling commands.
Schedules are additive; more commands can be passed by separating
them with commas, or passing multiple '-s' parameters.

Examples:
  -s="precompute(A(i,j)*x(j),i,i)"
  -s="split(i,i0,i1,32),parallelize(i0,CPUThread,NoRaces)"

See http://tensor-compiler.org/docs/scheduling/index.html for more examples.

Commands:
  -s=pos(i, ipos, tensor)     Takes in an index variable `i` that iterates 
                              over the coordinate space of `tensor` and 
                              replaces it with a derived index variable `ipos` 
                              that iterates over the same iteration range, but 
                              with respect to the position space. The 
                              `pos` transformation is not valid for dense 
                              level formats. 

  -s=fuse(i, j, f)            Takes in two index variables `i` and `j`, where 
                              `j` is directly nested under `i`, and collapses 
                              them into a fused index variable `f` that 
                              iterates over the product of the coordinates `i` 
                              and `j`. 

  -s=split(i, i0, i1, factor) Splits (strip-mines) an index variable `i` into 
                              two nested index variables `i0` and `i1`. The 
                              size of the inner index variable `i1` is then 
                              held constant at `factor`, which must be a 
                              positive integer. 

  -s=precompute(expr, i, iw)  Leverages scratchpad memories and reorders 
                              computations to increase locality. Given a 
                              subexpression `expr` to precompute, an index 
                              variable `i` to precompute over, and an index 
                              variable `iw` (which can be the same as or 
                              different from `i`) to precompute with, the 
                              precomputed results are stored in a temporary 
                              tensor variable. 

  -s=reorder(i1, i2, ...)     Takes in a new ordering for a set of index 
                              variables in the expression that are directly 
                              nested in the iteration order. The indexes are 
                              ordered from outermost to innermost. 

  -s=bound(i, ib, b, type)    Replaces an index variable `i` with an index 
                              variable `ib` that obeys a compile-time 
                              constraint on its iteration space, incorporating 
                              knowledge about the size or structured sparsity 
                              pattern of the corresponding input. The meaning 
                              of `b` depends on the `type`. Possible bound 
                              types are: MinExact, MinConstraint, MaxExact, 
                              MaxConstraint. 

  -s=unroll(index, factor)    Unrolls the loop corresponding to an index 
                              variable `index` by `factor` number of iterations, 
                              where `factor` is a positive integer. 

  -s=parallelize(i, u, strat) Tags an index variable `i` for parallel 
                              execution on hardware type `u`. Data races are 
                              handled by an output race strategy `strat`. 
                              Since the other transformations expect serial 
                              code, parallelize must come last in a series of 
                              transformations. Possible parallel hardware 
                              units are: NotParallel, GPUBlock, GPUWarp, 
                              GPUThread, CPUThread, CPUVector. Possible output 
                              race strategies are: IgnoreRaces, NoRaces, 
                              Atomics, Temporary, ParallelReduction.
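
To make the `split` entry above concrete, here is a minimal Python sketch of what strip-mining a loop means. It is only an illustration of the transformation, not the code taco generates; the function name `split_loop` and the tail guard are assumptions made for this example.

```python
def split_loop(n, factor):
    """Yield (i0, i1, i) for a loop over range(n) strip-mined by `factor`.

    Hypothetical helper for illustration only; taco emits its own loop nests.
    """
    if factor <= 0:
        raise ValueError("factor must be a positive integer")
    outer_trips = (n + factor - 1) // factor  # ceil(n / factor)
    for i0 in range(outer_trips):        # outer index variable
        for i1 in range(factor):         # inner index variable, fixed size
            i = i0 * factor + i1         # recover the original index
            if i < n:                    # guard the tail when factor does not divide n
                yield i0, i1, i


# The split loop visits the same iterations, in the same order, as range(n).
visited = [i for _, _, i in split_loop(10, 4)]
```

The inner loop always has `factor` iterations, which matches the description of `i1` being held constant at `factor`; the guard is one possible way to handle a range that is not a multiple of `factor`.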

I added a mention to the -help text for -s that more info is available with -help=scheduling.
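
Because a single '-s' value can hold several comma-separated commands whose own arguments also contain commas, the value has to be split only at commas outside parentheses. The sketch below shows one way to do that in Python. It is an assumption-laden illustration, not taco's actual parser, and `split_commands` is a hypothetical name.

```python
def split_commands(schedule):
    """Split a '-s' value into commands at top-level commas only.

    Hypothetical helper; taco's real command-line parsing may differ.
    """
    commands = []
    depth = 0   # current parenthesis nesting depth
    start = 0   # start offset of the command being scanned
    for pos, ch in enumerate(schedule):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in schedule")
        elif ch == "," and depth == 0:
            piece = schedule[start:pos].strip()
            if piece:
                commands.append(piece)
            start = pos + 1
    if depth != 0:
        raise ValueError("unbalanced '(' in schedule")
    tail = schedule[start:].strip()
    if tail:
        commands.append(tail)
    return commands


two = split_commands("split(i,i0,i1,32),parallelize(i0,CPUThread,NoRaces)")
one = split_commands("precompute(A(i,j)*x(j),i,i)")
```

Applied to the two examples from the help text, the first yields two commands and the second stays a single `precompute` command, since all of its commas are nested inside parentheses.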

Infinoid force-pushed the schedule-help branch from 1beb4ef to 37b2a7b on January 13, 2021 12:26

Add -help and -help=schedule parameters to CLI

023f1c1

Infinoid force-pushed the schedule-help branch from 37b2a7b to 023f1c1 on January 14, 2021 12:15

Infinoid marked this pull request as ready for review January 14, 2021 12:21

stephenchouca merged commit dafe2ba into tensor-compiler:master Jan 20, 2021

Infinoid deleted the schedule-help branch January 21, 2021 12:47
