Advanced task dependencies / caching

## Use case description

The core desire is to strengthen relationships between tasks beyond what the existing pre/post functionality achieves, and to structure a nontrivial set of dependent state as a graph of tasks that execute only as many times as required to set up that state.

A specific example is an existing nontrivial task tree for cloud automation, configuration management, and similar work. Many/most tasks in this tree are currently wrapped with a `@state` decorator whose body does a large amount of "heavy lifting":

- processes pre-existing configuration directives, including aborting if they aren't defined
  - A reminiscent predecessor is the old and (as far as I can tell) barely used [fabric.api.require](http://docs.fabfile.org/en/1.13/api/core/operations.html#fabric.operations.require)
- instantiates network and API clients using that early configuration (and/or by pulling from a secrets storage API)
- generates new configuration settings based on runtime information (including data gathered from the above clients)
- attempts at each step to be idempotent, including skipping expensive operations where possible (no reconnecting to already-connected APIs, no rereading already-read files, etc.)
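The first bullet (abort early if configuration is missing) can be illustrated with a minimal `require`-style sketch. It assumes plain nested dicts standing in for Invoke's config object; `require`, `lookup` and `MissingConfig` are hypothetical names, not part of Invoke's or Fabric's API.

```python
class MissingConfig(Exception):
    """Raised when required configuration is absent or empty."""


def lookup(config, dotted_path):
    """Walk nested dicts using a dotted path such as 'aws.region'."""
    node = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(dotted_path)
        node = node[key]
    return node


def require(config, *paths):
    """Abort early if any of the given config paths is missing or empty."""
    missing = []
    for path in paths:
        try:
            value = lookup(config, path)
        except KeyError:
            missing.append(path)
            continue
        if value is None or value == "" or value == {} or value == []:
            missing.append(path)
    if missing:
        raise MissingConfig("missing required config: " + ", ".join(missing))


# Usage: a task body would call this before doing any heavy lifting.
config = {"aws": {"region": "us-east-1", "profile": ""}}
require(config, "aws.region")  # passes silently
# require(config, "aws.profile", "aws.vpc") would raise MissingConfig
```

Reporting every missing path at once, rather than stopping at the first, saves a round of fix-and-rerun for the user.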

At the least, this decorator "wants" to be split up into a bunch of smaller parts, each connected to the others by the state it depends on, with the outermost/topmost caller (i.e. the decorated CLI task) specifying more precisely which bits it needs. (This already exists in part via args passed into the decorator, e.g. `@state(limited=True)` skips everything but the first few config bits and an early API client.)

Those smaller parts would then be capable of memoizing their results, so that e.g. if they are declared to "generate a value for config path `c.aws.clients.vpc`", and that config path already exists/is non-empty when something starts calling that function, it simply records this fact and short-circuits.
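Splitting `@state` into small, memoizing parts might look roughly like the minimal sketch below. Assumptions: plain nested dicts stand in for Invoke's config object, and `provides`, `needs`, `lookup_or_none`, `store`, `region` and `vpc_client` are hypothetical names, not Invoke API. Each step declares the config path it generates plus the steps it needs, and short-circuits if that path is already populated.

```python
def lookup_or_none(config, dotted_path):
    """Return the value at a dotted path in nested dicts, or None if absent."""
    node = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def store(config, dotted_path, value):
    """Set a value at a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = dotted_path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def provides(path, needs=()):
    """Hypothetical decorator: the wrapped step generates the value at `path`.

    If `path` is already non-empty, the step logs a skip and short-circuits
    without touching its dependencies; otherwise it runs each step in `needs`
    first, then its own body, and stores the result at `path`.
    """
    def decorator(func):
        def step(config, log):
            existing = lookup_or_none(config, path)
            if existing:
                log.append(("skipped", path))
                return existing
            for dependency in needs:
                dependency(config, log)
            value = func(config)
            store(config, path, value)
            log.append(("ran", path))
            return value
        return step
    return decorator


@provides("aws.region")
def region(config):
    return "us-east-1"


@provides("aws.clients.vpc", needs=[region])
def vpc_client(config):
    # Stand-in for building a real API client from earlier config values.
    return {"kind": "vpc-client", "region": config["aws"]["region"]}


# The outermost caller asks only for what it needs; dependencies follow.
config, log = {}, []
vpc_client(config, log)  # runs region, then vpc_client
vpc_client(config, log)  # short-circuits: aws.clients.vpc is already set
```

Checking the step's own path before its dependencies means a fully satisfied step costs a single config lookup, no matter how deep its dependency chain is.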

### Going deeper?

Presumably one could expand this concept all the way to abstract things like "task A requires an admin cloud instance and a database" - including some sort of maybe registry-driven "find me some task that satisfies needs XYZ, cuz I need X Y and Z" setup. Which feels fraught and complex to me, and is probably a distinct API feel from the "do exactly what I say but feel free to skip anything you already did" described above.

Another approach could be something like `@requires(file='/path/to/artifact', satisfied_by=build)` where `build` is a task that presumably generates `/path/to/artifact`; this then becomes "just" a more rigorous `@task(pre=[build])`, and places the burden of the "is it already done?" test on the higher-level task instead of the lower-level one. Again, a different API feel; and not the best, because if N tasks need `/path/to/artifact` they all have to specify this, and `satisfied_by=` implies that maybe M subtasks could generate the artifact, which feels specious.

Basically, each time I think of this side of it, I come back to: no, I really just want to specify that `build_my_env` depends on some half dozen other actual tasks, each of which wants to be memoizing or otherwise skipping already performed work. That gets us a very `make`-like setup, especially when you consider `make` is effectively just our existing pre-task functionality, except with the wrinkle that file paths "are" task names.
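The `make`-like behavior described above (a target depending on several real tasks, each run at most once) can be sketched as a depth-first walk over a dependency graph. This is plain Python, not Invoke's executor; `Graph`, its `task`/`run` methods and the example task names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set


class Graph:
    """Hypothetical registry of tasks plus the names of tasks they depend on."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Callable[[], None]] = {}
        self.deps: Dict[str, Sequence[str]] = {}

    def task(self, *deps: str):
        """Register a function as a task depending on the named tasks."""
        def decorator(func: Callable[[], None]) -> Callable[[], None]:
            self.tasks[func.__name__] = func
            self.deps[func.__name__] = deps
            return func
        return decorator

    def run(self, target: str) -> List[str]:
        """Run `target` after its transitive dependencies, each at most once."""
        order: List[str] = []
        done: Set[str] = set()
        visiting: Set[str] = set()

        def visit(name: str) -> None:
            if name in done:
                return  # already satisfied during this run
            if name in visiting:
                raise ValueError(f"dependency cycle involving {name!r}")
            visiting.add(name)
            for dep in self.deps[name]:
                visit(dep)
            visiting.discard(name)
            self.tasks[name]()
            done.add(name)
            order.append(name)

        visit(target)
        return order


graph = Graph()
calls: List[str] = []


@graph.task()
def credentials() -> None:
    calls.append("credentials")


@graph.task("credentials")
def vpc() -> None:
    calls.append("vpc")


@graph.task("credentials")
def database() -> None:
    calls.append("database")


@graph.task("vpc", "database")
def build_my_env() -> None:
    calls.append("build_my_env")


order = graph.run("build_my_env")
# order == ["credentials", "vpc", "database", "build_my_env"]
```

Both `vpc` and `database` need `credentials`, yet it runs only once; the `visiting` set turns an accidental cycle into an error instead of infinite recursion.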

## Solutions brainstorm

### User-facing API

Note: was originally pondering additional decorators, but now think additional task kwargs makes more sense, see below comments. The name brainstorm is largely the same either way.

The actual needs are:

- Ways to specify tasks that come before, and after, a task being defined.
- Needs a plural noun ("this task has many _whatever_s"), a verb ("this task _something_s another task" or "that other task _something_s this one") and a keyword argument (which is often a plural noun or a verb, but can also be some other part of speech, such as an adverb: `afterwards=`).
- Those parts of speech should hopefully be thematically close; the farther apart they are, the more mental overhead is required.
- Ideally we want shorter names and ones that are easy to type: 2-3 syllables is preferable over 4+, doubled-up letters aren't great, underscores aren't awful but are non-ideal, etc.
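To make the "before/after" requirement concrete, here is a toy sketch of how a pair of such kwargs might read at the definition site and what order they imply. `toy_task`, `requires=`, `followups=` and `execute` are placeholders drawn from the brainstorm, not a decided or existing Invoke API.

```python
def toy_task(requires=(), followups=()):
    """Hypothetical decorator recording what runs before and after a task."""
    def decorator(func):
        func.requires = list(requires)
        func.followups = list(followups)
        return func
    return decorator


def execute(target, log):
    """Naive executor: requirements, then the task itself, then followups."""
    for dep in target.requires:
        execute(dep, log)
    target()
    log.append(target.__name__)
    for nxt in target.followups:
        execute(nxt, log)


@toy_task()
def build():
    pass


@toy_task()
def notify():
    pass


# Reads as "deploy requires build" and "deploy is followed up by notify":
# the task being defined is always the subject.
@toy_task(requires=[build], followups=[notify])
def deploy():
    pass


log = []
execute(deploy, log)
# log == ["build", "deploy", "notify"]
```

This naive executor does no deduplication, so a task shared by several requirements would run more than once; that is the gap the DAG discussion elsewhere in this ticket addresses.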

#### Specifying dependencies/prerequisites

- `before`: Far too ambiguous re: whether the task being defined is the subject or object ("this other task comes before me" vs "I come before this other task".)
- `but_first`: Very English but I think way over the line in terms of being cute/twee. Would pair well with `and_then`/`then`.
- `depend_on`: meh
- `depends`: just okay; shorter than `depends_on` but doesn't flow as well English-wise
- `depends_on`: kinda works
- `dependencies`: straightforward if slightly long. Also works great as double duty for the plural noun (though we'll probably use it for that regardless.)
- `first`, as in, "first run these, then run me." Very English, possibly too cute, and possibly ambiguous in the sense that the term is pretty overloaded (though I can't immediately think of what else in an Invoke context one might confuse it with.) Really wants to pair up with `then` or `and_then` and maybe `later`.
- `needs`: can't really explain why, but not a huge fan. Might work as plural noun _and_ as verb though - a task "needs" another, a task "has needs", etc.
- `pre`: i.e. how things work now. Somewhat ambiguous ("These others run pre-me" vs "I come 'pre' these others") and kinda ugly, but it is short.
- `preceded_by`: Straightforward and unambiguous but kind of hell to type. Noun would be 'predecessors' presumably.
- `prerequisites`: This is the term `make` uses. It's sensible but a bit long/annoying to type, and it only really pairs well with `postrequisites` which, while accurate, feels even more awkward for some reason. Works well as plural noun; the verb's awkward ("task A prerequires task B"?)
- `previously`: "Previously, you should have run these tasks". Kinda awkward, this isn't TV.
- `prior`: "You should run these prior to running me." Not awful, but also no good plural noun ("priors"? Awkward and legalese-sounding to me) and the 'verb' of "comes prior to" is a bit wordy.
- `requires`: pretty solid, also works as verb, with "requirements" as the collective noun.
- `succeeds`: Pairs well with `precedes`, but unfortunately an overloaded term (could be mistaken for "what to run if this task succeeds, as in, does not fail") so not great.

#### Specifying followups/postrequisites

- `after`: too ambiguous, is it "Run these after me" _or_ "Run me after these"?
- `after_success` / `after_failure`: As commented below I'm not a huge fan of the conditional-trigger approach, but if we did implement that concept, these _could_ work (the addition of success/failure removes the ambiguity of just `after=`.) Bit long though.
- `afterwards`: Not bad, slightly cutesy, but unambiguous ("run me, then run these afterwards", but you'd never say "run me afterwards these other tasks".) None of the `after*` names have good plural nouns ("aftertasks"? Ugh) though the verb is easy, _comes after_.
- `and_then`: Cutesy, but highly unambiguous, and not too long. Pairs well with `first`/`but_first`/etc.
- `consequences`: Kinda dark (in English we usually use this to mean _negative_ consequences) but clinically accurate? kwarg alternative could also be `consequently=`.
- `consumers`: "These other tasks are consumers of what I do." Not the worst, but I feel it's not as general as it should be: not all task relationships necessarily imply production/consumption of data, even if many do. Would work as plural noun and (via "consumes") verb. Might be the best plural noun (which is the real hard part of this side of things.)
- `drives` (as in, "I drive execution of these other tasks"): mediocre at best; verb; but no good plural noun (uh..."drivees"? No. And don't even think about saying "passengers".)
- `enables`: halfway decent but doesn't really cut it, because we're not saying "you _can_ run these after me", we're trying to say "you _must_ run these after me". Permission != command.
- `followup(s)`: currently what my doc brainstorm is using but I'm not in love with it. I'm using it as the plural noun as well ("followups" and/or "followup tasks"), but that's awkward. The whole family of `follow*` options has the same issue.
- `follow_with`: better English but also longer
- `followed_by`: even more natural-sounding than `follow_with`, though still long
- `later`: Not the worst, tho not great. "Run these tasks later, after you run me." _Maybe_ ambiguous ("I come `later` than these other tasks") but not as awful as some others. Pairs well with `first`.
- `leads_to`: Half decent, is verb, no good obvious noun though.
- `next`: Similar to `later`. "Do this next." (Also not accurate since there is no guarantee the requested tasks _will_ actually run next!)
- `notifies`: Like Chef, kinda, and acts as a kwarg and a verb, but sadly lacks a good forward-reference noun ("notification targets"? ugh).
- `post`: i.e. how it works now. Same ambiguity, ugly, etc problems as with `pre` above.
- `post-tasks`: same except more of a plural noun focus.
- `postrequisites`: opposite of `prerequisites`, but as noted there, seems a bit long/awkward. Decent plural noun, and (via "postrequires") a very awkward verb.
- `precedes`: Matches up with `succeeds`, but otherwise a bit awkward IMO. Is already the verb. Noun: uh...kind of wants to be "successors" really?
- `subsequently`: More-Englishy adaptation of `subsequents` below ("run me, then subsequently, run these other tasks") but also a little twee. Same noun/verb as below.
- `subsequents`: Slightly awkward but not the worst, works as a plural noun plus kwarg. Suppose the verb usage would be "is subsequent to"?
- `succeeded_by`: Clear, but annoying to type on top of just being long. Verb would be "succeeds", as in, "these other tasks succeed this one".
- `successors`: Similar to `succeeded_by` and also works as the plural noun. Verb same as above ("succeed"/"succeeds".) Also still awkward to type.
- `then`: similar to `and_then`, but even shorter (but also even more cutesy.) Really wants to be paired with `first`, which I don't really like.
- `triggers`: seems nice at a glance ("I trigger these tasks afterwards") but secretly ambiguous as it can also be interpreted as "these trigger _me_" and in fact demands to be seen that way if used as a plural noun. So, no good.

#### Skipping-execution checks

- `checks`: almost certainly going with this as it's straightforward either way you read it: "these are my checks (check functions)" or "this task _checks_ to see if these things are already satisfied".
- `creates`: not great because so many state checks could be about things other than "did something get created".
- `generates`: same as `creates`
- `supplies`: awkward
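A minimal sketch of the `checks=` idea, assuming a check is any zero-argument callable returning a bool and that the task body is skipped only when every check passes. `toy_task` and `Check` are hypothetical stand-ins, not Invoke's `@task`.

```python
import functools
import os
import tempfile
from typing import Callable, Sequence

Check = Callable[[], bool]


def toy_task(checks: Sequence[Check] = ()):
    """Hypothetical `checks=` kwarg: skip the body when every check passes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # With no checks the task always runs (all([]) would be True).
            if checks and all(check() for check in checks):
                return None  # already satisfied; nothing to do
            return func(*args, **kwargs)
        return wrapper
    return decorator


workdir = tempfile.mkdtemp()
artifact = os.path.join(workdir, "artifact.txt")


@toy_task(checks=[lambda: os.path.exists(artifact)])
def build():
    with open(artifact, "w") as fh:
        fh.write("built\n")
    return "built"


first = build()   # artifact missing, so the body runs
second = build()  # check passes, so the body is skipped
```

Keeping the check on the lower-level task (rather than on every caller) is what lets N callers share one "is it already done?" test.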

#### Jeff's thoughts

The hardest part by far seems to be the plural noun for tasks which come after the current one, so we should start there.

The only ones that have come up so far that aren't problematic are: "after-tasks", "consequences", "post-tasks", "followups", "postrequisites", "subsequents", "successors". None of these are immediate "yeah!"s so let's see how they stack up re: the criteria listed up top.

- "consequences":
  - Thematically close to: nothing really? works equally well with most tho.
  - "A's consequences", "B is a consequence of A", `@task(consequences=[notify])`
  - Short? Not really
  - Easy to type? Aside from length, it's only moderately awkward to type...
  - Verdict: I don't really like it but it's not super bad.
- "successors": regal
  - Thematically close to: "predecessors"; also works to describe `precedes`/`succeeds` (tho the latter is ambiguous and thus not great)
  - "A's successors", "B succeeds A", `@task(successors=[notify])`
  - Short? 3 syllables, ok.
  - Easy to type? Not really...
  - Verdict: Feels like the awkward typing (especially of its only good related terms) kinda kills it
- "subsequents": clinical 
  - Thematically close to: most half-decent plural nouns (dependencies, predecessors, prerequisites) are okay, but none work great.
  - "A's subsequents", "B is subsequent to A", `@task(subsequents=[notify])`
  - Short? 3 syllables, ok.
  - Easy to type? Somewhat, tho not great.
  - Verdict: Suspect still too awkward for everyday use but not 100% convinced of that.
- "followups": folksy
  - Thematically close to: not super close to any but "dependencies" seems to work fine?
  - "A's followups", "B follows [up] A", `@task(followups=[notify])`
  - Short? 3 syllables
  - Easy to type? Yes!
  - Verdict: just okay.
- "postrequisites": also clinical
  - Thematically close to: "prerequisites" is a perfect match obviously
  - "A's postrequisites", "B post-requires A", `@task(postrequisites=[notify])`
  - Short? Nah, 4 syllables
  - Easy to type? Nope.
  - Verdict: meh.
- "after-tasks": vaguely...Carroll-esque? idk
  - Thematically close to: nothing is super close; pre-tasks, dependencies. Unfortunately neither of these "match" well as a logical inverse of "afterwards". The main English opposite would be "beforehand", which seems too awkward.
  - "A's after-tasks", "B comes after A", `@task(afterwards=[notify])`
  - Short? 3 syllables
  - Easy to type? Yes.
  - Verdict: Not awful but still a bit silly.

### Implementation

- The naive version is a 'checker' aspect that confirms, each time, whether the desired result already exists: check for the existence of a file, a config value, a cloud resource, etc.
  - Could include a bunch of 'standard' versions of these, with the generalized case just being "any callable"
    - Or perhaps some richer value
- As the DAG for a given task execution grows, and multiple sub(-sub-sub-sub)-tasks all depend on the same low-level bits, this doesn't scale well: you're still hitting disk, remote systems, APIs, etc. dozens of times unnecessarily, even if the check action tends to be much faster than the "make the thing" action.
- So we could formalize that "if the task has already been called in this interpreter session, assume what it does is satisfied". `Task` already has basic (if not well-trodden) call count tracking, so this could be easy to do.
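The "already called this session, so assume satisfied" rule can be sketched as a call-counting wrapper. `once_per_session`, `times_called` and `connect_api` are hypothetical names for this illustration; the sketch does not rely on Invoke's own call tracking.

```python
import functools


def once_per_session(func):
    """Hypothetical: run `func` at most once per interpreter session.

    After the first successful call, later calls assume the work is done
    and return the cached result without re-running the body or any
    expensive check inside it.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if wrapper.times_called == 0:
            # If this raises, times_called stays 0 and the next call retries.
            wrapper.result = func(*args, **kwargs)
        wrapper.times_called += 1
        return wrapper.result

    wrapper.times_called = 0
    wrapper.result = None
    return wrapper


handshakes = []


@once_per_session
def connect_api():
    handshakes.append("connect")  # stands in for an expensive API handshake
    return "client"


for _ in range(3):
    connect_api()
# handshakes == ["connect"]; connect_api.times_called == 3
```

This sketch ignores arguments entirely, so a task called again with different parameters would still be skipped; that is exactly the parameterization interaction flagged in #228 below.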

----

Related, possibly subsumed tickets:

- #41 - the original dependencies ticket, long closed
- #45 - this is the closest match, and arguably a duplicate, though I interpret that ticket to be more "make the existing pre/post deduplicating use a DAG", whereas this ticket is about extending pre/post itself to have a richer API, even if that probably requires the DAG anyways.
- #100 - if we grow the ability to say "this task should generate <file X>", it makes a ton of sense to slightly extend that to "this task should generate <file X>, regenerating if <file X> exists but is old"
- #170 - insofar as calling tasks from other tasks definitely wants to honor this new functionality
- #228 - dependency execution needs to mesh well with parameterization, their intersection is historically easy to screw up / paint oneself into a corner
- #261 - since this means doubling down on use of pre-tasks as a common thing, it means we really gotta figure out the signature mismatch problem between the main task and its pre-tasks (and their pre-tasks and ...)
- #298 (and/or its PR #299) - at least one user is using post-tasks, and noticed the existing behavior is unexpected re: overall flow of pre, main, post tasks. Something that should probably be solvable during this ticket.
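For #100, the "regenerate if <file X> exists but is old" rule can follow `make`'s timestamp comparison. A minimal sketch, usable as a check function of the kind discussed under "Skipping-execution checks"; `needs_rebuild` is a hypothetical name.

```python
import os
from typing import Iterable


def needs_rebuild(target: str, sources: Iterable[str]) -> bool:
    """make-style check: rebuild if target is missing or older than a source."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Any source modified after the target makes the target stale.
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

A task would skip its body when `needs_rebuild(...)` returns `False`; comparing modification times is cheap, but it inherits `make`'s known weaknesses around clock skew and coarse filesystem timestamps.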

Advanced task dependencies / caching #461
