Advanced task dependencies / caching #461

Open
@bitprophet

Use case description

Core desire is to strengthen relationships between tasks above/beyond what the existing pre/post functionality achieves; and to structure a nontrivial set of dependent state as a graph of tasks which only execute as many times as is required to set up that state.

Specific example is an existing nontrivial task tree around performing cloud automation, configuration management, and similar things. Many/most tasks in this tree are currently wrapped with a @state decorator whose body does a large amount of "heavy lifting":

  • processes pre-existing configuration directives, including aborting if they aren't defined
    • A reminiscent predecessor is the old and (as far as I can tell) barely used fabric.api.require
  • instantiates network and API clients using that early configuration (and/or pulling from a secrets storage API)
  • generates new configuration settings based on runtime information (including data gathered from the above clients)
  • attempts at each step to be idempotent, including skipping expensive operations when they are unnecessary (so no re-connecting to already-connected APIs; no re-reading already-read files; etc.)

At the least, this decorator "wants" to be split up into a bunch of smaller parts, each of which is connected by what state it depends on, with the outermost/topmost caller (i.e. the decorated CLI task) specifying more precisely which bits it needs. (This already exists in part with some args passed into the decorator, e.g. @state(limited=True) skips everything but the first few config bits and an early API client.)

Those smaller parts would then be capable of memoizing their results, so that e.g. if they are declared to "generate a value for config path c.aws.clients.vpc", and that config path already exists/is non-empty when something starts calling that function, it simply records this fact and short-circuits.
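A minimal sketch of that short-circuit, in plain Python rather than any real Invoke API. The `provides` decorator, the `get_path`/`set_path` helpers, and the nested-dict config are all hypothetical names invented for illustration; only the config path `aws.clients.vpc` comes from the description above.

```python
from functools import wraps

_MISSING = object()  # sentinel: distinguishes "path absent" from a stored None


def get_path(config, dotted):
    """Walk a nested dict by dotted path; return _MISSING if any key is absent."""
    node = config
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def set_path(config, dotted, value):
    """Store value at a dotted path, creating intermediate dicts as needed."""
    keys = dotted.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def provides(dotted):
    """Hypothetical decorator: skip the step when `dotted` is already non-empty."""
    def decorator(func):
        @wraps(func)
        def wrapper(config):
            existing = get_path(config, dotted)
            if existing is not _MISSING and existing not in (None, "", {}, []):
                return existing  # already generated: short-circuit
            value = func(config)
            set_path(config, dotted, value)
            return value
        return wrapper
    return decorator


calls = []  # records how often the expensive body really runs


@provides("aws.clients.vpc")
def make_vpc_client(config):
    calls.append("make_vpc_client")
    # Stand-in for building a real API client.
    return {"region": config["aws"]["region"]}


config = {"aws": {"region": "us-east-1"}}
first = make_vpc_client(config)
second = make_vpc_client(config)  # body is skipped; stored value is returned
```

Here the second call never reaches the function body, because the declared config path is already populated. What counts as "empty" is a design choice this sketch hard-codes.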

Going deeper?

Presumably one could expand this concept all the way to abstract things like "task A requires an admin cloud instance and a database" - including some sort of maybe registry-driven "find me some task that satisfies needs XYZ, cuz I need X Y and Z" setup. Which feels fraught and complex to me, and is probably a distinct API feel from the "do exactly what I say but feel free to skip anything you already did" described above.

Another approach could be something like @requires(file='/path/to/artifact', satisfied_by=build) where build is a task that presumably generates /path/to/artifact; this then becomes "just" a more rigorous @task(pre=[build]), and places the burden of the "is it already done?" test on the higher-level task instead of the lower-level one. Again, different API feel; and not the best, because if N tasks need /path/to/artifact they all have to specify this; and satisfied_by= implies that maybe M subtasks could generate the artifact, which feels specious.
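To make the tradeoff concrete, here is a rough sketch of how such a decorator might behave. It is plain Python, not an existing Invoke feature: `requires` is a hypothetical implementation of the idea, and `build`, `deploy` and `package` are placeholder tasks writing to a temporary path.

```python
import os
import tempfile
from functools import wraps


def requires(file, satisfied_by):
    """Hypothetical decorator: run `satisfied_by` first, but only if `file` is missing."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # The "is it already done?" test lives on the dependent task.
            if not os.path.exists(file):
                satisfied_by()
            return func(*args, **kwargs)
        return wrapper
    return decorator


artifact = os.path.join(tempfile.mkdtemp(), "artifact")
build_runs = []  # counts real executions of build()


def build():
    build_runs.append(1)
    with open(artifact, "w") as handle:
        handle.write("built\n")


# Every task that needs the artifact has to repeat the same declaration.
@requires(file=artifact, satisfied_by=build)
def deploy():
    with open(artifact) as handle:
        return handle.read().strip()


@requires(file=artifact, satisfied_by=build)
def package():
    return os.path.getsize(artifact)


deployed = deploy()   # artifact missing, so build() runs first
size = package()      # artifact exists, so build() is skipped
```

The repeated `@requires(...)` line on both `deploy` and `package` is the duplication complained about above.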

Basically, each time I think of this side of it, I come back to: no, I really just want to specify that build_my_env depends on some half dozen other actual tasks, each of which wants to be memoizing or otherwise skipping already performed work. That gets us a very make-like setup, especially when you consider make is effectively just our existing pre-task functionality, except with the wrinkle that file paths "are" task names.
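As an illustration of that make-like shape, a small sketch in plain Python follows. The `step` decorator, the `STEPS` registry and the `execute` function are hypothetical, as are the task names other than build_my_env; there is no cycle detection, which a real implementation would need.

```python
STEPS = {}  # name -> (dependency names, function)


def step(*deps):
    """Hypothetical registration decorator: record a function and its dependencies."""
    def decorator(func):
        STEPS[func.__name__] = (deps, func)
        return func
    return decorator


def execute(name, done=None):
    """Run dependencies depth-first, then the step itself, skipping anything already done."""
    done = [] if done is None else done
    if name in done:
        return done  # already ran during this invocation
    deps, func = STEPS[name]
    for dep in deps:
        execute(dep, done)
    func()
    done.append(name)
    return done


@step()
def load_config():
    pass


@step("load_config")
def api_client():
    pass


@step("load_config")
def secrets():
    pass


@step("api_client", "secrets")
def build_my_env():
    pass


order = execute("build_my_env")
print(order)
```

Although both `api_client` and `secrets` depend on `load_config`, it appears only once in the resulting order.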

Solutions brainstorm

User-facing API

Note: was originally pondering additional decorators, but now think additional task kwargs makes more sense, see below comments. The name brainstorm is largely the same either way.

The actual needs are:

  • Ways to specify tasks that come before, and after, a task being defined.
  • Needs a plural noun ("this task has many _whatever_s"), a verb ("this task _something_s another task" or "that other task _something_s this one") and a keyword argument (which is often a plural noun or a verb, but can also be some other part of speech, such as an adverb, e.g. afterwards=).
  • Those parts of speech should hopefully be thematically close; the farther apart they are, the more mental overhead is required.
  • Ideally we want shorter names and ones that are easy to type: 2-3 syllables is preferable over 4+, doubled-up letters aren't great, underscores aren't awful but are non-ideal, etc.

Specifying dependencies/prerequisites

  • before: Far too ambiguous re: whether the task being defined is the subject or object ("this other task comes before me" vs "I come before this other task".)
  • but_first: Very English but I think way over the line in terms of being cute/twee. Would pair well with and_then/then.
  • depend_on: meh
  • depends: just okay; shorter than depends_on but doesn't flow as well English-wise
  • depends_on: kinda works
  • dependencies: straightforward if slightly long. Also works great as double duty for the plural noun (though we'll probably use it for that regardless.)
  • first, as in, "first run these, then run me." Very English, possibly too cute, and possibly ambiguous in the sense that the term is pretty overloaded (though I can't immediately think of what else in an Invoke context one might confuse it with.) Really wants to pair up with then or and_then and maybe later.
  • needs: can't really explain why, but not a huge fan. Might work as plural noun and as verb though - a task "needs" another, a task "has needs", etc.
  • pre: i.e. how things work now. Somewhat ambiguous ("These others run pre-me" vs "I come 'pre' these others") and kinda ugly, but it is short.
  • preceded_by: Straightforward and unambiguous but kind of hell to type. Noun would be 'predecessors' presumably.
  • prerequisites: This is the term make uses. It's sensible but a bit long/annoying to type, and it only really pairs well with postrequisites which, while accurate, feels even more awkward for some reason. Works well as plural noun; verb's awkward ("task A prerequires task B"?)
  • previously: "Previously, you should have run these tasks". Kinda awkward, this isn't TV.
  • prior: "You should run these prior to running me." Not awful, but also no good plural noun ("priors"? Awkward and legalese-sounding to me) and the 'verb' of "comes prior to" is a bit wordy.
  • requires: pretty solid, also works as verb, with "requirements" as the collective noun.
  • succeeds: Pairs well with precedes, but unfortunately an overloaded term (could be mistaken for "what to run if this task succeeds, as in, does not fail") so not great.

Specifying followups/postrequisites

  • after: too ambiguous, is it "Run these after me" or "Run me after these"?
  • after_success / after_failure: As commented below I'm not a huge fan of the conditional-trigger approach, but if we did implement that concept, these could work (the addition of success/failure removes the ambiguity of just after=.) Bit long though.
  • afterwards: Not bad, slightly cutesy, but unambiguous ("run me, then run these afterwards", but you'd never say "run me afterwards these other tasks".) None of the after* names have good plural nouns ("aftertasks"? Ugh) though the verb is easy, comes after.
  • and_then: Cutesy, but highly unambiguous, and not too long. Pairs well with first/but_first/etc.
  • consequences: Kinda dark (in English we usually use this to mean negative consequences) but clinically accurate? kwarg alternative could also be consequently=.
  • consumers: "These other tasks are consumers of what I do." Not the worst, but I feel it's not as general as it should be: not all task relationships necessarily imply production/consumption of data, even if many do. Would work as plural noun and (via "consumes") verb. Might be the best plural noun (which is the real hard part of this side of things.)
  • drives (as in, "I drive execution of these other tasks"): mediocre at best; verb; but no good plural noun (uh..."drivees"? No. And don't even think about saying "passengers".)
  • enables: halfway decent but doesn't really cut it, because we're not saying "you can run these after me", we're trying to say "you must run these after me". Permission != command.
  • followup(s): currently what my doc brainstorm is using but I'm not in love with it. I'm using it as the plural noun as well ("followups" and/or "followup tasks"), but that's awkward. The whole family of follow* options has the same issue.
  • follow_with: better English but also longer
  • followed_by: even more natural-sounding than follow_with, though still long
  • later: Not the worst, tho not great. "Run these tasks later, after you run me." Maybe ambiguous ("I come later than these other tasks") but not as awful as some others. Pairs well with first.
  • leads_to: Half decent, is verb, no good obvious noun though.
  • next: Similar to later. "Do this next." (Also not accurate since there is no guarantee the requested tasks will actually run next!)
  • notifies: Like Chef, kinda, and acts as a kwarg and a verb, but sadly lacks a good forward-reference noun ("notification targets"? ugh).
  • post: i.e. how it works now. Same ambiguity, ugly, etc problems as with pre above.
  • post-tasks: same except more of a plural noun focus.
  • postrequisites: opposite of prerequisites, but as noted there, seems a bit long/awkward. Decent plural noun, and (via "postrequires") a very awkward verb.
  • precedes: Matches up with succeeds, but otherwise a bit awkward IMO. Is already the verb. Noun: uh...kind of wants to be "successors" really?
  • subsequently: More-Englishy adaptation of subsequents below ("run me, then subsequently, run these other tasks") but also a little twee. Same noun/verb as below.
  • subsequents: Slightly awkward but not the worst, works as a plural noun plus kwarg. Suppose the verb usage would be "is subsequent to"?
  • succeeded_by: Clear, but annoying to type on top of just being long. Verb would be "succeeds", as in, "these other tasks succeed this one".
  • successors: Similar to succeeded_by and also works as the plural noun. Verb same as above ("succeed"/"succeeds".) Also still awkward to type.
  • then: similar to and_then, but even shorter (but also even more cutesy.) Really wants to be paired with first, which I don't really like.
  • triggers: seems nice at a glance ("I trigger these tasks afterwards") but secretly ambiguous as it can also be interpreted as "these trigger me" and in fact demands to be seen that way if used as a plural noun. So, no good.

Skipping-execution checks

  • checks: almost certainly going with this as it's straightforward either way you read it: "these are my checks (check functions)" or "this task checks to see if these things are already satisfied".
  • creates: not great because so many state checks could be about things other than "did something get created".
  • generates: same as creates
  • supplies: awkward

Jeff's thoughts

The hardest part by far seems to be the plural noun for tasks which come after the current one, so we should start there.

The only ones that have come up so far that aren't problematic are: "after-tasks", "consequences", "post-tasks", "followups", "postrequisites", "subsequents", "successors". None of these are immediate "yeah!"s so let's see how they stack up re: the criteria listed up top.

  • "consequences":
    • Thematically close to: nothing really? works equally well with most tho.
    • "A's consequences", "B is a consequence of A", @task(consequences=[notify])
    • Short? Not really
    • Easy to type? Aside from length, it's only moderately awkward to type...
    • Verdict: I don't really like it but it's not super bad.
  • "successors": regal
    • Thematically close to: "predecessors"; also works to describe precedes/succeeds (tho latter is ambiguous and thus not great)
    • "A's successors", "B succeeds A", @task(successors=[notify])
    • Short? 3 syllables, ok.
    • Easy to type? Not really...
    • Verdict: Feels like the awkward typing (especially of its only good related terms) kinda kills it
  • "subsequents": clinical
    • Thematically close to: most half-decent plural nouns (dependencies, predecessors, prerequisites) are okay, but none work great.
    • "A's subsequents", "B is subsequent to A", @task(subsequents=[notify])
    • Short? 3 syllables, ok.
    • Easy to type? Somewhat, tho not great.
    • Verdict: Suspect still too awkward for everyday use but not 100% convinced of that.
  • "followups": folksy
    • Thematically close to: not super close to any but "dependencies" seems to work fine?
    • "A's followups", "B follows[ up] A", @task(followups=[notify])
    • Short? 3 syllables
    • Easy to type? Yes!
    • Verdict: just okay.
  • "postrequisites": also clinical
    • Thematically close to: "prerequisites" is a perfect match obviously
    • "A's postrequisites", "B post-requires A", @task(postrequisites=[notify])
    • Short? Nah, 4 syllables
    • Easy to type? Nope.
    • Verdict: meh.
  • "after-tasks": vaguely...Carroll-esque? idk
    • Thematically close to: nothing is super close; pre-tasks, dependencies. Unfortunately neither of these "match" well as a logical inverse of "afterwards". The main English opposite would be "beforehand"? which seems too awkward.
    • "A's after-tasks", "B comes after A", @task(afterwards=[notify])
    • Short? 3 syllables
    • Easy to type? Yes.
    • Verdict: Not awful but still a bit silly.

Implementation

  • Naive version is to have a 'checker' aspect that confirms, each time, whether the desired result already exists. Check for existence of file, of config value, of cloud resource, etc.
    • Could include a bunch of 'standard' versions of these, with the generalized case just being "any callable"
      • Or some richer value, maybe, perhaps
  • As DAG for a given task exec grows and multiple sub(-sub-sub-sub)-tasks all depend on the same low level bits, that doesn't scale well because you're still e.g. hitting disk, remote system, API call, etc, dozens of times unnecessarily, even if the check action tends to be much faster than the "make the thing" action.
  • So we could formalize that "if the task has been called already this interpreter session, assume what it does is satisfied". Task already has basic (if not well trod) call count tracking so this could be easy to do.
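The two ideas above (per-call checks, plus "already satisfied this session") could be combined roughly as follows. This is a sketch in plain Python under stated assumptions: the `Step` class, its `checks` argument and the return strings are hypothetical, and it does not use Invoke's actual call-count tracking. The in-memory `state` dict stands in for a disk, remote system or API.

```python
class Step:
    """Hypothetical wrapper: a body plus optional check callables."""

    def __init__(self, body, checks=()):
        self.body = body
        self.checks = list(checks)
        self.satisfied = False  # session-level memo, reset only by a new interpreter

    def __call__(self):
        if self.satisfied:
            return "skipped (session)"  # no check is re-run at all
        if self.checks and all(check() for check in self.checks):
            self.satisfied = True
            return "skipped (check)"    # desired result already existed
        self.body()
        self.satisfied = True
        return "ran"


state = {}        # stand-in for external state such as cloud resources
check_calls = []  # records how often the (possibly slow) check executes


def vpc_exists():
    check_calls.append("vpc_exists")
    return "vpc" in state


def create_vpc():
    state["vpc"] = "placeholder-id"  # made-up value, for illustration only


make_vpc = Step(create_vpc, checks=[vpc_exists])
results = [make_vpc(), make_vpc(), make_vpc()]

# A second, independent step sees the resource already present via its check.
other = Step(create_vpc, checks=[vpc_exists])
other_result = other()
```

Only the first call of each step pays for the check; later calls in the same session return immediately. The cost is that the memo can go stale if something outside the session removes the resource.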

Related, possibly subsumed tickets:
