Description
Use case description
Core desire is to strengthen relationships between tasks above/beyond what the existing pre/post functionality achieves; and to structure a nontrivial set of dependent state as a graph of tasks which only execute as many times as is required to set up that state.
Specific example is an existing nontrivial task tree around performing cloud automation, configuration management, and similar things. Many/most tasks in this tree are currently wrapped with a @state decorator whose body does a large amount of "heavy lifting":
- processes pre-existing configuration directives, including aborting if they aren't defined
  - A reminiscent predecessor is the old and (as far as I can tell) barely used fabric.api.require
- instantiates network and API clients using that early configuration (and/or pulling from a secrets storage API)
- generates new configuration settings based on runtime information (including data gathered from the above clients)
- attempts at each step to be idempotent, including skipping expensive operations that have already been done (so no re-connecting to already-connected APIs; no re-reading already-read files; etc.)
At the least, this decorator "wants" to be split up into a bunch of smaller parts, each of which is connected to the others by the state it depends on, with the outermost/topmost caller (i.e. the decorated CLI task) specifying more precisely which bits it needs. (This already exists in part with some args passed into the decorator, e.g. @state(limited=True) skips everything but the first few config bits and an early API client.)
Those smaller parts would then be capable of memoizing their results, so that e.g. if they are declared to "generate a value for config path c.aws.clients.vpc", and that config path already exists/is non-empty when something starts calling that function, it simply records this fact and short-circuits.
Going deeper?
Presumably one could expand this concept all the way to abstract things like "task A requires an admin cloud instance and a database" - including some sort of maybe registry-driven "find me some task that satisfies needs XYZ, cuz I need X Y and Z" setup. Which feels fraught and complex to me, and is probably a distinct API feel from the "do exactly what I say but feel free to skip anything you already did" described above.
Another approach could be something like @requires(file='/path/to/artifact', satisfied_by=build) where build is a task that presumably generates /path/to/artifact; this then becomes "just" a more rigorous @task(pre=[build]), and places the burden of the "is it already done?" test on the higher-level task instead of the lower-level one. Again, different API feel; and not the best, because if N tasks need /path/to/artifact they all have to specify this; and satisfied_by= implies that maybe M subtasks could generate the artifact, which feels specious.
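A rough sketch of what that @requires idea might look like, written as a plain-Python decorator. The requires name and its file/satisfied_by arguments come from the paragraph above; the implementation, the temporary directory, and the package/deploy functions are assumptions made up for illustration.

```python
import os
import tempfile
from functools import wraps


def requires(file, satisfied_by):
    """Hypothetical decorator: call satisfied_by first unless file exists."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not os.path.exists(file):
                satisfied_by()
            return func(*args, **kwargs)
        return wrapper
    return decorator


workdir = tempfile.mkdtemp()
artifact = os.path.join(workdir, "artifact")
build_runs = []  # records how many times build actually executed


def build():
    build_runs.append(1)
    with open(artifact, "w") as handle:
        handle.write("built\n")


@requires(file=artifact, satisfied_by=build)
def package():
    with open(artifact) as handle:
        return handle.read().strip()


@requires(file=artifact, satisfied_by=build)
def deploy():
    return os.path.exists(artifact)


package_result = package()  # artifact missing, so build runs first
deploy_result = deploy()    # artifact present, so build is skipped
```

Note that package and deploy both have to repeat the same file/satisfied_by pair, which is exactly the duplication complained about above.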
Basically, each time I think of this side of it, I come back to: no, I really just want to specify that build_my_env depends on some half dozen other actual tasks, each of which wants to be memoizing or otherwise skipping already performed work. That gets us a very make-like setup, especially when you consider make is effectively just our existing pre-task functionality, except with the wrinkle that file paths "are" task names.
Solutions brainstorm
User-facing API
Note: was originally pondering additional decorators, but now think additional task kwargs makes more sense, see below comments. The name brainstorm is largely the same either way.
The actual needs are:
- Ways to specify tasks that come before, and after, a task being defined.
- Needs a plural noun ("this task has many _whatever_s"), a verb ("this task _something_s another task" or "that other task _something_s this one") and a keyword argument (which is often a plural noun or a verb, but can also be some other part of speech, such as an adverb, e.g. afterwards=).
- Those parts of speech should hopefully be thematically close; the farther apart they are, the more mental overhead is required.
- Ideally we want shorter names and ones that are easy to type: 2-3 syllables is preferable over 4+, doubled-up letters aren't great, underscores aren't awful but are non-ideal, etc.
Specifying dependencies/prerequisites
- before: Far too ambiguous re: whether the task being defined is the subject or object ("this other task comes before me" vs "I come before this other task".)
- but_first: Very English but I think way over the line in terms of being cute/twee. Would pair well with and_then/then.
- depend_on: meh
- depends: just okay; shorter than depends_on but doesn't flow as well English-wise
- depends_on: kinda works
- dependencies: straightforward if slightly long. Also works great as double duty for the plural noun (though we'll probably use it for that regardless.)
- first, as in, "first run these, then run me." Very English, possibly too cute, and possibly ambiguous in the sense that the term is pretty overloaded (though I can't immediately think of what else in an Invoke context one might confuse it with.) Really wants to pair up with then or and_then and maybe later.
- needs: can't really explain why, but not a huge fan. Might work as plural noun and as verb though - a task "needs" another, a task "has needs", etc.
- pre: i.e. how things work now. Somewhat ambiguous ("These others run pre-me" vs "I come 'pre' these others") and kinda ugly, but it is short.
- preceded_by: Straightforward and unambiguous but kind of hell to type. Noun would be 'predecessors' presumably.
- prerequisites: This is the term make uses. It's sensible but a bit long/annoying to type, and it only really pairs well with postrequisites which, while accurate, feels even more awkward for some reason. Works well as plural noun; verb's awkward ("task A prerequires task B"?)
- previously: "Previously, you should have run these tasks". Kinda awkward, this isn't TV.
- prior: "You should run these prior to running me." Not awful, but also no good plural noun ("priors"? Awkward and legalese-sounding to me) and the 'verb' of "comes prior to" is a bit wordy.
- requires: pretty solid, also works as verb, with "requirements" as the collective noun.
- succeeds: Pairs well with precedes, but unfortunately an overloaded term (could be mistaken for "what to run if this task succeeds, as in, does not fail") so not great.
Specifying followups/postrequisites
- after: too ambiguous, is it "Run these after me" or "Run me after these"?
- after_success/after_failure: As commented below I'm not a huge fan of the conditional-trigger approach, but if we did implement that concept, these could work (the addition of success/failure removes the ambiguity of just after=.) Bit long though.
- afterwards: Not bad, slightly cutesy, but unambiguous ("run me, then run these afterwards", but you'd never say "run me afterwards these other tasks".) None of the after* names have good plural nouns ("aftertasks"? Ugh) though the verb is easy, comes after.
- and_then: Cutesy, but highly unambiguous, and not too long. Pairs well with first/but_first/etc.
- consequences: Kinda dark (in English we usually use this to mean negative consequences) but clinically accurate? kwarg alternative could also be consequently=.
- consumers: "These other tasks are consumers of what I do." Not the worst, but I feel it's not as general as should be, not all task relationships necessarily imply production/consumption of data, even if many do. Would work as plural noun and (via "consumes") verb. Might be the best plural noun (which is the real hard part of this side of things.)
- drives (as in, "I drive execution of these other tasks"): mediocre at best; verb; but no good plural noun (uh..."drivees"? No. And don't even think about saying "passengers".)
- enables: halfway decent but doesn't really cut it, because we're not saying "you can run these after me", we're trying to say "you must run these after me". Permission != command.
- followup(s): currently what my doc brainstorm is using but I'm not in love with it. I'm using it as the plural noun as well ("followups" and/or "followup tasks"), but that's awkward. The whole family of follow* options has the same issue.
- follow_with: better English but also longer
- followed_by: even more natural-sounding than follow_with, though still long
- later: Not the worst, tho not great. "Run these tasks later, after you run me." Maybe ambiguous ("I come later than these other tasks") but not as awful as some others. Pairs well with first.
- leads_to: Half decent, is verb, no good obvious noun though.
- next: Similar to later. "Do this next." (Also not accurate since there is no guarantee the requested tasks will actually run next!)
- notifies: Like Chef, kinda, and acts as a kwarg and a verb, but sadly lacks a good forward-reference noun ("notification targets"? ugh).
- post: i.e. how it works now. Same ambiguity, ugly, etc problems as with pre above.
- post-tasks: same except more of a plural noun focus.
- postrequisites: opposite of prerequisites, but as noted there, seems a bit long/awkward. Decent plural noun, and (via "postrequires") a very awkward verb.
- precedes: Matches up with succeeds, but otherwise a bit awkward IMO. Is already the verb. Noun: uh...kind of wants to be "successors" really?
- subsequently: More-Englishy adaptation of subsequents below ("run me, then subsequently, run these other tasks") but also a little twee. Same noun/verb as below.
- subsequents: Slightly awkward but not the worst, works as a plural noun plus kwarg. Suppose the verb usage would be "is subsequent to"?
- succeeded_by: Clear, but annoying to type on top of just being long. Verb would be "succeeds", as in, "these other tasks succeed this one".
- successors: Similar to succeeded_by and also works as the plural noun. Verb same as above ("succeed"/"succeeds".) Also still awkward to type.
- then: similar to and_then, but even shorter (but also even more cutesy.) Really wants to be paired with first, which I don't really like.
- triggers: seems nice at a glance ("I trigger these tasks afterwards") but secretly ambiguous as it can also be interpreted as "these trigger me" and in fact demands to be seen that way if used as a plural noun. So, no good.
Skipping-execution checks
- checks: almost certainly going with this as it's straightforward either way you read it: "these are my checks (check functions)" or "this task checks to see if these things are already satisfied".
- creates: not great because so many state checks could be about things other than "did something get created".
- generates: same as creates
- supplies: awkward
Jeff's thoughts
The hardest part by far seems to be the plural noun for tasks which come after the current one, so we should start there.
The only ones that have come up so far that aren't problematic are: "after-tasks", "consequences", "post-tasks", "followups", "postrequisites", "subsequents", "successors". None of these are immediate "yeah!"s so let's see how they stack up re: the criteria listed up top.
- "consequences":
  - Thematically close to: nothing really? works equally well with most tho.
  - "A's consequences", "B is a consequence of A", @task(consequences=[notify])
  - Short? Not really
  - Easy to type? Aside from length, it's only moderately awkward to type...
  - Verdict: I don't really like it but it's not super bad.
- "successors": regal
  - Thematically close to: "predecessors"; also works to describe precedes/succeeds (tho latter is ambiguous and thus not great)
  - "A's successors", "B succeeds A", @task(successors=[notify])
  - Short? 3 syllables, ok.
  - Easy to type? Not really...
  - Verdict: Feels like the awkward typing (especially of its only good related terms) kinda kills it
- "subsequents": clinical
  - Thematically close to: most half-decent plural nouns (dependencies, predecessors, prerequisites) are okay, but none work great.
  - "A's subsequents", "B is subsequent to A", @task(subsequents=[notify])
  - Short? 3 syllables, ok.
  - Easy to type? Somewhat, tho not great.
  - Verdict: Suspect still too awkward for everyday use but not 100% convinced of that.
- "followups": folksy
  - Thematically close to: not super close to any but "dependencies" seems to work fine?
  - "A's followups", "B follows[ up] A", @task(followups=[notify])
  - Short? 3 syllables
  - Easy to type? Yes!
  - Verdict: just okay.
- "postrequisites": also clinical
  - Thematically close to: "prerequisites" is a perfect match obviously
  - "A's postrequisites", "B post-requires A", @task(postrequisites=[notify])
  - Short? Nah, 4 syllables
  - Easy to type? Nope.
  - Verdict: meh.
- "after-tasks": vaguely...Carroll-esque? idk
  - Thematically close to: nothing is super close; pre-tasks, dependencies. Unfortunately neither of these "match" well as a logical inverse of "afterwards". The main English opposite would be "beforehand"? which seems too awkward.
  - "A's after-tasks", "B comes after A", @task(afterwards=[notify])
  - Short? 3 syllables
  - Easy to type? Yes.
  - Verdict: Not awful but still a bit silly.
Implementation
- Naive version is to have a 'checker' aspect that confirms, each time, whether the desired result already exists. Check for existence of file, of config value, of cloud resource, etc.
  - Could include a bunch of 'standard' versions of these, with the generalized case just being "any callable"
    - Or some richer value, maybe, perhaps
- As DAG for a given task exec grows and multiple sub(-sub-sub-sub)-tasks all depend on the same low level bits, that doesn't scale well because you're still e.g. hitting disk, remote system, API call, etc, dozens of times unnecessarily, even if the check action tends to be much faster than the "make the thing" action.
- So we could formalize that "if the task has been called already this interpreter session, assume what it does is satisfied". Task already has basic (if not well trod) call count tracking so this could be easy to do.
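The two implementation ideas above (per-call checkers, plus "already called this session means satisfied") might combine roughly as follows. This is a sketch under stated assumptions: CheckedTask is a hypothetical stand-in rather than Invoke's Task class, and file_exists is an invented example of a 'standard' checker; any zero-argument callable returning a bool would serve as a check.

```python
import os
import tempfile


def file_exists(path):
    """A 'standard' checker: satisfied when the path exists on disk."""
    return lambda: os.path.exists(path)


class CheckedTask:
    """Hypothetical wrapper; not Invoke's actual Task class."""

    def __init__(self, body, checks=()):
        self.body = body
        self.checks = list(checks)  # zero-argument callables returning bool
        self.satisfied = False      # session-level "assume it is done" flag
        self.times_called = 0       # how often the body actually executed
        self.checks_run = 0         # how often we paid for running the checks

    def __call__(self):
        if self.satisfied:
            return "skipped (session)"  # no disk/API hit at all
        self.checks_run += 1
        if self.checks and all(check() for check in self.checks):
            self.satisfied = True
            return "skipped (checks)"
        self.body()
        self.times_called += 1
        self.satisfied = True
        return "ran"


target = os.path.join(tempfile.mkdtemp(), "artifact")


def write_artifact():
    with open(target, "w") as handle:
        handle.write("done\n")


build = CheckedTask(write_artifact, checks=[file_exists(target)])
results = [build(), build(), build()]

# A fresh task object (think: a new session) finds the file via its checks.
rebuilt = CheckedTask(write_artifact, checks=[file_exists(target)])
rebuilt_result = rebuilt()
```

The session flag is what keeps a deep dependency graph from re-running the same check dozens of times; the checks only pay their cost on the first call.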
Related, possibly subsumed tickets:
- Task dependencies #41 - the original dependencies ticket, long closed
- Complex dependency deduping #45 - this is the closest match, and arguably a duplicate, though I interpret that ticket to be more "make the existing pre/post deduplicating use a DAG", whereas this ticket is about extending pre/post itself to have a richer API, even if that probably requires the DAG anyways.
- bypass task execution based on hash/timestamp #100 - if we grow the ability to say "this task should generate ", it makes a ton of sense to slightly extend that to "this task should generate , regenerating if exists but is old"
- Easy execution of tasks from within Python itself #170 - insofar as calling tasks from other tasks definitely wants to honor this new functionality
- Executing a single task multiple times w/ different params (aka 'parameterization') #228 - dependency execution needs to mesh well with parameterization, their intersection is historically easy to screw up / paint oneself into a corner
- Explicit support for user-level sharing of data between tasks #261 - since this means doubling down on use of pre-tasks as a common thing, it means we really gotta figure out the signature mismatch problem between the main task and its pre-tasks (and their pre-tasks and ...)
- Deduplication does not care with post constraint #298 (and/or its PR Fix deduplication for post hooks #299) - at least one user is using post-tasks, and noticed the existing behavior is unexpected re: overall flow of pre, main, post tasks. Something that should probably be solvable during this ticket.