Design for introspection on macro metadata / annotations #3847
Also related: #3728 for the move to use annotations for macro metadata.
Hi everyone :) @jakemac53 @scheglov this is my writeup of @johnniwinther's exploratory work for this issue after discussion with him. It's intended to kick off a round of discussion on where to go next.

### Macro Metadata

Macro users need to be able to tell macros what to do. This can take the form of values "passed" to the macro application, or of values in a normal annotation that one or more macros will inspect.
Because we plan to unify macro applications and annotations, this reduces to the problem of introspecting on annotations; and because annotations must be const, this mostly means introspecting on consts.

### Constraints

Macros are compiled independently of the program they are applied in, so a const can be of a type unknown to the macro.
Macros run before consts are evaluated; the program might be incomplete when they run, meaning evaluation is impossible; and even if evaluation is possible the value might change during macro execution.
Macros sometimes want to reproduce "values" passed to them in their generated output.
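To make these constraints concrete, here is a small hypothetical example (all the names in it are invented, and it deliberately does not compile until macros have run, which is exactly the point):

```dart
// Hypothetical sketch; `someMacro`, `Config`, and `generatedLimit` are
// invented names. This deliberately does NOT compile before macro
// expansion -- that is the constraint being illustrated.

class Config {
  const Config({required this.limit});
  final int limit;
}

// `generatedLimit` is a constant that a *different* macro will only declare
// later, via augmentation. So at the time `@someMacro` runs:
//   * the program is incomplete, so `Config(limit: generatedLimit)` cannot
//     be const-evaluated yet, and
//   * even once `generatedLimit` exists, a later augmentation could still
//     change its value.
@someMacro
@Config(limit: generatedLimit)
class Cache {}
```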
### Proposal

Given the constraints, the solution appears to be to give macros access to an "enhanced" AST model for const expressions:
### Implications and Next Steps

This adds significantly to the surface area of the macros API, with 40+ new data types in the exploratory code. A huge amount of complexity is added, with all the attendant costs.

There is one clear upside: building a solution for const expression ASTs means there is likely to be a clear path to adding more AST access for macros in a future release, for example to support introspecting method bodies.

Current discussions around managing the complexity and maintainability of the macros API are focused on a query-based API and a schema for the macro API. It might make sense to combine prototyping around queries/schemas with prototyping around macro metadata, to explore whether they are a good fit.

### Detail

The exploratory PR contains lots of examples and some discussion; these can be discussed at length later. A selection is highlighted here to give some flavour:
Regarding re-binding issues in particular, I do think that treating declarations coming from augmentations (or "parent"s) as not being able to shadow things coming from imports would largely resolve the issue. It becomes an error to shadow something from an import via augmentation and also reference it, and this means we can accept the fact that macros might resolve identifiers differently based on when they run - because there will necessarily be an error in the final program on the annotation which they resolved incorrectly. See #3862 for context.
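A small, hypothetical sketch of the rule being proposed here (file name, macro name, and identifiers are all invented):

```dart
// main.dart -- hypothetical sketch; assume `other.dart` declares
// `const max = 100;` and `SomeMacro` stands in for any macro.
import 'other.dart'; // provides `max`

@SomeMacro(max) // When the macro runs, `max` resolves to the import.
class Foo {}

// If a macro-generated augmentation later declares its own `max`, the
// reference in the annotation above would re-bind. Under the proposed rule,
// shadowing an imported name from an augmentation *and* referencing it is a
// compile-time error, so a macro that resolved `max` against the import is
// either still correct, or the final program is rejected anyway.
```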
As far as greatly expanding the API surface area to expose a typed model for the entire AST that can be represented in a constant expression, I would really like to avoid that. The ExpressionCode object is meant to be an equivalent abstraction but with a lot less surface area. We could specify more exactly how those should be constructed - maybe the original expression should be tokenized and then each token becomes a "part" of the code object, as an example. Then you can have client side only APIs to parse that into whatever structure, but these don't need to be serializable types, and the compilers don't have to know about them, which should simplify things and make the API more stable (these APIs can come from some other helper package).
It's always tempting to push complexity to a shared client library, but then instead of a schema that you know you can evolve safely, you have a hybrid of schema and code that is very hard to evolve safely. You have to reason about what happens when there is version skew between the data and the code, and in practice you simply can't; you rely on test coverage, and then you don't have the tools you need to make progress.

For example: suppose the current schema is v3, and then we ship a language feature that adds new syntax. You can now write Dart code that cannot be described in v3; if you try to understand it as v3, the meaning is lost. With the hybrid approach, what do you do? You are forced to use versioning even though there is no schema: you say that client versions <X can't understand the new Dart code, and ... then what?

With a schema you can say: here is v4, which covers the new syntax. The macro can say it understands only up to v3, and the host can try to serve v3 and bail out in a reasonable way if syntax is used that needs v4. The macro author can update to support v4 at their convenience, and then the macro advertises that it supports v4 and can consume the new syntax.

You could say, ah, we'll just use the language version: instead of v3 -> v4 it's Dart language 3.5 -> 3.6; macros say what language versions they can support, and that's the "schema version". But then you make every minor language version a "breaking" change for macros, and you don't actually tell people whether it's really breaking and, if so, what broke. Whereas when you publish a schema change, everyone immediately knows whether they have work to do: whether, in the context of one particular macro, the missing surface area is irrelevant, will be seen as a bug, or should be supported as a feature.

Maintaining a schema is a lot of work, but it is all work that saves you from doing more work later. That's why they are so incredibly widely used even though they are painful to work with :)
The analyzer already has exactly this problem: old versions of the analyzer are generally selectable on new SDKs, but cannot parse that new code without a
For any change to the language which is significant enough to require a parser change, there are almost certainly going to be corresponding AST changes, which some (but not all) macros ultimately have to care about. You likely end up doing breaking changes in the package for these changes, since some macros will have to be updated to handle them (even just new AST nodes). And then every macro in the world has to be updated, regardless of whether it is broken (to expand its constraint).

If the actual AST classes are in a separate package, only the macros which actually depend on that package will have to update when it changes. They also still get an indication that they should update (they will see a new version available).

Essentially this is a tradeoff of what kind of error you get: a static error because of code that can't be parsed by a macro with some old version of the shared package, versus a pub solve error after upgrading your SDK. The pub solve error blocks you even if you aren't actually broken at all. The static error shows you exactly the expression in your code that failed to parse. We could likely make this a well-understood macro error with good tooling around it too (such as a lint/hint that knows the required version of the package in order to parse a given metadata annotation, and suggests that you upgrade to that version).

Ultimately, I think I would prefer the static error in this case. It allows you to update your SDK constraint, and without any additional changes everything should continue to work. You are actually less likely to get blocked overall, because you can use macros that haven't yet been updated, as long as you don't use them on expressions using the new syntax. You could get broken on a pub upgrade, if you have macros which parse code, aren't updated to the latest version, and you use new syntax. But in this case you would be broken either way, and in the pub solve case you can't just avoid using the new syntax.
Thanks Jake! There are some tricky corners here, for sure. Fortunately I think covering the AST part with a schema does not restrict our options; there are a bunch of things we can do:
And since we will support versioning we can make these choices differently at different versions. Not sure what the right time to dig further is--probably we can get a lot more clarity on the choices once we have an end-to-end demo running. My guess at this point is that we should make Then we would, as you say, report missing support in a macro implementation only when it actually matters, as a compile error. Neatly splitting out the AST part, so we have e.g.
I don't think it'll cover everything. We can still have

```dart
const foo = 5;

class Foo {
  @foo
  bar() {}
}
```

and with a macro generating

which would rebind
I don't see how the ExpressionCode object is providing an equivalent abstraction. How does it, for instance, provide the ability to structurally inspect and/or evaluate the annotation?
It does not directly - it just exposes a list of "parts" which are ultimately either strings or identifiers. Essentially a raw token stream. You can then build a client library on top of that, to "parse" it into a more structured AST, if desired. In CFE terms, we would basically do the "scanner" but not the "parser" - the parser part would be left up to the client. However, even the scanning in this case would be less structured than what the actual CFE scanner produces - no specific Token types, just strings.

Another possible idea is to go all the way to a single string + scope. Any identifiers encountered when parsing would need to be constructed with that given scope, or maybe a sub-scope derived from that. This might actually work well with the data_model approach, where identifiers don't have IDs but instead a scope and a name?
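A rough, client-side sketch of that shape (these types are invented for illustration; they are not the actual `ExpressionCode` API):

```dart
// Invented types for illustration only; not the real macros API.

/// Stand-in for an identifier part: a name plus whatever resolution
/// information the host chooses to attach (e.g. a scope).
class IdentifierPart {
  const IdentifierPart(this.name);
  final String name;
}

/// Stand-in for an ExpressionCode-like object: an ordered list of parts,
/// each either a raw `String` token or an [IdentifierPart].
class ExpressionParts {
  const ExpressionParts(this.parts);
  final List<Object> parts; // String | IdentifierPart
}

/// A client-side helper (living in a helper package, not the protocol)
/// could re-assemble or parse the parts however it likes.
String debugRender(ExpressionParts code) => code.parts
    .map((p) => p is IdentifierPart ? p.name : p as String)
    .join();

void main() {
  // Roughly what the host might produce for `Config(retries: 3)`.
  const code = ExpressionParts([
    IdentifierPart('Config'),
    '(', 'retries', ':', ' ', '3', ')',
  ]);
  print(debugRender(code)); // Config(retries: 3)
}
```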
Re: ExpressionCode: Julia implements metaprogramming by representing expressions in a Lisp-like syntax. Maybe this idea can help? (See Greenspun's tenth rule.)
Jake and I chatted about this a bit; the more I think about it the more I think JSON is a natural fit. Subtrees of JSON can reference external schemas by version; each library in the model can be tagged with its element model version and AST model version. The macro can know if it has a recent enough schema, and decide whether to proceed using the corresponding helper Dart code--or simply manipulate the AST as a JSON tree. Or something like that :) we'll see.
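Something like the following, purely as an illustration (none of these keys, versions, or shapes are decided): each library subtree carries its own version tags, and the macro checks them before deciding how to consume the data.

```dart
// Illustrative only: the key names, versions, and shapes are invented.
const libraryModel = <String, Object?>{
  'uri': 'package:example/example.dart',
  'elementModelVersion': '1.2.0',
  'astModelVersion': '0.3.0',
  'annotations': [
    {
      // Subtree described by the AST schema version above.
      'kind': 'constructorInvocation',
      'type': 'Config',
      'arguments': [
        {'name': 'retries', 'value': {'kind': 'intLiteral', 'value': 3}},
      ],
    },
  ],
};

void main() {
  final astVersion = libraryModel['astModelVersion'] as String;
  if (astVersion.startsWith('0.3.')) {
    // Safe to hand the subtree to generated helper classes for this version.
  } else {
    // Fall back to walking the raw JSON tree, or bail out with a clear error.
  }
}
```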
In any case I do think that whether we have a structural model on the wire versus a more raw token stream is a bit of a distraction from the more interesting questions. If the rest of the team is comfortable with the API (and protocol) surface area expansion I am fine with being overridden. I am more specifically interested in discussing the semantic differences compared to the existing proposal. Is this mostly about specifying the behavior for edge cases better (for example adding an error on re-binding), or is there something fundamentally different which allows the CFE to work better with this model? Here is my high-level summary of answers to the bullets in the "details" section above, in the current proposal:
The current proposal says "All identifiers in code must be defined outside of the current strongly connected component (that is, the strongly connected component which triggered the current macro expansion)." Essentially, it adds a restriction to sidestep the issue. We could instead make re-binding an error though. I don't have a strong opinion on this. The problem is more than just re-binding, because you could also change the actual value of a constant by augmenting its initializer. The restriction in the current proposal sidesteps both issues.
The existing proposal only gives you Identifiers (via the
I believe this would be a potential issue in either proposal.
I believe this would be a potential issue in either proposal.
I think the restriction is a problem: 99.999% of the time the const will not actually be affected by the import that triggers the restriction, so I think we'll want to proceed as if it's going to work and bail out only if it doesn't.

Actually, you don't really get a choice: the macro annotation itself can't be const evaluated, for sure, so you do need to build an API that works on incomplete code. An AST-based API can naturally handle incomplete values, because you have a meaningful way to dig into the pieces that are there. With a value-based API you hit a wall when you encounter something that needs to be complete to evaluate, like a constructor call (including, usually, the macro annotation itself).

The

I do think there is a chance we end up wanting a
Yes, the important part is deciding what data we need, how to get it, and how it will be exposed to the macro; once we have that sorted out we can move things around as needed, including changing our mind between versions. Thanks :)
I chatted to Jake about this yesterday, and attempted to make progress today by digging into the analyzer's

My conclusion is that we probably need some worked examples to explore the details of what changes if we try to talk about values vs trying to work with the AST. I suspect an important part of the problem is cases where the analyzer/CFE cannot fully evaluate an expression, and so the AST (or some approximate AST) is the best that can be provided. But it would be good to understand this with examples.

I do not have a good feel for what the examples need to be, but I tried to create one to kick things off :) in a doc, which I think might be an easier way to iterate on specific examples; how does that sound?
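As one strawman for the kind of worked example meant here (all names invented): an annotation argument where some pieces are evaluable today and one piece is not.

```dart
// Strawman example; `Config`, `generatedRetries`, and the macro that reads
// this annotation are all invented. `generatedRetries` is assumed to be
// introduced later by another macro, so this does not compile yet.
class Config {
  const Config({required this.timeout, required this.retries});
  final Duration timeout;
  final int retries;
}

const timeout = Duration(seconds: 30);

@Config(
  timeout: timeout,          // evaluable now: Duration(seconds: 30)
  retries: generatedRetries, // not evaluable until macros have run
)
class Client {}

// A value-based API has to fail on the whole `@Config(...)` expression,
// because one argument cannot be evaluated yet. An AST-based API can still
// hand the macro the structure: a constructor invocation of `Config` with a
// `timeout` argument it may choose to evaluate, and a `retries` argument
// that is (for now) just an unresolved identifier.
```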
Sorry to show up late to the party. I got way behind on GitHub issues when I was out on leave.
Me too. Martin Odersky gave a talk on Scala macros years ago, and one of the things he was emphatic about was that exposing the full AST API to users made it really hard to evolve the language after that.

I really feel like we are overshooting the expressiveness actually needed here. It's easy to imagine use cases where a macro might want to be able to dig into unevaluated annotation arguments structurally, or evaluate them to values, or have annotation arguments that contain type literals for types that will be produced by macros, or all manner of other complicated scenarios. But as powerful as macros are already planned to be, we can't make them all-powerful (at least, not without a monumental amount of complexity), and at some point we have to draw the line and say "yeah, you can't do that, or you have to do it some other way".

For introspecting on annotations, I would really love a complete list of concrete use cases that we consider to be requirements, and then be willing to disallow use cases out of that set if it gives us simplicity in return.
Coincidentally I shared (internally) last week a doc about evolving

tl;dr: I think macros already pose significant risk to further evolution of the language without AST. So I think we have to solve all the attendant problems anyway--we have to be prepared for any level of breakage related to language changes, and to keep the lights on through it, with a good experience for macro users, macro authors, language designers, the analyzer and CFE teams, ... So I am not worried about adding AST for that reason. We can handle breaking changes: the worst case user experience is that some subset of macros needs updating to work with the new language version, and until then the user needs to hold back the language version of only the files where those macros are applied.

That doesn't mean that I'm not worried ;) and I agree with starting simple. I think the big challenge here is that we are expecting to have two implementations, analyzer and CFE, of something that is outside defined language semantics. I think that's why AST naturally comes into the discussion, because it's something else well defined that analyzer and CFE should be able to stay in agreement about.

Anyway, we are at a point where we can try this in code, so let's see where the code takes us :)
Note that we have limited ability to "disallow" things, because a macro annotation has no restrictions and anyway a macro can inspect a non-macro annotation; we can only find problems by actually running the macros. Fortunately I think we can still offer a good experience: cover a subset of expression AST, whatever we think is interesting, then represent anything outside that as "unknown". A macro encountering

That would be a different type of failure to one due to a breaking change to the AST, which would be "I don't understand this language version yet, please pin to an earlier language version."
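A minimal sketch of that idea (the types are invented, not a proposed API): the model covers a chosen subset of const expressions, and everything outside it degrades to an explicit "unknown" node that a macro can detect and report.

```dart
// Invented types for illustration; not the proposed macros API.
sealed class ConstExpression {}

class IntLiteral extends ConstExpression {
  IntLiteral(this.value);
  final int value;
}

class ConstructorCall extends ConstExpression {
  ConstructorCall(this.typeName, this.arguments);
  final String typeName;
  final Map<String, ConstExpression> arguments;
}

/// Anything outside the covered subset (or introduced by a newer language
/// version) is represented explicitly rather than dropped.
class UnknownExpression extends ConstExpression {
  UnknownExpression(this.source);
  final String source;
}

void handle(ConstExpression e) {
  switch (e) {
    case IntLiteral(:final value):
      print('int $value');
    case ConstructorCall(:final typeName):
      print('call to $typeName');
    case UnknownExpression(:final source):
      // The macro can bail out with an actionable message instead of
      // silently misinterpreting the annotation.
      throw UnsupportedError('This macro does not understand: $source');
  }
}
```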
For annotation introspection, I'd prefer a somewhat powerful feature, recognizing at least all literals + collections + const constructor invocations as values, with type arguments. That should allow for reasonable typed tree structures. If we restrict it too much, people will just encode the same thing in string literals, with worse or no IDE support and longer macro execution times due to the (repeated per phase) parsing. It's constant expressions only, so I don't think a complete AST is that prohibitive.
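For instance (invented names), this is the kind of annotation that scope would cover without pushing authors toward string encodings:

```dart
// Invented example annotation; covered by "literals + collections +
// const constructor invocations, with type arguments".
class Column<T> {
  const Column(this.name, {this.defaultValue});
  final String name;
  final T? defaultValue;
}

class Table {
  const Table(this.name, this.columns);
  final String name;
  final List<Column<Object?>> columns;
}

@Table('users', [
  Column<int>('id'),
  Column<String>('name', defaultValue: ''),
])
class User {}
```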
@davidmorgan wrote
It must be able to. If the reason for the limitation is that "the constant may change its value", there's a simple cure for that: if the constant has already been used for some calculations for a parameter requested by a macro, the value should be locked, and any attempt to change it would trigger an error. Even totally prohibiting constants from changing their values would be better than the situation where the macro has no access to the parameters and must resort to relying on the AST - and then what? Implement an interpreter? Will this interpreter be better than the one used by the analyzer?
From a language perspective, what happens before running macros is not Dart evaluation. It's some approximation that the tools do on an incomplete Dart program (which means it's not a Dart program).

In any case, introspection should give you a source mirror. It may be able to say what identifiers point to, but

There doesn't have to be any requirement that

I'd go for that. Let you reflect on the partial program, as source. How the macro interprets that source is up to it.
Indeed. There's a simpler (and more intuitive) way to pass the parameters in the above example, like

It seems that in the current metaprogramming design, the usual rules for annotations don't apply. The annotation

But there's a price to be paid. Suppose I want to generate a number of constants, so I write an annotation like

BTW, if you want to preserve a more generous syntax like in your example, it may work, too (don't know to what end, but we are discussing the concept). For a parameter like in

From your previous comment, your concern was that people may start passing stuff as strings, but what kind of stuff exactly? I can imagine I want to pass an SQL statement as a parameter - but I have to pass it as a string anyway, and there's nobody to validate the syntax before it reaches my macro. There's an opposite concern: people may start encoding stuff in seemingly valid Dart syntax while assuming some arbitrary semantics (like in your example).
It allows macros to be applied to

Note that
It is expected that something like this will exist, but you can't start with the evaluated thing. You want to start from the (unresolved) AST, because that is always available. And then a macro can attempt to resolve it, which might fail for a variety of reasons, or we might allow it to return the wrong result, or cause an error if one of those things is augmented later to have a different value.
@jakemac53 : |
This is essentially what the API would be, yes. We will have to ensure that analyzer/CFE agree on what can and cannot be evaluated, but certainly there are many simple cases that will just work. Macro authors don't have to internalize anything other than "this API may fail if the expression cannot be evaluated for some reason", and the error should describe the reason.
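In other words, something shaped roughly like the sketch below (the names are hypothetical; the real API surface is still being designed): a structural view that is always available, plus an evaluation call that may fail with a descriptive error.

```dart
// Hypothetical shape, not the actual macros API.

/// The AST-ish view that is always available, even for incomplete programs.
abstract class ExpressionSyntax {}

/// The evaluated view, available only when evaluation succeeds.
abstract class ConstValue {}

abstract class AnnotationIntrospection {
  /// Always succeeds: the (possibly unresolved) expression as written.
  ExpressionSyntax get expression;

  /// May fail: returns null when the expression cannot yet be evaluated,
  /// e.g. because it references code another macro has not generated, or
  /// evaluation is otherwise not possible at this point in the build.
  Future<ConstValue?> tryEvaluate();
}

Future<void> useAnnotation(AnnotationIntrospection annotation) async {
  final value = await annotation.tryEvaluate();
  if (value != null) {
    // Simple cases "just work": inspect the evaluated constant.
  } else {
    // Fall back to the structural view, or report a clear macro error.
    final syntax = annotation.expression;
    print('Could not evaluate, falling back to syntax: $syntax');
  }
}
```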
I'd like to use the same inspection on the macro application annotation that is used to inspect other annotations. That means we need to be able to reflect the source of all annotations, even those that refer to code that will be created by macros, before that code has been created. It's probably going to be rare that the macro application annotation itself references code that doesn't exist yet, but the capability needs to be there.

There should be a
Related: #3522
@johnniwinther @jakemac53 is there an issue already open for the investigation Johnni's been doing into macro metadata?
I couldn't find one, so here's a fresh one :) and I'll close #3522 in favour of this one since the discussion there does not seem super helpful.