Structured data in macro arguments #3522

davidmorgan · 2023-12-18T09:40:11Z

Macro args have the limitation that they must be serializable, and as currently specified this forces them to be trivially mappable to JSON, i.e. primitive collections and values only. Or, they can be expressions, which are serialized as strings.

It seems like it would be an improvement if there can be a way to support structured data; by which I mean records, enums, custom classes, etc.

As a simplest possible example, people will want to define enums for type safe options: MyMacro(foo: FooSetting.veryEnabled, other: OtherSetting.mostlyDisabled}).

Without support for structured data my guess is that macro authors will (ab)use Code arguments by providing top level "builder" methods that make the macro applications look nice, then writing parsing and static checking for the "code" to extract the argument values. We know from existing codegen that authors will use any implementation trick whatsoever to provide a better / safer API for their users.

From a technical point of view I think supporting structured data is straightforward? The macro API would need to gain serialize/deserialize methods, so instead of relying on the macro being trivially serializable we simply ask it to serialize itself.

If macro authors are going to (ab)use Code to pass arguments then writing serialize and deserialize would be less work for them. Still a headache, but at least there is only one valid solution so it will lead to the same experience across all macros, if they manage to avoid bugs.

But the next step seems obvious, we could provide a macro that writes serialize and deserialize for you? :) ... everyone wins!

The text was updated successfully, but these errors were encountered:

rrousselGit · 2023-12-18T10:31:13Z

On that topic, is there a reason why we don't have a ConstantReader equivalent for non-primitive objects?
With build_runner, we can read any objects if we wanted to.

There are already cases like this in the wild.
For example json_serializable:

class DateTimeConverter extends JsonConverter {
  const DateTimeConverter();
  ...
}

@JsonSerializable(
  fieldRename: FieldRename.kebab, // Enums
  converters: [DateTimeConverter()], // Custom objects implementing a specific interface
)

Or Riverpod:

@riverpod // Will generate "const fooProvider =..."
int foo() {}

@Riverpod(
  dependencies: [fooProvider], // A list of object this function depends on
)
int bar(ref) {
  ref.watch(fooProvider);
}

jakemac53 · 2023-12-18T15:44:03Z

It seems like it would be an improvement if there can be a way to support structured data; by which I mean records, enums, custom classes, etc.

I definitely agree it would be desirable.

From a technical point of view I think supporting structured data is straightforward? The macro API would need to gain serialize/deserialize methods, so instead of relying on the macro being trivially serializable we simply ask it to serialize itself.

Unfortunately, not as simple as it would seem on the surface, but might be doable with some constraints. There are a couple issues:

Subtyping: If the user tries to pass a subtype of the type you accept, it might be serializable (if the user actually knew to make it so) but then in the macro isolate you have never heard of the subtype and can't properly deserialize it. Maybe we could block subtyping though, and force everything to go through a final class. Probably, I would actually force any of these accepted types to have a special macro annotated, which adds the serialization logic, as well as enforcing some constraints such as the class being final.
Not yet generated identifiers: Sometimes, we might not be able to construct an instance at all, because some identifiers used in the constructor don't yet exist. We can possibly just block that or something, but it may require losing parallelism across libraries (although I don't think we have much parallelism in practice today).
Shadowing: Even if we do find all referenced identifiers in scope, we cannot actually know they are the correct identifiers. This implies some form of checking to ensure they don't change etc.

I think we could possibly come up with workable answers to all of these things, but today we don't allow it mostly just to simplify the feature. Expanding it later on to allow this kind of data is non-breaking etc.

On that topic, is there a reason why we don't have a ConstantReader equivalent for non-primitive objects?

There is some WIP constant evaluation stuff, mostly this just hasn't been figured out yet exactly how we want it to work. It is challenging to specify (for a lot of the same reasons as above).

There is also a hope that in many cases, macros can just emit back into their own code the Code argument they receive, instead of needing to evaluate it at compile time. It is understood that this won't solve all use cases though.

rrousselGit · 2023-12-18T16:32:21Z

There is also a hope that in many cases, macros can just emit back into their own code the Code argument they receive, instead of needing to evaluate it at compile time. It is understood that this won't solve all use cases though.

I don't think any of the examples I listed above would work with that.
Riverpod definitely wouldn't, as it relies on reading what's inside fooProvider.

Afaik json_serializable's JsonConverter logic isn't a plain copy-paste either. From the top of my head, it supports generics converters:

https://github.com/google/json_serializable.dart/blob/f7cb51662b21d11b3fdb6c5e0fdd46431cec59d4/json_serializable/test/kitchen_sink/json_converters.dart#L7

class GenericConverter<T> implements JsonConverter<T, Map<String, dynamic>> {
  const GenericConverter();

  @override
  T fromJson(Map<String, dynamic> json) => null as T;

  @override
  Map<String, dynamic> toJson(T object) => {};
}

And invoke it with various generics based on the situation.

The enum's case is probably the simplest to handle. A String instead of an enum would work. But at the very least, we'd lose out in terms of autocompletion and discovery.

jakemac53 · 2023-12-18T16:39:38Z

For some of these examples, you can take an Identifier (or List<Identifier>). Those can both be emitted back into Code and introspected upon. I don't know all the details here but I think that covers some of these use cases?

rrousselGit · 2023-12-18T17:01:33Z

It depends on how introspectable "Code" is.

In Riverpod's case for instance, given:

const fooProvider = Provider(
   dependencies: [barProvider],
);

...

@Riverpod(depencencies: [fooProvider])
int example() {..}

This generates:

const exampleProvider = Provider(
  dependencies: [fooProvider, barProvider] // dependencies of "fooProvider" are also listed here
);

You can view it as a spread:

const exampleProvider = Provider(
  dependencies: [fooProvider, ...fooProvider.dependencies],
);

I don't want to do this at runtime, as the chain can be quite long. This would be inneficient. And it matters to be that those objects are consts too.

I could be wrong, but afaik a Code wouldn't enable this, right?

jakemac53 · 2023-12-18T17:09:36Z

I could be wrong, but afaik a Code wouldn't enable this, right?

Assuming general purpose constant evaluation is available, and that fooProvider is itself const, I do think this could be possible.

Basically you would const evaluate your Code arguments, check that they are Provider instances, read that dependencies field, and then emit those dependencies into your own code (this last bit would get a bit tricky, but would be similar to what you do today with the analyzer to convert DartObject values into code).

rrousselGit · 2023-12-18T17:09:55Z

In JsonSerializable's case, given @JsonSerializable(converters: [GenericConverter()])

The generator may generate GenericConverter<T>().fromJson(..) or GenericConverter<S>() ... based on custom logic

rrousselGit · 2023-12-18T17:12:12Z

Assuming general purpose constant evaluation is available, and that fooProvider is itself const, I do think this could be possible.

As long as constant evaluation is indeed possible, I have no worries then :)

davidmorgan · 2023-12-19T09:44:06Z

Thanks for the detailed reply Jake :) makes sense.

Subtyping: If the user tries to pass a subtype of the type you accept, it might be serializable (if the user actually knew to make it so) but then in the macro isolate you have never heard of the subtype and can't properly deserialize it. Maybe we could block subtyping though, and force everything to go through a final class. Probably, I would actually force any of these accepted types to have a special macro annotated, which adds the serialization logic, as well as enforcing some constraints such as the class being final.

Ah, I see what you mean.

If there is a macro Foo with an argument of type Bar, and the macro uses a method Bar#computeLabelForToString(), the user could pass in a CustomBar with override of computeLabelForToString, and expect their override code to run in the macro. Instead what will happen is CustomBar gets deserialized to Bar and the default computeLabelForToString will run.

Requiring final types would solve this, but at the cost of excluding a lot of types that will work fine but are not final, including types that the user doesn't own, which would be very frustrating.

I think without additional requirements it's already sufficiently under control of the macro author: if they are explicitly asking for an argument of type Bar, they know the declaration, if they can control it they can choose to make it final, if they choose not to make it final or if it's a type they don't control then the resulting (unlikely) problems are part of their API and they can explain them if needed.

Similarly if the macro has a List<Object> or List<SomeInterface> argument then the limitations of how this works are part of that specific macro API. The macro author can restrict the arguments both using the type system and via whatever checks they want during macro execution.

Not yet generated identifiers: Sometimes, we might not be able to construct an instance at all, because some identifiers used in the constructor don't yet exist. We can possibly just block that or something, but it may require losing parallelism across libraries (although I don't think we have much parallelism in practice today).

Shadowing: Even if we do find all referenced identifiers in scope, we cannot actually know they are the correct identifiers. This implies some form of checking to ensure they don't change etc.

I think we could possibly come up with workable answers to all of these things, but today we don't allow it mostly just to simplify the feature. Expanding it later on to allow this kind of data is non-breaking etc.

Yes, in some senses it's incremental; I guess most macros would want to use it and change their APIs, though, so it could feel more like 2.0 than a 1.1.

BTW, I'm not saying it's a good idea, but "macro that serializes macro args" is a pretty well defined use case for a first macro to publish / showcase.

jakemac53 · 2023-12-19T16:16:30Z

BTW, I'm not saying it's a good idea, but "macro that serializes macro args" is a pretty well defined use case for a first macro to publish / showcase.

I do think it would be a pretty well defined macro, but it also wouldn't be a very useful one, until users can actually write their own macros :)

davidmorgan · 2023-12-20T09:27:39Z

BTW, I'm not saying it's a good idea, but "macro that serializes macro args" is a pretty well defined use case for a first macro to publish / showcase.

I do think it would be a pretty well defined macro, but it also wouldn't be a very useful one, until users can actually write their own macros :)

Hmmm I think it would be useful for other use cases too. I guess the biggest difference to a general serialization macro is that the wire format is not published or stable; so no good for RPCs with non-Dart servers. But it would presumably be pretty good for passing data between isolates or for temporary caches on disk, for example.

jakemac53 · 2023-12-20T15:25:08Z

Another issue is I think we probably would want to use the Serializer/Deserializer interfaces which are internal to the macro implementation, and we really don't want to expose because then they would be hard to change.

davidmorgan · 2023-12-20T15:48:51Z

We might want that :) my guess is that we will have to expose enough of the serialization internals that it's possible to write a macro to assist with a package you don't own, and to accept data types from it as arguments.

Anyway, all speculation at this point :) thanks.

jakemac53 · 2023-12-20T18:00:02Z

Oh, another hiccup I had forgot about - on the compiler/analyzer side of things there is no actual instance of the class at all - just some abstract representation of it. So we can't rely on user code for the serialization.

Probably, what we would want is to use the existing Arguments class, and then macros would just need to be able to handle "deserializing" from that. And we would have to support a new form of argument for custom types.

davidmorgan · 2023-12-21T08:50:31Z

Oh ... right :)

Something like the DartObject interface in the analyzer, then, so the macro can traverse objects+fields and extract the data?

That should work fine.

The only problem I see is that it breaks the illusion that the macro is like a class; values get transformed on the way "in", so not:

macro class Foo {
  final Bar bar;
  const Foo(this.bar);
}

but somehow

macro class Foo {
  final Argument bar;
  const Foo(Bar this.bar);
}

--which we don't have a way to write. And when we do figure out a way to write it, there is a risk it will be horrible to unit test, because there is no general Bar->Argument transformation available for unit testing!

Quite a puzzle.

lrhn · 2023-12-24T10:44:25Z

We could consider how LINQ in C# works. It allows you to write expressions into LINQ queries, like from ... where x> y select x.foo + 42.
That can be consumed in two ways, either by compiling to normal code that's executed (it's an iterable .where/.map callback, basically), or by compiling into ASTs from for the expression, that the LINQ processor can then interpret itself.

Which one it is depends on the static type of the implicit receiver. So if the expression after from provides a normal iterable (IEnumerable), then the expressions are just parsed into callback functions (or evaluated inline in a normal iteration loop, if optimizable).
If the receiver is an IQueryable,
the LINQ expressions are provided to the query operations (like Where) as an Expression<T>, which is a typed AST for the expression.

So, could we make the behavior depend on the static type of the parameter to the macro annotation constructor:

class MyMacro implements Macro {
  final int depth;
  MyMacro(this.depth);
}
class MyFancyMacro implements Macro {
  final Argument<int> depth;
  MyFancyMacro(this.depth);
}

so that a macro annotation of @MyFancyMacro(context + 42) would provide the value const ArgumentExpression<int>(Op("+", Id("context"), Int(42))) of type Argument<int> as actual argument.

If Argument<T> is interpreted as T on the macro application side, and Expression<T> on the macro implementation side, then type checking should work. (Which probably won't work if a macro implementation also contains macro applications. And if the argument expression wants to contain references to abstract names provided only when interpreted in the implementation, then that's also a concern).
Also, the allowed C# expressions are somewhat simpler than general Dart expressions.

davidmorgan · 2024-05-29T16:25:25Z

Closing in favour of #3847

davidmorgan added the static-metaprogramming Issues related to static metaprogramming label Dec 18, 2023

github-project-automation bot added this to Static Metaprogramming - design/prototype Dec 18, 2023

github-project-automation bot moved this to Todo in Static Metaprogramming - design/prototype Dec 18, 2023

davidmorgan moved this from Todo to In Progress in Static Metaprogramming - design/prototype May 29, 2024

davidmorgan assigned davidmorgan and johnniwinther and unassigned davidmorgan and johnniwinther May 29, 2024

davidmorgan mentioned this issue May 29, 2024

Design for introspection on macro metadata / annotations #3847

Closed

davidmorgan closed this as completed May 29, 2024

github-project-automation bot moved this from In Progress to Done in Static Metaprogramming - design/prototype May 29, 2024

Structured data in macro arguments #3522

Structured data in macro arguments #3522

Comments

davidmorgan commented Dec 18, 2023

rrousselGit commented Dec 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jakemac53 commented Dec 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rrousselGit commented Dec 18, 2023

Uh oh!

jakemac53 commented Dec 18, 2023

Uh oh!

rrousselGit commented Dec 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jakemac53 commented Dec 18, 2023

Uh oh!

rrousselGit commented Dec 18, 2023

Uh oh!

rrousselGit commented Dec 18, 2023

Uh oh!

davidmorgan commented Dec 19, 2023

Uh oh!

jakemac53 commented Dec 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davidmorgan commented Dec 20, 2023

Uh oh!

jakemac53 commented Dec 20, 2023

Uh oh!

davidmorgan commented Dec 20, 2023

Uh oh!

jakemac53 commented Dec 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davidmorgan commented Dec 21, 2023

Uh oh!

lrhn commented Dec 24, 2023

Uh oh!

davidmorgan commented May 29, 2024

Uh oh!

rrousselGit commented Dec 18, 2023 •

edited

Loading

jakemac53 commented Dec 18, 2023 •

edited

Loading

rrousselGit commented Dec 18, 2023 •

edited

Loading

jakemac53 commented Dec 19, 2023 •

edited

Loading

jakemac53 commented Dec 20, 2023 •

edited

Loading