Skip to content

Structured data in macro arguments #3522

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
davidmorgan opened this issue Dec 18, 2023 · 17 comments
Closed

Structured data in macro arguments #3522

davidmorgan opened this issue Dec 18, 2023 · 17 comments
Labels
static-metaprogramming Issues related to static metaprogramming

Comments

@davidmorgan
Copy link
Contributor

Re: the spec and this comment.

Macro args have the limitation that they must be serializable, and as currently specified this forces them to be trivially mappable to JSON, i.e. primitive collections and values only. Or, they can be expressions, which are serialized as strings.

It seems like it would be an improvement if there can be a way to support structured data; by which I mean records, enums, custom classes, etc.

As a simplest possible example, people will want to define enums for type safe options: MyMacro(foo: FooSetting.veryEnabled, other: OtherSetting.mostlyDisabled}).

Without support for structured data my guess is that macro authors will (ab)use Code arguments by providing top level "builder" methods that make the macro applications look nice, then writing parsing and static checking for the "code" to extract the argument values. We know from existing codegen that authors will use any implementation trick whatsoever to provide a better / safer API for their users.

From a technical point of view I think supporting structured data is straightforward? The macro API would need to gain serialize/deserialize methods, so instead of relying on the macro being trivially serializable we simply ask it to serialize itself.

If macro authors are going to (ab)use Code to pass arguments then writing serialize and deserialize would be less work for them. Still a headache, but at least there is only one valid solution so it will lead to the same experience across all macros, if they manage to avoid bugs.

But the next step seems obvious, we could provide a macro that writes serialize and deserialize for you? :) ... everyone wins!

@rrousselGit
Copy link

rrousselGit commented Dec 18, 2023

On that topic, is there a reason why we don't have a ConstantReader equivalent for non-primitive objects?
With build_runner, we can read any objects if we wanted to.

There are already cases like this in the wild.
For example json_serializable:

class DateTimeConverter extends JsonConverter {
  const DateTimeConverter();
  ...
}

@JsonSerializable(
  fieldRename: FieldRename.kebab, // Enums
  converters: [DateTimeConverter()], // Custom objects implementing a specific interface
)

Or Riverpod:

@riverpod // Will generate "const fooProvider =..."
int foo() {}

@Riverpod(
  dependencies: [fooProvider], // A list of object this function depends on
)
int bar(ref) {
  ref.watch(fooProvider);
}

@jakemac53
Copy link
Contributor

jakemac53 commented Dec 18, 2023

It seems like it would be an improvement if there can be a way to support structured data; by which I mean records, enums, custom classes, etc.

I definitely agree it would be desirable.

From a technical point of view I think supporting structured data is straightforward? The macro API would need to gain serialize/deserialize methods, so instead of relying on the macro being trivially serializable we simply ask it to serialize itself.

Unfortunately, not as simple as it would seem on the surface, but might be doable with some constraints. There are a couple issues:

  • Subtyping: If the user tries to pass a subtype of the type you accept, it might be serializable (if the user actually knew to make it so) but then in the macro isolate you have never heard of the subtype and can't properly deserialize it. Maybe we could block subtyping though, and force everything to go through a final class. Probably, I would actually force any of these accepted types to have a special macro annotated, which adds the serialization logic, as well as enforcing some constraints such as the class being final.
  • Not yet generated identifiers: Sometimes, we might not be able to construct an instance at all, because some identifiers used in the constructor don't yet exist. We can possibly just block that or something, but it may require losing parallelism across libraries (although I don't think we have much parallelism in practice today).
  • Shadowing: Even if we do find all referenced identifiers in scope, we cannot actually know they are the correct identifiers. This implies some form of checking to ensure they don't change etc.

I think we could possibly come up with workable answers to all of these things, but today we don't allow it mostly just to simplify the feature. Expanding it later on to allow this kind of data is non-breaking etc.

On that topic, is there a reason why we don't have a ConstantReader equivalent for non-primitive objects?

There is some WIP constant evaluation stuff, mostly this just hasn't been figured out yet exactly how we want it to work. It is challenging to specify (for a lot of the same reasons as above).

There is also a hope that in many cases, macros can just emit back into their own code the Code argument they receive, instead of needing to evaluate it at compile time. It is understood that this won't solve all use cases though.

@rrousselGit
Copy link

There is also a hope that in many cases, macros can just emit back into their own code the Code argument they receive, instead of needing to evaluate it at compile time. It is understood that this won't solve all use cases though.

I don't think any of the examples I listed above would work with that.
Riverpod definitely wouldn't, as it relies on reading what's inside fooProvider.

Afaik json_serializable's JsonConverter logic isn't a plain copy-paste either. From the top of my head, it supports generics converters:

https://github.com/google/json_serializable.dart/blob/f7cb51662b21d11b3fdb6c5e0fdd46431cec59d4/json_serializable/test/kitchen_sink/json_converters.dart#L7

class GenericConverter<T> implements JsonConverter<T, Map<String, dynamic>> {
  const GenericConverter();

  @override
  T fromJson(Map<String, dynamic> json) => null as T;

  @override
  Map<String, dynamic> toJson(T object) => {};
}

And invoke it with various generics based on the situation.

The enum's case is probably the simplest to handle. A String instead of an enum would work. But at the very least, we'd lose out in terms of autocompletion and discovery.

@jakemac53
Copy link
Contributor

For some of these examples, you can take an Identifier (or List<Identifier>). Those can both be emitted back into Code and introspected upon. I don't know all the details here but I think that covers some of these use cases?

@rrousselGit
Copy link

rrousselGit commented Dec 18, 2023

It depends on how introspectable "Code" is.

In Riverpod's case for instance, given:

const fooProvider = Provider(
   dependencies: [barProvider],
);

...

@Riverpod(depencencies: [fooProvider])
int example() {..}

This generates:

const exampleProvider = Provider(
  dependencies: [fooProvider, barProvider] // dependencies of "fooProvider" are also listed here
);

You can view it as a spread:

const exampleProvider = Provider(
  dependencies: [fooProvider, ...fooProvider.dependencies],
);

I don't want to do this at runtime, as the chain can be quite long. This would be inneficient. And it matters to be that those objects are consts too.

I could be wrong, but afaik a Code wouldn't enable this, right?

@jakemac53
Copy link
Contributor

I could be wrong, but afaik a Code wouldn't enable this, right?

Assuming general purpose constant evaluation is available, and that fooProvider is itself const, I do think this could be possible.

Basically you would const evaluate your Code arguments, check that they are Provider instances, read that dependencies field, and then emit those dependencies into your own code (this last bit would get a bit tricky, but would be similar to what you do today with the analyzer to convert DartObject values into code).

@rrousselGit
Copy link

In JsonSerializable's case, given @JsonSerializable(converters: [GenericConverter()])

The generator may generate GenericConverter<T>().fromJson(..) or GenericConverter<S>() ... based on custom logic

@rrousselGit
Copy link

Assuming general purpose constant evaluation is available, and that fooProvider is itself const, I do think this could be possible.

As long as constant evaluation is indeed possible, I have no worries then :)

@davidmorgan
Copy link
Contributor Author

Thanks for the detailed reply Jake :) makes sense.

  • Subtyping: If the user tries to pass a subtype of the type you accept, it might be serializable (if the user actually knew to make it so) but then in the macro isolate you have never heard of the subtype and can't properly deserialize it. Maybe we could block subtyping though, and force everything to go through a final class. Probably, I would actually force any of these accepted types to have a special macro annotated, which adds the serialization logic, as well as enforcing some constraints such as the class being final.

Ah, I see what you mean.

If there is a macro Foo with an argument of type Bar, and the macro uses a method Bar#computeLabelForToString(), the user could pass in a CustomBar with override of computeLabelForToString, and expect their override code to run in the macro. Instead what will happen is CustomBar gets deserialized to Bar and the default computeLabelForToString will run.

Requiring final types would solve this, but at the cost of excluding a lot of types that will work fine but are not final, including types that the user doesn't own, which would be very frustrating.

I think without additional requirements it's already sufficiently under control of the macro author: if they are explicitly asking for an argument of type Bar, they know the declaration, if they can control it they can choose to make it final, if they choose not to make it final or if it's a type they don't control then the resulting (unlikely) problems are part of their API and they can explain them if needed.

Similarly if the macro has a List<Object> or List<SomeInterface> argument then the limitations of how this works are part of that specific macro API. The macro author can restrict the arguments both using the type system and via whatever checks they want during macro execution.

  • Not yet generated identifiers: Sometimes, we might not be able to construct an instance at all, because some identifiers used in the constructor don't yet exist. We can possibly just block that or something, but it may require losing parallelism across libraries (although I don't think we have much parallelism in practice today).
  • Shadowing: Even if we do find all referenced identifiers in scope, we cannot actually know they are the correct identifiers. This implies some form of checking to ensure they don't change etc.

I think we could possibly come up with workable answers to all of these things, but today we don't allow it mostly just to simplify the feature. Expanding it later on to allow this kind of data is non-breaking etc.

Yes, in some senses it's incremental; I guess most macros would want to use it and change their APIs, though, so it could feel more like 2.0 than a 1.1.

BTW, I'm not saying it's a good idea, but "macro that serializes macro args" is a pretty well defined use case for a first macro to publish / showcase.

@jakemac53
Copy link
Contributor

jakemac53 commented Dec 19, 2023

BTW, I'm not saying it's a good idea, but "macro that serializes macro args" is a pretty well defined use case for a first macro to publish / showcase.

I do think it would be a pretty well defined macro, but it also wouldn't be a very useful one, until users can actually write their own macros :)

@davidmorgan
Copy link
Contributor Author

BTW, I'm not saying it's a good idea, but "macro that serializes macro args" is a pretty well defined use case for a first macro to publish / showcase.

I do think it would be a pretty well defined macro, but it also wouldn't be a very useful one, until users can actually write their own macros :)

Hmmm I think it would be useful for other use cases too. I guess the biggest difference to a general serialization macro is that the wire format is not published or stable; so no good for RPCs with non-Dart servers. But it would presumably be pretty good for passing data between isolates or for temporary caches on disk, for example.

@jakemac53
Copy link
Contributor

Another issue is I think we probably would want to use the Serializer/Deserializer interfaces which are internal to the macro implementation, and we really don't want to expose because then they would be hard to change.

@davidmorgan
Copy link
Contributor Author

We might want that :) my guess is that we will have to expose enough of the serialization internals that it's possible to write a macro to assist with a package you don't own, and to accept data types from it as arguments.

Anyway, all speculation at this point :) thanks.

@jakemac53
Copy link
Contributor

jakemac53 commented Dec 20, 2023

Oh, another hiccup I had forgot about - on the compiler/analyzer side of things there is no actual instance of the class at all - just some abstract representation of it. So we can't rely on user code for the serialization.

Probably, what we would want is to use the existing Arguments class, and then macros would just need to be able to handle "deserializing" from that. And we would have to support a new form of argument for custom types.

@davidmorgan
Copy link
Contributor Author

Oh ... right :)

Something like the DartObject interface in the analyzer, then, so the macro can traverse objects+fields and extract the data?

That should work fine.

The only problem I see is that it breaks the illusion that the macro is like a class; values get transformed on the way "in", so not:

macro class Foo {
  final Bar bar;
  const Foo(this.bar);
}

but somehow

macro class Foo {
  final Argument bar;
  const Foo(Bar this.bar);
}

--which we don't have a way to write. And when we do figure out a way to write it, there is a risk it will be horrible to unit test, because there is no general Bar->Argument transformation available for unit testing!

Quite a puzzle.

@lrhn
Copy link
Member

lrhn commented Dec 24, 2023

We could consider how LINQ in C# works. It allows you to write expressions into LINQ queries, like from ... where x> y select x.foo + 42.
That can be consumed in two ways, either by compiling to normal code that's executed (it's an iterable .where/.map callback, basically), or by compiling into ASTs from for the expression, that the LINQ processor can then interpret itself.

Which one it is depends on the static type of the implicit receiver. So if the expression after from provides a normal iterable (IEnumerable), then the expressions are just parsed into callback functions (or evaluated inline in a normal iteration loop, if optimizable).
If the receiver is an IQueryable,
the LINQ expressions are provided to the query operations (like Where) as an Expression<T>, which is a typed AST for the expression.

So, could we make the behavior depend on the static type of the parameter to the macro annotation constructor:

class MyMacro implements Macro {
  final int depth;
  MyMacro(this.depth);
}
class MyFancyMacro implements Macro {
  final Argument<int> depth;
  MyFancyMacro(this.depth);
}

so that a macro annotation of @MyFancyMacro(context + 42) would provide the value const ArgumentExpression<int>(Op("+", Id("context"), Int(42))) of type Argument<int> as actual argument.

If Argument<T> is interpreted as T on the macro application side, and Expression<T> on the macro implementation side, then type checking should work. (Which probably won't work if a macro implementation also contains macro applications. And if the argument expression wants to contain references to abstract names provided only when interpreted in the implementation, then that's also a concern).
Also, the allowed C# expressions are somewhat simpler than general Dart expressions.

@davidmorgan
Copy link
Contributor Author

Closing in favour of #3847

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
static-metaprogramming Issues related to static metaprogramming
Projects
Development

No branches or pull requests

5 participants