[hook] hook/generate.dart 🪝 #56512

Open
dcharkes opened this issue Aug 19, 2024 · 17 comments
Labels
area-dart-cli Use area-dart-cli for issues related to the 'dart' command like tool. area-sdk Use area-sdk for general purpose SDK issues (packaging, distribution, …).

Comments

@dcharkes
Contributor

dcharkes commented Aug 19, 2024

We'd like a standardized way to generate code in Dart packages that does not require a user step.

  • Macros can only take Dart code as input and output.
  • package:build_runner requires users to run a manual extra step.

A hook/generate.dart would be a package:build_runner equivalent, but it would be aware of the Dart/Flutter SDK and would have no race conditions between saving files and running dart and flutter commands (see below).

In #54334, we've discussed multiple aspects w.r.t. code generators. Today we discussed some more requirements offline:

  • Automatically running code generators before pub publish to ensure no files are outdated.
  • Automatically running code generators after editing dependencies. (Or having the IDE prompt to rerun them.)
  • Automatically running code generators after pub get, before dart analyze. (Only needed if the generated code is not checked in.)

Assumptions:

  • The generated files are checked in. (Maybe we should aim for not checking generated files in? E.g. it would work more like macros and you'd be able to resolve to the generated files in the IDE, but they are not checked in.)
  • Code generators can be heavy to run (definitely longer-running than macros).

Some open questions:

  • Should it be a different hook than hook/build.dart? Or a mode for the build hook?
    • If it's the build hook, when do we run it, and in which modes?
    • If it's not the build hook, how do we ensure this generate hook is finished before running the build hook? (E.g. generate is run on-save of its dependencies, like build_runner. But then if you hot-restart in Flutter, you want that on-save action to be fully done first.)
    • This should be a different hook from hook/build.dart; it's run in a different phase of the developer workflow. Build hooks are run on flutter build/dart build when you want to make an application bundle for a specific target (target OS, target architecture, target OS API levels, specific flavor, etc.). The generate hook should be run after dependencies change (and before running dart analyze, to prevent analysis errors from showing up when both the source and the target are Dart).

Use cases:

  • FFIgen, JNIgen, Swiftgen interop
  • translation messages (JSON -> Dart API)
  • All build_runner setups
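
For the translation-messages use case, a minimal sketch of what such a generator does could look like this. The function name `generateMessagesApi`, the class name `Messages`, and the flat key-to-text JSON shape are assumptions made for illustration; they are not the API of any existing package.

```dart
import 'dart:convert';

/// Turns a flat JSON object of message keys and texts into Dart source.
/// Hypothetical helper: the name and the output shape are illustrative only.
String generateMessagesApi(String jsonSource) {
  final messages = (jsonDecode(jsonSource) as Map<String, dynamic>)
      .cast<String, String>();
  final buffer = StringBuffer()
    ..writeln('// GENERATED CODE - DO NOT MODIFY BY HAND.')
    ..writeln('class Messages {');
  for (final entry in messages.entries) {
    // jsonEncode yields a double-quoted, escaped literal. `$` is escaped as
    // well so that Dart does not treat it as string interpolation.
    final literal = jsonEncode(entry.value).replaceAll(r'$', r'\$');
    buffer.writeln('  static const String ${entry.key} = $literal;');
  }
  buffer.writeln('}');
  return buffer.toString();
}

void main() {
  print(generateMessagesApi('{"greeting": "Hello", "farewell": "Goodbye"}'));
}
```

The sketch assumes every key is a valid Dart identifier; a real generator would have to validate or rename keys.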

Background knowledge:

Thanks @mosuem @HosseinYousefi @mkustermann @liamappelbe for the discussion! Please elaborate on things I left out.

@dart-github-bot
Collaborator

Summary: This issue proposes a standardized way to generate code in Dart packages without requiring user intervention. It aims to address limitations of existing solutions like macros and build_runner, which either lack flexibility or require manual steps. The goal is to automatically run code generators before publishing, after dependency changes, and before analysis, ensuring generated code is up-to-date and consistent.

@dart-github-bot dart-github-bot added area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). triage-automation See https://github.com/dart-lang/ecosystem/tree/main/pkgs/sdk_triage_bot. type-enhancement A request for a change that isn't a bug labels Aug 19, 2024
@dcharkes dcharkes changed the title [hook] hook/generate.dart [hook] hook/generate.dart 🪝 Aug 19, 2024
@mosuem
Member

mosuem commented Aug 19, 2024

Another option would be to include build_runner in the SDK somehow, solving the issue that package:build_runner requires users to run a manual extra step. This might run against the macros efforts though.

@lrhn lrhn added area-sdk Use area-sdk for general purpose SDK issues (packaging, distribution, …). area-dart-cli Use area-dart-cli for issues related to the 'dart' command like tool. and removed area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). type-enhancement A request for a change that isn't a bug triage-automation See https://github.com/dart-lang/ecosystem/tree/main/pkgs/sdk_triage_bot. labels Aug 19, 2024
@lrhn
Member

lrhn commented Aug 19, 2024

TL;DR: Either run automatically or check in the code, not both.

Macros can only take Dart code as input and output.

That's an assumption for the input. We don't currently let macros access the file system, but we could.
If we did, then you could pass the URI of a file as an argument in the macro application, have the macro load the file (in a way that can be cached in the macro server between phases) and then it can generate code from that data too.

The alternative is to take the entire file, convert it to a Dart constant value, and have that in the macro annotation, which is just a two-step code generation where you generate the macro application code. (I'd totally do that.)

Assumptions:

  • The generated files are checked in.

That's a very strong warning signal for me. Never check in auto-generated code.

It's OK to have a generator that you run manually to generate some code, and then you check in the generated code. It's a code updater then, not a code generator.
When people check out the code, the code they see is the code they get, nobody will overwrite it with something else. It's stable.

It's also OK to have an automatically run generator, which generates files on demand, before they are needed.
Those files should not be checked in. If the code that is checked in can be changed without you asking for it, then you can't trust the repository code anyway. It's better to have no code then.

If an automatically run generator is for Dart code, macros should be perfect for that. That's exactly the job macros are designed for, as long as they can get the data needed to generate the code. It's automatically generated code, guaranteed up-to-date, that you can still inspect, because every tool that processes Dart programs is aware of macros. It avoids any need to ever store the code and check it in.

Now, for non-Dart output, macros are obviously not the answer. There will need to be some code to generate such files. And a standard for where to place them. (Should they be in lib/ if another package depends on them?)

If files are checked in, I'd be more comfortable with a non-automatically run step, where you have to run dart generate to create/update the generated files. There is no need for someone just depending on a Pub package to run dart generate for that package, if all generated files are checked in anyway.

Code generators can be heavy to run (definitely longer-running than macros).

That doesn't compute for me. Macros are code generators. You may have some heavy use-cases in mind, but whatever framework we create here should be able to handle both small and big tasks. And so should macros.

@dcharkes
Contributor Author

It's OK to have a generator that you run manually to generate some code, and then you check in the generated code. It's a code updater then, not a code generator.

Hm, I don't think we ever call these things code updaters, maybe we should. Then the hook should be called hook/update_code.dart? (My 2 cents, "generating" is a more understandable verb than "updating" in this context. Even if files are checked in such as with FFIgen/JNIgen/build_runner.)

If we'd like to hook build_runner into this, then let's look at build_runner's terminology:

The build_runner package provides a concrete way of generating files using Dart code.

https://pub.dev/packages/build_runner

cc @davidmorgan

@davidmorgan
Contributor

Thanks Daco! Looking forward to getting into the discussion here.

Re: checking in generated code, I don't think that's quite the right distinction to make. What's important is whether, after generation, the generator retains ownership of the output: has an opinion about whether it's correct.

I've been calling generators that do not retain ownership of their output "offline generators". A web page can be an offline generator, you copy its output into your source tree and you're done.

An online generator is the type of generator you want to run automatically.

Whether you check in the output of online generators or not is mostly a secondary issue related to convenience of other tools, for example you might do it so the output and diffs are visible on GitHub. And for pub, you publish the generated output today because otherwise your package doesn't work :)
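
One way to picture a generator that "retains ownership" of its output is a staleness check: the output on disk is compared with what the generator would produce now. The sketch below is an illustration under that assumption; `isUpToDate` is a hypothetical helper name, not part of any existing tool.

```dart
import 'dart:io';

/// Returns true when the file at [outputPath] matches what [generate]
/// would produce now. `isUpToDate` is a hypothetical helper name.
bool isUpToDate(String outputPath, String Function() generate) {
  final file = File(outputPath);
  return file.existsSync() && file.readAsStringSync() == generate();
}

void main() {
  final dir = Directory.systemTemp.createTempSync('online_generator');
  final path = '${dir.path}/bindings.g.dart';
  String generate() => '// generated from version 2\n';

  // Output left over from an earlier run is reported as stale.
  File(path).writeAsStringSync('// generated from version 1\n');
  print(isUpToDate(path, generate));

  // After regenerating, the check passes.
  File(path).writeAsStringSync(generate());
  print(isUpToDate(path, generate));

  dir.deleteSync(recursive: true);
}
```

A check like this could back the "no outdated files before pub publish" requirement regardless of whether the output is checked in.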

Re: code generators being heavy to run--I'm working on this one, they should usually be about the same cost as lints.

@dcharkes
Contributor Author

An online generator is the type of generator you want to run automatically.

FFIgen and JNIgen are online generators in this distinction.

Re: code generators being heavy to run--I'm working on this one, they should usually be about the same cost as lints.

I'm not sure that FFIgen and JNIgen will be able to be that fast. So I think we'll always have faster and slower generators in our ecosystem.

That being said, I'd love for FFIgen and JNIgen to be so fast that we can run them on-save on the source files. That way users can edit native code and see the Dart bindings being updated live. 😄 cc @HosseinYousefi @liamappelbe

@davidmorgan
Contributor

It's always necessary to support slow generators, but we can try to make ours fast ;)

@HosseinYousefi
Member

I haven't benchmarked it but JNIgen is quite fast on my machine. Especially if we're not recompiling code.

Neither tool is going to be called frequently if we're just generating bindings for ready libraries, because they don't change. One case where we will be updating is pigeon.

@tarrinneal is working on making pigeon work with FFIgen and JNIgen. We're generating the interfaces for Kotlin and Swift from a Dart IDL using build_runner. We could skip a parsing step for FFIgen and JNIgen since we already know what we'll be generating, so the tools will not have to parse (using clang for FFIgen and ASM for JNIgen) anymore which would improve their speed by a lot as the codegen part is not the bottleneck.

@natebosch
Member

"generating" is a more understandable verb than "updating" in this context

+1

My expectation is that a "generator" would produce output based on some external input, while an "updater" would make changes to a source file based on the content or changes made to that file. AFAIK we don't have any implementation which reads a file and updates it, only ones which overwrite the entire file.

@dcharkes
Contributor Author

dcharkes commented Feb 20, 2025

Some notes from a discussion with @mosuem:

  • The generate hook should have a watch mode that can integrate with the IDE/SDK. In this mode, the process should keep running, and the output should probably be a socket that can be talked to.
    • What exactly the communication is with sockets/ports should be experimented with. E.g. does the IDE/SDK send the URIs of files that were changed, or file contents as well? Does the hook save generated files to disk and send back the file URIs, or does it send back the new contents of files (and the SDK then saves them to disk)?
  • Batch mode would be as already described in the previous posts.
    • The way that the hook/generate.dart should be written is with an abstraction that works both for batch and watch mode.
      • build_runner, FfiGen, and JniGen should provide a constructor that works for that.
  • Code generators need to be able to communicate that they don't support watch mode. (Hopefully we can make it easy for custom code generator to support watch mode.)
    • The hook/generate.dart needs to be able to communicate that watch mode is only supported for a subset of code generators (there might be more than one code generator).
  • Code generators need to be able to communicate that they might not be able to be rerun on every host.
    • For example, generating bindings against macOS system APIs cannot be rerun on a Linux or Windows machine. But it should be possible to work on a package locally which has macOS system API bindings and Linux system API bindings.

```dart
// Sketch for a hook/generate.dart

import 'package:hook/generate.dart';
import 'package:ffigen/ffigen.dart';
import 'package:build_runner/build_runner.dart';
import 'package:logging/logging.dart';

void main(List<String> arguments) async {
  // Configure the root logger once. Attaching the listener inside the loop
  // below would print every record once per generator.
  final logger = Logger('')
    ..level = Level.ALL
    ..onRecord.listen((record) => print(record.message));

  // This function doesn't return in `watch` mode, but does in batch mode.
  await generate(arguments, (input, output) async {
    final packageName = input.packageName;
    final ffigen1 = FfiGen(
      // ...
    );
    final ffigen2 = FfiGen(
      // ...
    );
    final buildRunner = BuildRunner(
      // ...
    );
    final customGenerator = MyGenerator(
      // ... might not support watch mode
    );
    final generators = [ffigen1, ffigen2, buildRunner, customGenerator];
    for (final generator in generators) {
      // input contains whether it's `batch` or `watch` mode.
      await generator.run(input: input, output: output, logger: logger);
    }
  });
}
```
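
The requirements that generators report whether they support watch mode and on which hosts they can be rerun could be modeled roughly as follows. This is a sketch under stated assumptions: `GeneratorInfo`, `runnable`, and the host strings are hypothetical names chosen for illustration and do not exist in any package.

```dart
/// Hypothetical description of a code generator's capabilities. None of
/// these names come from an existing package.
class GeneratorInfo {
  GeneratorInfo(this.name, {required this.supportsWatch, required this.hosts});

  final String name;

  /// Whether the generator can keep running and react to file changes.
  final bool supportsWatch;

  /// Host operating systems on which the generator can be rerun.
  final Set<String> hosts;
}

/// Returns the generators that can run in the requested mode on [host].
List<GeneratorInfo> runnable(
  List<GeneratorInfo> generators, {
  required bool watch,
  required String host,
}) =>
    [
      for (final generator in generators)
        if (generator.hosts.contains(host) &&
            (!watch || generator.supportsWatch))
          generator,
    ];

void main() {
  final generators = [
    GeneratorInfo('macos_bindings', supportsWatch: true, hosts: {'macos'}),
    GeneratorInfo('linux_bindings', supportsWatch: true, hosts: {'linux'}),
    GeneratorInfo('custom',
        supportsWatch: false, hosts: {'macos', 'linux', 'windows'}),
  ];
  for (final watch in [false, true]) {
    final names = runnable(generators, watch: watch, host: 'linux')
        .map((generator) => generator.name)
        .toList();
    print('watch=$watch: $names');
  }
}
```

With this shape, the hook can report which generators were skipped instead of failing on a host that cannot rerun them.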

@davidmorgan
Contributor

davidmorgan commented Feb 20, 2025

I suspect that delegating "watch" responsibility to each generator is going to get complicated: it means there is no overall control, each generator will independently pick up changes and produce outputs.

For cases where generators are independent this is just awkward, for cases where one generator output feeds into another's input it starts to break :)

It might actually be simpler to turn ffigen and intl into badly behaved build_runner generators that do extra work on the side, in order to get build_runner to manage running them correctly interleaved with build_runner generators.

@mosuem
Member

mosuem commented Feb 20, 2025

I suspect that delegating "watch" responsibility to each generator is going to get complicated: it means there is no overall control, each generator will independently pick up changes and produce outputs.

I think the idea is to have a single watcher, which notifies each generator on changes.

For cases where generators are independent this is just awkward, for cases where one generator output feeds into another's input it starts to break :)

I don't see the awkward part - and I would just disallow feeding inputs into other generators for starters. Long term, we might think about having a similar mechanism as for build hooks, where a DAG is constructed and invoked in order.
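
As an illustration of the "DAG constructed and invoked in order" idea, here is a minimal sketch of ordering generators by their declared dependencies. The function name `orderGenerators` and the name-to-dependencies map are assumptions for this example, not the mechanism build hooks actually use.

```dart
/// Orders generators so that each runs after the generators it depends on.
/// [dependencies] maps a generator name to the names whose output it reads;
/// every name that appears as a dependency must also be a key.
/// Throws a [StateError] if no valid order exists (for example, a cycle).
List<String> orderGenerators(Map<String, Set<String>> dependencies) {
  final remaining = Map.of(dependencies);
  final ordered = <String>[];
  while (remaining.isNotEmpty) {
    // A generator is ready when all of its dependencies have been ordered.
    final ready = [
      for (final entry in remaining.entries)
        if (entry.value.every(ordered.contains)) entry.key,
    ];
    if (ready.isEmpty) {
      throw StateError('No valid order for: ${remaining.keys.join(', ')}');
    }
    ordered.addAll(ready);
    for (final name in ready) {
      remaining.remove(name);
    }
  }
  return ordered;
}

void main() {
  print(orderGenerators({
    'docs': {'ffigen', 'build_runner'},
    'build_runner': {'ffigen'},
    'ffigen': {},
  }));
}
```

Generators that become ready in the same pass are independent of each other, so they are candidates for running in parallel.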

@davidmorgan
Contributor

I suspect that delegating "watch" responsibility to each generator is going to get complicated: it means there is no overall control, each generator will independently pick up changes and produce outputs.

I think the idea is to have a single watcher, which notifies each generator on changes.

Ah okay--possibly I was reading too much into the pseudocode :)

For cases where generators are independent this is just awkward, for cases where one generator output feeds into another's input it starts to break :)

I don't see the awkward part - and I would just disallow feeding inputs into other generators for starters. Long term, we might think about having a similar mechanism as for build hooks, where a DAG is constructed and invoked in order.

build_runner generators need to read config and need to know what files are on disk in order to figure out their inputs+outputs; so for example a generator might match lib/foo/*bar.dart then generate lib/foo/*bar.g.dart. Then another generator might match on that output and produce further output.
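
The input-to-output matching described here can be sketched with plain suffix rules. This is a simplified stand-in for illustration, not build_runner's implementation; `expectedOutputs` and the `.g.summary.json` suffix are made-up names.

```dart
/// Maps an input path to its expected outputs using suffix rules such as
/// {'bar.dart': ['bar.g.dart']}. Returns an empty list when no rule matches.
/// Simplified stand-in for illustration; not build_runner's implementation.
List<String> expectedOutputs(
  String inputPath,
  Map<String, List<String>> rules,
) {
  for (final rule in rules.entries) {
    if (inputPath.endsWith(rule.key)) {
      final stem = inputPath.substring(0, inputPath.length - rule.key.length);
      return [for (final suffix in rule.value) '$stem$suffix'];
    }
  }
  return const [];
}

void main() {
  const first = {'bar.dart': ['bar.g.dart']};
  // A second generator that matches on the first generator's output.
  const second = {'.g.dart': ['.g.summary.json']};

  final step1 = expectedOutputs('lib/foo/foobar.dart', first);
  final step2 = [for (final path in step1) ...expectedOutputs(path, second)];
  print(step1);
  print(step2);
}
```

Both steps are computed from file names alone, which is what lets the outputs be known before any generator runs.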

build_runner generators that use the analyzer need to track transitive imports and count those as inputs; and this can include files that don't exist at the start of the build, and generated files that have imports.

So ... yeah, complicated ;)

It's fine to start simple, of course.

@dcharkes
Contributor Author

for cases where one generator output feeds into another's input

That's an interesting situation indeed. And if we have multiple generators inside a single hook/generate.dart we would need to order them inside there. And we would also need to have the watch somehow forward changes or have the SDK fire a new set of file changes against the same process/socket. (We cannot rely on build_runner doing the ordering for us.)

Maybe we should have different sockets for different generators if there's more than one generator specified in hook/generate.dart. We have a large design space here. This was just a sketch, maybe it won't work with this sketch. We probably need a prototype to know if this would work.

and I would just disallow feeding inputs into other generators for starters

I don't think that's a good idea, we'd want composability.

It might actually be simpler to turn ffigen and intl into badly behaved build_runner generators that do extra work on the side, in order to get build_runner to manage running them correctly interleaved with build_runner generators.

I think build_runner is too restrictive.

build_runner relies on determining a static build graph before starting a build

https://github.com/dart-lang/build/blob/master/docs/faq.md#why-do-builders-need-unique-outputs

In general, code generators can traverse the file system and come up with dependencies at runtime. That's why you produce a deps file as the output, instead of knowing all the inputs up front.
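
A minimal sketch of discovering dependencies at runtime: the generator only learns about a header while reading the file that includes it, and reports the full set afterwards. The function name `discoverDependencies`, the quoted-`#include` pattern, and the `generate.d` file name are assumptions for illustration.

```dart
import 'dart:io';

/// Follows quoted `#include` lines starting at [entryPath] and returns every
/// file that was read. Hypothetical helper; a real generator would use a
/// proper parser and include-path resolution.
Set<String> discoverDependencies(String entryPath) {
  final dependencies = <String>{};
  final pending = [entryPath];
  final include = RegExp(r'#include "([^"]+)"');
  while (pending.isNotEmpty) {
    final path = pending.removeLast();
    if (!dependencies.add(path)) continue; // Already visited.
    final directory = File(path).parent.path;
    for (final match in include.allMatches(File(path).readAsStringSync())) {
      pending.add('$directory/${match.group(1)}');
    }
  }
  return dependencies;
}

void main() {
  final dir = Directory.systemTemp.createTempSync('deps_demo');
  File('${dir.path}/api.h').writeAsStringSync('#include "types.h"\n');
  File('${dir.path}/types.h').writeAsStringSync('typedef int my_int;\n');

  final dependencies = discoverDependencies('${dir.path}/api.h');
  // Written after generation, one dependency per line.
  File('${dir.path}/generate.d').writeAsStringSync(dependencies.join('\n'));
  print(dependencies.length);

  dir.deleteSync(recursive: true);
}
```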

And in general, code generators can produce multiple output files, and come up with these at runtime. (Packaging outputs in a zip file to have a single output seems like a weird workaround. And we would also have to package the whole file system in a zip to have a single predefined input conceptually. Of course that would not work with caching, that zip would change all the time.)

My hope would be that hook/generate.dart would put fewer restrictions on generators. We should not be designing a system that cannot work with already existing use cases. So we need to support dynamically discovering dependencies and dynamically defining outputs. The fact that build_runner restricts more, and is consequently able to provide more (ordering different builders), is nice, but not applicable to the whole ecosystem of Dart code generators.

@davidmorgan
Contributor

This is an extremely well explored space; what build_runner does is based on what bazel does, and leads to important results about scalability and correctness.

If a generator has full freedom to define its own outputs at runtime, then in order to determine the build graph you have to repeatedly run all the generators until the graph reaches a stable state. You end up with a build system that is extremely fragile and cannot be parallelized.
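
To make the "run until stable" cost concrete, here is a toy model in which generators are functions from the current file set to the files they want to add. All names (`Generator`, `runUntilStable`, `dartGen`, `jsonGen`) are hypothetical and only illustrate the fixed-point iteration.

```dart
/// A generator modeled as a function from the current set of files to the
/// files it wants to add. Purely illustrative.
typedef Generator = Set<String> Function(Set<String> files);

/// Reruns every generator until a full pass adds no new files.
/// Adds the generated names to [files] and returns the number of passes.
int runUntilStable(Set<String> files, List<Generator> generators) {
  var passes = 0;
  var changed = true;
  while (changed) {
    changed = false;
    passes++;
    for (final generator in generators) {
      final before = files.length;
      files.addAll(generator(Set.of(files)));
      if (files.length != before) changed = true;
    }
  }
  return passes;
}

/// Produces `x.g.dart` for every hand-written `x.dart`.
Set<String> dartGen(Set<String> files) => {
      for (final file in files)
        if (file.endsWith('.dart') && !file.endsWith('.g.dart'))
          '${file.substring(0, file.length - '.dart'.length)}.g.dart',
    };

/// Produces `x.g.json` for every `x.g.dart`.
Set<String> jsonGen(Set<String> files) => {
      for (final file in files)
        if (file.endsWith('.g.dart'))
          '${file.substring(0, file.length - '.dart'.length)}.json',
    };

void main() {
  final files = {'a.dart'};
  // jsonGen is listed first, so it sees a.g.dart only on the second pass,
  // and a further pass is needed to confirm that nothing changes anymore.
  final passes = runUntilStable(files, [jsonGen, dartGen]);
  print('$passes passes: $files');
}
```

Every generator reruns on every pass, and the passes are inherently sequential, which is the fragility and lack of parallelism described above.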

What bazel does is to require that the build graph can be computed before a generator fully runs and without any unbounded computation; and build_runner does something similar. This makes the build graph computation guaranteed to run quickly and without churn. Then you can get it over with and get on to the actual build :) knowing enough to parallelize.

I 100% agree with addressing the use cases of generators that want more flexible output. Archives seem to me--at first glance--to be the right solution. There is no reason for archives to break caching: quite the reverse, an archive can provide digests for itself and for the files it contains, making caching simpler. There is no reason to use archives for input, individual file inputs are fine, any input can be generated as long as it comes from a generator that can cheaply compute its outputs.

I'm sure we can come up with a simple+correct hack that works before archives are available / before we can build the needed support in build_runner. For example just running the noncompliant generators in the generate hook after build_runner in a fixed order, and forbidding any dependency in the other direction, is probably sufficient; we just hide the non-build_runner output completely from build_runner.

Thanks.

@dcharkes
Contributor Author

dcharkes commented Feb 20, 2025

Some musings with @davidmorgan:

  • Code generators are inherently ordered at package boundaries by the package import graph.
    • build_runner generators can read from transitive deps, and can only output in the current package.
    • hook/build.dart has the same restriction: only read from your own package and (direct) deps, and only output in the current package's output directory.
    • So multiple hook/generate.dart files in a workspace would be ordered by the package graph.
  • Within a hook/generate.dart it would be nice if we could do more fine-grained ordering/parallelization of generators.
    • This requires knowing the inputs and outputs up front.
      • The way that bazel forces this is by basically doing globs over the inputs in the "planning phase". This is an overapproximation, but suffices. (For Dart compilation it's all the Dart files in a package and all the other build rules which could be Dart packages with their Dart files again.)
        • We can probably do something similar for FFIgen and JNIgen doing globs over the include directories in a "planning phase" and by such means wrap FFIgen and JNIgen inside build_runner.
    • To use build_runner to order multiple generators, and to have generators such as FFIgen, JNIgen, intl, win32, we most likely need to extend build_runner's capabilities slightly.
      • For example: Inputs outside a Dart package (system headers)
      • We will explore trying to wrap FFIgen, JNIgen, win32 in build_runner to see where capabilities need to be extended.
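
The planning-phase glob over include directories could look roughly like this. It is a sketch under assumptions: `planInputs` is a hypothetical name, and matching on the `.h` suffix stands in for whatever pattern a real generator would declare.

```dart
import 'dart:io';

/// Planning phase: declares every header under [includeDirs] as a potential
/// input without parsing anything. This overapproximates the real inputs.
/// `planInputs` is a hypothetical helper name.
List<String> planInputs(List<String> includeDirs) {
  return [
    for (final dir in includeDirs)
      for (final entity in Directory(dir).listSync(recursive: true))
        if (entity is File && entity.path.endsWith('.h')) entity.path,
  ]..sort();
}

void main() {
  final dir = Directory.systemTemp.createTempSync('plan_demo');
  Directory('${dir.path}/sys').createSync();
  File('${dir.path}/api.h').writeAsStringSync('');
  File('${dir.path}/sys/types.h').writeAsStringSync('');
  File('${dir.path}/notes.txt').writeAsStringSync('');

  // Both headers are declared as inputs, even if only one ends up included.
  print(planInputs([dir.path]).length);

  dir.deleteSync(recursive: true);
}
```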

@natebosch
Member

  • So multiple hook/generate.dart files in a workspace would be ordered by the package graph.

We cannot produce a fully ordered package graph because we support dependency cycles at the package level. build_runner (modeled on bazel) allows breaking a single package down into "targets" to have fully ordered build graphs.

  • For example: Inputs outside a Dart package (system headers)

dart-lang/build#967 has some discussion of how we might have built this support. I do think that it could work well to add a hook generators can implement to provide a Digest for the external resources they care about. I think this only works if the generators can efficiently determine what external resources they'll need without reading the build system provided inputs.
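
A rough sketch of a generator-provided digest for external resources follows. The function name `externalDigest` is hypothetical, and the 32-bit FNV-1a hash is only a dependency-free stand-in for a real cryptographic digest; the arithmetic assumes the Dart VM's 64-bit integers.

```dart
import 'dart:convert';

/// Combines the paths and contents of external resources into one value
/// that changes when any of them changes. FNV-1a is used here only as a
/// dependency-free stand-in for a real cryptographic digest.
int externalDigest(Map<String, List<int>> resources) {
  var hash = 0x811c9dc5; // FNV-1a 32-bit offset basis.
  void mix(List<int> bytes) {
    for (final byte in bytes) {
      hash = ((hash ^ byte) * 0x01000193) & 0xffffffff;
    }
  }

  // Sort the paths so the result does not depend on map insertion order.
  for (final path in resources.keys.toList()..sort()) {
    mix(utf8.encode(path));
    mix(resources[path]!);
  }
  return hash;
}

void main() {
  final before = externalDigest({
    '/usr/include/stdio.h': utf8.encode('int printf(const char*, ...);'),
  });
  final after = externalDigest({
    '/usr/include/stdio.h': utf8.encode('int puts(const char*);'),
  });
  // A changed system header yields a different digest, so the generator
  // would be rerun.
  print(before == after);
}
```

The build system would only need to store and compare the returned value, without knowing which external files the generator looked at.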


7 participants