Dynamic Dependencies #28346

reutermj · 2026-01-20T15:31:39Z

Bazel 9 adds map_directory, which allows some analysis to be deferred to execution time. This turns out to be powerful enough to encode dynamic dependency resolution, albeit in a hacky manner. I'm starting this discussion to ask for a more robust API for such cases.

https://bazel.build/rules/lib/builtins/actions#map_directory

Compiling Lean: An Example

I came across this problem while working on a ruleset for Lean. Lean modules must be compiled in a topological order determined by the import statements in the files: every module that a Lean module imports must be compiled before the module itself. Typically, the Lean build system, Lake, handles this by compiling in two phases:

1. Quickly parse the imports to generate a dependency graph, then
2. compile the Lean modules in a topological ordering.
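
The two phases can be sketched in plain Python. This is not Lake's implementation: the regular expression is a simplification that only recognizes lines starting with `import`, and the three-module project is made up for illustration.

```python
import re
from graphlib import TopologicalSorter

# Simplified: matches lines of the form `import Foo.Bar`. Real Lean headers
# have more forms than this sketch handles.
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)", re.MULTILINE)

def parse_imports(source: str) -> list[str]:
    """Phase 1: cheaply extract the imported module names from one file."""
    return IMPORT_RE.findall(source)

def compile_order(sources: dict[str, str]) -> list[str]:
    """Phase 2: order modules so each one follows the modules it imports."""
    graph = {
        module: [dep for dep in parse_imports(text) if dep in sources]
        for module, text in sources.items()
    }
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())

# Hypothetical three-module project.
sources = {
    "App": "import Lib.Core\nimport Lib.Util\ndef main := 0\n",
    "Lib.Util": "import Lib.Core\n",
    "Lib.Core": "",
}
order = compile_order(sources)
```

The point for Bazel is that phase 2 depends on the result of phase 1, which is only known after the source files have been read.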

This presents an issue in Bazel, where the action graph is typically fixed at analysis time, before the source files can be read.

`map_directory`

With the introduction of map_directory, I was able to hack around the usual analysis-time restrictions by encoding the dependency graph in file names and then, inside map_directory, using those encoded file names to produce the dynamic action graph with the correct dependencies. See the full explanation: https://github.com/reutermj/adventures_in_map_directory
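
The linked repository has the real scheme. The following Python is only a hypothetical illustration of the general idea: a directory listing is enough to recover the edges, so no file contents need to be read. The `__needs__` separator and the `.marker` suffix are invented for this sketch.

```python
SEP = "__needs__"   # invented separator between a module and its dependencies
SUFFIX = ".marker"  # invented suffix for the (empty) marker files

def encode(module: str, deps: list[str]) -> str:
    """Build a marker file name that carries one module's outgoing edges."""
    return module + SEP + "+".join(sorted(deps)) + SUFFIX

def decode(file_name: str) -> tuple[str, list[str]]:
    """Recover (module, deps) from a file name produced by encode."""
    stem = file_name.removesuffix(SUFFIX)
    module, _, joined = stem.partition(SEP)
    return module, (joined.split("+") if joined else [])

def graph_from_listing(file_names: list[str]) -> dict[str, list[str]]:
    """Rebuild the whole dependency graph from a directory listing alone."""
    return dict(decode(name) for name in file_names)

listing = [encode("App", ["Lib.Util", "Lib.Core"]), encode("Lib.Core", [])]
graph = graph_from_listing(listing)
```

A scheme like this works, but it pushes structured data through file names, which is part of why a dedicated API is being requested.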

Prior Art for API

Buck2 introduces dynamic_output for such cases: https://buck2.build/docs/rule_authors/dynamic_dependencies/

dynamic_output can read a select set of artifacts listed in the dynamic argument at analysis time. This provides two options for handling the Lean case:

1. An action runs a dependency resolver on the source files and writes the DAG to a JSON file, then
2. dynamic_output takes the JSON file as a dynamic argument, takes all the Lean source files as inputs, and produces the action graph with dependencies from the JSON file,

or

1. Pass all the source files to the dynamic argument and handle dependency resolution inside the dynamic_output in Starlark, then
2. produce the action graph from the resolved data structure.
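
As a rough model of the first option, the Python below turns a JSON DAG into one compile step per module. It is not Buck2's API: `plan_actions`, the `{module: [deps]}` JSON shape, and the file naming (`.lean` sources, `.olean` outputs) are assumptions made for this sketch.

```python
import json

def plan_actions(dag_json: str, out_suffix: str = ".olean") -> list[dict]:
    """Turn a JSON DAG {module: [deps]} into one compile step per module."""
    dag = json.loads(dag_json)
    output_of = {module: module + out_suffix for module in dag}
    return [
        {
            "output": output_of[module],
            # A module needs its own source plus the compiled output of each dep.
            "inputs": [module + ".lean"] + [output_of[dep] for dep in deps],
        }
        for module, deps in dag.items()
    ]

# Hypothetical resolver output for a two-module project.
actions = plan_actions('{"Core": [], "App": ["Core"]}')
```

No explicit topological sort is needed here: once each step lists the outputs of its dependencies as inputs, the build system can derive the execution order from those edges.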

lberki (Maintainer) · 2026-01-21T09:11:54Z

cc @brandjon @zhengwei143

I'm not against implementing some kind of functionality like this in Bazel. Bazel has two invariants that I know must be upheld:

1. Actions cannot have inputs that are created outside of the target graph that bazel query 'deps(<target>)' returns. It looks like Buck2 accomplishes this by giving dynamic_output() a list of "allowed" inputs, and this sounds like an eminently workable plan for Bazel, too.
2. Actions cannot create outputs that may result in an action conflict, i.e. two separate actions trying to create the same output file. In Bazel, this is currently guaranteed by how map_directory() works. If we wanted to generalize that, the simplest way would be to constrain the potential set of new output artifacts to within the original tree artifact, because then action conflict checking is limited to that tree artifact, as opposed to the whole build.
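
To illustrate why confining outputs to one tree artifact keeps the conflict check local, here is a small Python model. It is not how Bazel implements conflict checking; `check_outputs` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath

def check_outputs(tree_root: str, declared: list[str]) -> set[str]:
    """Validate outputs declared relative to one tree artifact.

    Raises ValueError if a path escapes the tree or is declared twice;
    otherwise returns the set of full output paths.
    """
    root = PurePosixPath(tree_root)
    seen: set[str] = set()
    for rel in declared:
        path = PurePosixPath(rel)
        if path.is_absolute() or ".." in path.parts:
            raise ValueError(f"{rel!r} escapes tree artifact {tree_root!r}")
        full = str(root / path)
        if full in seen:
            raise ValueError(f"action conflict: {full!r} declared twice")
        seen.add(full)
    return seen

ok = check_outputs("bazel-out/cfg/bin/pkg/tree", ["A.out", "sub/B.out"])
```

Because every accepted path lives under `tree_root`, only the declarations made for that one tree need to be compared against each other.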

In addition, we already have an API that reads files (repository_ctx.read()), and it would be nice if whatever we come up with was consistent with that.

Also, echoing the philosophy of Buck2, I'd much rather not encourage people to put expensive computation (like parsing complicated output formats) into Bazel.

zhengwei143 (Collaborator) · 2026-01-22T22:13:36Z

> I'd much rather not encourage people to put expensive computation (like parsing complicated output formats) into Bazel.

+1. I'm not against the ability to read files in general, but I'm worried that providing too flexible an API (e.g. FileApi.read_lines()) might encourage users to handle expensive computation, such as parsing source files for imports, in Starlark (work that would scale better as a separate action). We could enforce that source files cannot be read (only outputs), but that could easily be circumvented with a simple cp action.

My view (happy to be proven wrong) is that the contents of a file are only as relevant to Blaze insofar as we can map them to concepts that Blaze understands (e.g. artifacts) and can further use in other APIs (e.g. template_ctx.run()). And so I'm thinking it might be better to only allow reading of "structured" data.

If the problem we're trying to address here is dynamically generating a subgraph, then perhaps we only need an API that is sufficient to encode information regarding the nodes/edges to construct. We could enforce that Blaze only reads a list of filepaths from the file, which Blaze can map to actual artifacts that template_ctx is aware of: template_ctx.read_artifacts(file) -> list[FileApi].

For example (and I'm handwaving a bunch here):

def _parse_imports_impl(template_ctx, input_directories, output_directories, tools, **kwargs):
  imports_tree = output_directories["imports_tree"]
  for source_file in input_directories["source_tree"].children:
    # One ".imports" manifest per source file, written by the parse_imports tool.
    imports_file = template_ctx.declare_file(source_file.tree_relative_path + ".imports", directory = imports_tree)
    template_ctx.run(
      inputs = [source_file],
      outputs = [imports_file],
      executable = tools["parse_imports"],
      arguments = [...],
    )


def _compile_impl(template_ctx, input_directories, output_directories, additional_inputs, tools, **kwargs):
  outputs = {}
  # The key must match the one passed to map_directory in _rule_impl.
  output_tree = output_directories["outputs_tree"]
  imports_tree = input_directories["imports_tree"]
  sources = {src.tree_relative_path + ".imports": src for src in input_directories["source_tree"].children}
  for imports_file in imports_tree.children:
    source_file = sources[imports_file.tree_relative_path]
    compiled_output = outputs.setdefault(source_file, template_ctx.declare_file(source_file.tree_relative_path + ".out", directory = output_tree))
    deps = []
    # read_artifacts is the API proposed above, not an existing one.
    for dep_file in template_ctx.read_artifacts(imports_file):
      deps.append(outputs.setdefault(dep_file, template_ctx.declare_file(dep_file.tree_relative_path + ".out", directory = output_tree)))
    # depset's transitive argument takes a list of depsets.
    inputs = depset([source_file] + deps, transitive = [additional_inputs["misc_libs"]])
    template_ctx.run(
      inputs = inputs,
      outputs = [compiled_output],
      executable = tools["compiler"],
      arguments = [...],
    )

   
def _rule_impl(ctx):
  source_tree = ctx.actions.declare_directory("source_tree")
  # For now I'm assuming sources are accessible from some declared directory
  ctx.actions.run(
    inputs = ctx.files.srcs,
    outputs = [source_tree],
    executable = ctx.executable.generate_source_tree,
    arguments = [...],
  )
  source_imports_tree = ctx.actions.declare_directory("source_imports_tree")
  ctx.actions.map_directory(
    implementation = _parse_imports_impl,
    input_directories = {"source_tree": source_tree},
    output_directories = {"imports_tree": source_imports_tree},
    tools = {"parse_imports": ctx.executable.parse_imports},
  )
  compiled_outputs_tree = ctx.actions.declare_directory("compiled_outputs_tree")
  ctx.actions.map_directory(
    implementation = _compile_impl,
    input_directories = {"source_tree": source_tree, "imports_tree": source_imports_tree},
    output_directories = {"outputs_tree": compiled_outputs_tree},
    additional_inputs = {"misc_libs": depset(ctx.files.misc_libs)},
    tools = {"compiler": ctx.executable.compiler},
  )
  return [DefaultInfo(files = depset([compiled_outputs_tree]))]
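
The semantics suggested for read_artifacts can be modeled outside Bazel in a few lines of Python. This is a sketch of the idea, not the proposed implementation: the function takes the manifest text and a table of known artifacts directly, and rejecting unknown paths is an assumption based on the requirement that every path maps to an artifact template_ctx is aware of.

```python
def read_artifacts(manifest_text: str, known: dict[str, object]) -> list[object]:
    """Resolve each non-empty manifest line (a path) to a known artifact."""
    artifacts = []
    for line_no, line in enumerate(manifest_text.splitlines(), start=1):
        path = line.strip()
        if not path:
            continue  # tolerate blank lines
        if path not in known:
            # Assumed behavior: paths outside the known set are an error.
            raise ValueError(f"line {line_no}: {path!r} is not a known artifact")
        artifacts.append(known[path])
    return artifacts

# Hypothetical artifact table; strings stand in for File objects.
known = {"pkg/src/A.lean": "File(A)", "pkg/src/B.lean": "File(B)"}
deps = read_artifacts("pkg/src/B.lean\n\npkg/src/A.lean\n", known)
```

The only thing that crosses from file contents into the build graph is a list of already-known artifacts, so there is nothing to parse beyond splitting lines.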

reutermj (Author) · 2026-01-25T18:12:01Z

Thanks for the comments @lberki and @zhengwei143

I took the morning to throw together a proof-of-concept demo that adds read_artifacts.

The demoed user experience: https://github.com/reutermj/bazel/tree/master/examples/map_directory_read_artifacts

> My view (happy to be proven wrong) is that the contents of a file are only as relevant to Blaze insofar as we can map them to concepts that Blaze understands (e.g. artifacts) and can further use in other APIs (e.g. template_ctx.run()).

At least for cases like ML-style languages and their build ordering constraints, this is certainly true enough. I'm not familiar enough with the other problems dynamic_output is intended to solve, such as LTO, to comment on them.

The big open question I have is related to the manifest format. For the demo, I just used the full exec path of the input files. Is there another format that would be better suited for this?
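
For concreteness, a manifest of exec paths could be produced by the import-parsing tool along these lines. This Python is a sketch, not the demo's code: `manifest_for`, the example exec path, and the assumption that module `Lib.Core` lives at `Lib/Core.lean` under the source tree are all hypothetical.

```python
from pathlib import PurePosixPath

def manifest_for(imports: list[str], source_tree: str) -> str:
    """Render one exec path per line for the given imported module names."""
    root = PurePosixPath(source_tree)
    lines = [
        # Assumed layout: module `Lib.Core` lives at <source_tree>/Lib/Core.lean.
        str(root.joinpath(*name.split(".")).with_suffix(".lean"))
        for name in imports
    ]
    return "\n".join(lines) + ("\n" if lines else "")

manifest = manifest_for(["Lib.Core", "App"], "bazel-out/cfg/bin/pkg/source_tree")
```

One path per line keeps the format trivial to read back, at the cost of the tool needing to know the exec path of the source tree.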

Fair warning: I've never touched actual Bazel internals, and the implementation of read_artifacts is entirely AI generated. I mostly wanted to focus on getting a demo of its use out; I'm not proposing this implementation as the path forward.

zhengwei143 (Collaborator) · 2026-01-26T23:21:37Z

> For the demo, I just used the full exec path of the input files. Is there another format that would be better suited for this?

I glossed over path resolution previously, but yeah exec path seems reasonable.

cc @pzembrod @fmeum @rrbutani who might be interested -- would like to gather more data points first.

aherrmann · 2026-02-03T10:19:08Z

It's great to see progress towards dynamic dependencies in Bazel!!!

> Buck2 introduces dynamic_output for such cases: https://buck2.build/docs/rule_authors/dynamic_dependencies/

For Buck2 reference on API design, please take a look at the new dynamic_output_new API.

As another reference point, I've spoken at BazelCon 2025 about how we've used Buck2's API for Haskell (another ML-family language). As you'll see there, the new Buck2 API goes a bit further and makes it possible to share dynamically resolved dependency information efficiently across targets. This was important for achieving module-granular incremental builds across Haskell package (target) boundaries without having to reconstruct the full cross-package module dependency graph at each target, which would be quadratic at the project level.

jonnyboynewton · 2026-03-03T00:18:59Z

I posted this on the Slack thread but figured I'd add it here too.
@rrbutani created this PoC for doing these reads in a map_directory flow:
master...rrbutani:bazel:poc/dyn-deps

We're going to be heavily using map_directory for a dynamic flow for cpu/gpu/soc builds at NVIDIA, but we'd like dynamic dependencies to be more of a first-class feature. We're following the pattern in adventures_in_map_directory for the DAG traversal, using a mechanism similar to the marker files there, but it's a bit hacky (though it does seem to work well).
