Skip to content

CEP XXXX: define the staging output for split builds in the v1 recipe format#102

Open
wolfv wants to merge 27 commits intoconda:mainfrom
wolfv:cache-output
Open

CEP XXXX: define the staging output for split builds in the v1 recipe format#102
wolfv wants to merge 27 commits intoconda:mainfrom
wolfv:cache-output

Conversation

@wolfv
Copy link
Copy Markdown
Contributor

@wolfv wolfv commented Nov 27, 2024

This CEP defines the staging output. It is a special output type that does not turn into a package. Instead, other outputs can inherit from it and use the "dirty" work dir to continue building a package (e.g. for different python bindings), or split the files from the cache into multiple sub-packages (headers, library, man pages ...).

One can try the cache with --experimental in rattler-build. The documentation can be found here: https://prefix-dev.github.io/rattler-build/latest/multiple_output_cache/

@wolfv wolfv changed the title add first version of channel registry cep Define the cache output for v1 multi-output recipes Nov 27, 2024
@wolfv wolfv changed the title Define the cache output for v1 multi-output recipes Define the cache key for v1 multi-output recipes Nov 27, 2024
Copy link
Copy Markdown

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for writing this up, excited to see this moving forward! 🙏 😊


Sometimes it is very useful to build some code once, and then split it into multiple build artifacts (such as shared library, header files, etc.). For this reason, `conda-build` has a special, implicit top-level build.

There are many downsides to the behavior of `conda-build`: it's very implicit, hard to understand and hard to debug (for example, if an output is defined with the same name as the top-level recipe, this output will get the same requirements attached as the top-level).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not just that, tests under such an output will be silently skipped, which regularly trips people up: conda/conda-build#4172

Comment on lines +71 to +72
Since this is not extremely convenient, we could discuss adding a "scratchpad" outside the `src` dir to rattler-build that will be kept between outputs and removed at the end of the whole build.
</details>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To double-check, this would save $SRC_DIR to the scratchpad in between operations, and restore it for the installation of any output that needs it?

I would go as far as saying that cache: implies this behaviour. I don't want to cache stuff in $PREFIX (then I could just install it and call that an output), I want to cache it without installation.

- ${{ pin_subpackage("foo-headers") }}
```

Special care must be taken for `run-exports`. For example, the `${{ compiler('c') }}` package used to build the cache is going to have run-exports that need to be present at runtime for the package. To compute run-exports for the outputs, we use the union of the requirements - so virtually, the host & build dependencies of the cache are injected into the outputs and will attach their run exports to each output.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favour of this, just noting that this diverges from conda-build, which currently does not inject REs from the global build-step to the outputs.

Copy link
Copy Markdown
Contributor

@carterbox carterbox Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not in favor of this because it is not an explicit behavior and not every output will need the run_exports from the cache. For example, if your devel package doesn't have any binaries, it doesn't need to depend on any compiler package exports. If you split your binaries into multiple outputs, each output may only depend on a part of what was used in cache.requirements.host.

I suspect that this design decision (implicit inheritance of run exports from the cache) will lead to more overdepending warnings (which are ignored by most maintainers). In contrast, if we retain the current behavior which is that each output must explicitly enumerate their dependencies, there will be overlinking errors (which are errors that stop the build) when dependencies are missing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside is that the outputs would then have to setup a build / host environment for nothing (just to add run exports).

What would you think of a more complex algorithm where we only attach the run exports to the lowest level output?

E.g. cache -> pkg -> pkg-devel where pkg-devel -> pin_subpackage(pkg, exact=True)?

I see the problem with the overdepending warning.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside is that the outputs would then have to setup a build / host environment for nothing (just to add run exports).

Yeah, and this gets easily forgotten. I understand the issue with over-imbuing outputs with dependencies, but in this case I think the safer default is to inherit the run-exports, unless the output already has its own build/host environment (because then it's safe to assume that this should determine the actual REs for that output).

Part of my thinking on that is that, in many cases, superfluous REs are pretty benign, e.g. it's basically irrelevant whether a metapackage that does pin_subpackage(pkg) has its own REs, when pkg already has those run-exports anyway.

So while I generally dislike recipes having to opt out of implicit things, I think in this case it's the safer/saner default.

What would you think of a more complex algorithm where we only attach the run exports to the lowest level output?

E.g. cache -> pkg -> pkg-devel where pkg-devel -> pin_subpackage(pkg, exact=True)?

Can you explain the algorithm in a sentence? I think we need to have understandable rules. So far I think "inherit REs unless output has its own build/host env" is relatively easy to explain (aside from checking the technical boxes from my POV), and a more complicated algorithm would need to provide a pretty big improvement in comparison to be worth it IMO.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The background section of this proposal specifically calls out implicit behavior as a "downside" of the top-level package. In my experience, confusion about multi-output packages comes from unexpected behavior when the top-level shares a name as one of the output packages so they aren't behaving independently. This is why I maintain that there should be no run_exports across package boundaries from the cache package to output package because creating more interaction between output packages and the top-level package will increase confusion.

Adding a non-trivial run_export tracking algorithm from the cache package to the output packages will not further the design goal of making top-level package cache easier to understand and use. It will only make recipes less verbose. It will be adding complexity and adding implicit behaviors which this proposal claims to dislike. If the goal of this proposal is to make the recipe format more convenient or less verbose, then the background section should claim that instead of claiming that the problem is reader comprehension and implicit behavior.

In my opinion, it's easier for users to understand if all outputs inherit none of the run_exports from the cache package. The easiest way to explain the cache package would be that the cache package is a binary redistribution that you are unpacking for each of the output packages. Renumerating host/build dependencies is verbose, but it's easy to conceptualize as "If I was building only the things in this specific output, what would I need?"

I think I've made all my points about this topic, so I will stop trying to argue in favor of it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a CEP, so the appropriate meeting is the Conda Community Sync? I can attend this week. Feb 12 @ 11am Chicago.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

Copy link
Copy Markdown

@seberg seberg Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the side-lines since I was just bitten by this wanting to reduce duplication while splitting the recipe. (Trying to cargo cult from a recipe that was very conscious about the split requirements to get the right dependencies.)

I can understand that implicitly adding it to all dependencies build on the cache is also odd even if the failure mode is more graceful.

Maybe that doesn't fit the pattern at all, but how about forcing users to be explicit about it? So the user must either indicate which named outputs they want to export the caches dependencies to or indicate that they want to export it to none

You could go as far as allowing ignore_run_exports that are specific for a named output here (to keep them paired with the dependencies):

- export_run_exports:
  split_name1
  split_name2
- export_run_exports:
  split_name3
  - ignore_run_exports:
    # maybe even allow ignoring of the ones that come from cache here?

(I doubt this is valid yaml, but you get the point)

EDIT: And of course some indicator if you have no entry (and an error if indicating nothing or names are wrong).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not a bad idea! We could have an cache.attach_run_exports_to: [ ... ] field and we can require it to have at least one entry (at least by linting).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carterbox what are your feelings about this solution?

Since this is not extremely convenient, we could discuss adding a "scratchpad" outside the `src` dir to rattler-build that will be kept between outputs and removed at the end of the whole build.
</details>

The new files can then be used in the outputs with the `build.files` key:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I denote a dependence of a given output on the cache: phase? For example, we had a bunch of discussion in pytorch over conda-forge/pytorch-cpu-feedstock@981a04d (later reverted conda-forge/pytorch-cpu-feedstock@54fa422), because it is undocumented under what conditions a build will "fan" out (e.g. build multiple python versions on top of the same library; rather than generate a job per python version)

In other words, we need to have some clarity whether an output like

cache:
  requirements:
    build:
      - ${{ compiler('c') }}
    host:
      - # no python here
 
outputs:
  - package:
      name: foo
    requirements:
      host:
        - # depends on cache implicitly?
        - python

together with a CBC of

python:
  - 3.9
  - 3.10
  - 3.11
  - 3.12

will actually "fold" the python dimension into foo or not.

In conda-build, all outputs implicitly depend on the global build stage; perhaps we should make this explicit? build.files already does this partway (for outputs that only use files:). On the other hand, it might be more intuitive to keep the implication that cache: affects everything else - see discussion about skip:).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example repo you can try: https://github.com/wolfv/rattler-build-cache-test

It creates one libcalculator and two py-calculators based on the variants.yaml.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, right now, if there is a cache, all outputs are affected.

Comment on lines +41 to +44
build:
# note: in the current implementation, all other keys for build
# are also valid, but unused
script: build.sh
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that's been a permanent annoyance with the way things have grown in conda-build is that a "global" build.skip: ... does not mean the entire build will get cancelled, because - in theory - the various outputs could do their own thing that doesn't rely on the package-level build. So in order to actually skip a given combination, it needs to be skipped on all outputs (which quickly gets annoying in large multi-output recipes, especially for quick debugging).

If we go with the "all outputs implicitly depend on cache:" approach, then I think it would be good to specify that a skip: true on the cache: level completely skips the build, without having to repeat it per output.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think the cache should have anything to do with skipping.

The top-level can skip fine (which is intersected with the outputs).

Ie. would skip everything on win:

build:
  skip: [win]

outputs:
  - ... 

@wolfv
Copy link
Copy Markdown
Contributor Author

wolfv commented Nov 29, 2024

@h-vetinari I played around with also caching the "source" in prefix-dev/rattler-build#1226 and the example in https://github.com/wolfv/rattler-build-cache-test.

I now think that the cache should have its own source section. Then everything else will become quite straight-forward.

The example would then look like:

recipe:
  name: calculator
  version: 1.0.0

cache:
  source:
    path: ../

  requirements:
    build:
      - ${{ compiler('cxx') }}
      - cmake
      - ninja
  build:
    script:
      # make sure that `alternative_name.md` is not present
      - test ! -f ./alternative_name.md
      - mkdir build
      - cd build
      - cmake $SRC_DIR -GNinja ${CMAKE_ARGS}
      - ninja install

outputs:
  # this first output will include all files installed during the cache build
  - package:
      name: libcalculator

    requirements:
      run_exports:
        - ${{ pin_subpackage('libcalculator') }}

  - package:
      name: py-calculator
    source:
      - path: ../README.md
        file_name: alternative_name.md

    requirements:
      build:
        - ${{ compiler('cxx') }}
        - cmake
        - ninja
      host:
        - pybind11
        - python
        - libcalculator

    build:
      script:
        # assert that the README.md file is present
        - test -f ./alternative_name.md
        - cd build
        - cmake $SRC_DIR -GNinja ${CMAKE_ARGS} -DBUILD_PYTHON_BINDINGS=ON
        - ninja install

@jaimergp
Copy link
Copy Markdown
Member

jaimergp commented Dec 4, 2024

That's so close to the idea of cache being just a type of output! 🥳

outputs:
  - cache:
      name: optional, only needed if there are several caches
  - package:
      name: my-output
    build:
      needs-cache: bool # or cache.name

@wolfv
Copy link
Copy Markdown
Contributor Author

wolfv commented Dec 4, 2024

Yeah it could be done, but I think we can let it wait for v2 of the recipe format! :)

@h-vetinari
Copy link
Copy Markdown

I now think that the cache should have its own source section. Then everything else will become quite straight-forward.

That sounds fine in principle. Could you explain a bit more what the thinking behind that is though? If there's already a source: on recipe-level, would the cache: have to repeat something?

That's so close to the idea of cache being just a type of output! 🥳

That would sound even better! More powerful and yet less divergence. (no opinion on whether this is v1 or v2 stuff)

However, this I don't like

outputs:
  # this first output will include all files installed during the cache build
  - package:
      name: libcalculator

I think this should never be automatic, and always need an explicit opt-in. Either the output says files: (in which case it could pick up parts of what cache: installed), or it actually does the ninja install itself (rather than during cache:). This would also be a better match for what the example is already doing for py-calculator.

@carterbox
Copy link
Copy Markdown
Contributor

@carterbox, please review

Comment on lines +38 to +40
# run: ...
# run_constraints: ...
# ignore_run_exports: ...
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense that sections besides cache.build.script, cache.requirements.build and cache.requirements.host are allowed because this is a cache that would never be installed to an end-user environment.

The runtime dependencies don't need to be resolved for the cache because all of the relevant dependencies should be explicitly repeated in the actual outputs packages.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. ignore_run_exports makes sense though.

Copy link
Copy Markdown
Contributor

@carterbox carterbox Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignore_run_exports makes sense though.

Why? Could you please explain why you want run_exports from the cache build to all of the outputs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I elaborated on other options further below. We need to attach run exports from the cache to at least 1 output that uses the cache, otherwise there will be bugs.


<details>
<summary>script build.sh default discussion</summary>
We had some debate wether the cache output should _also_ default to `build.sh` or should not have any default value for the `script`. This is still undecided.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a multi-output recipe, do the outputs default to build.sh?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is currently the case. We're thinking of deprecating this though and require users to be explicit about build scripts everywhere. What would you prefer?

name: libfoo
build:
files:
- lib/**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to support negative globs in outputs.build.files selection? For example:

- package:
    name: foo-dev
  build:
    files:
      include:
        - lib/**
      exclude:
        - lib/**.so.*
- package:
    name: libfoo${{ soname }}
  build:
    files:
      include:
        - lib/**.so.*

Would be used to separate runtime libraries from development artifacts because only versioned shared objects are needed at runtime.

More recently, I've also been working on recipes that separate optional plugins into their own output. These are also typically installed to lib/

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go further and only allow

  build:
    files:
      include:
        - ...

and forbid the plain build.files. Forcing the include: to be present costs essentially nothing, but avoids the fallback logic ("is include/exclude present?"), gives a simpler schema, and makes it all-but-obvious that there's an exclude: variant. IMO that's way better than the single extra line it costs.

More importantly, build.files.include must respect the fundamental snapshotting mechanism ("was this file in host already? if so, it's not part of the output whose content we're in the process of determining"), unless overridden explicitly with something like always_include_files. This is essential for slicing a big build into several interrelated chunks. I make this argument in more detail here: conda/conda-build#5455

- ${{ pin_subpackage("foo-headers") }}
```

Special care must be taken for `run-exports`. For example, the `${{ compiler('c') }}` package used to build the cache is going to have run-exports that need to be present at runtime for the package. To compute run-exports for the outputs, we use the union of the requirements - so virtually, the host & build dependencies of the cache are injected into the outputs and will attach their run exports to each output.
Copy link
Copy Markdown
Contributor

@carterbox carterbox Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not in favor of this because it is not an explicit behavior and not every output will need the run_exports from the cache. For example, if your devel package doesn't have any binaries, it doesn't need to depend on any compiler package exports. If you split your binaries into multiple outputs, each output may only depend on a part of what was used in cache.requirements.host.

I suspect that this design decision (implicit inheritance of run exports from the cache) will lead to more overdepending warnings (which are ignored by most maintainers). In contrast, if we retain the current behavior which is that each output must explicitly enumerate their dependencies, there will be overlinking errors (which are errors that stop the build) when dependencies are missing.

@wolfv
Copy link
Copy Markdown
Contributor Author

wolfv commented Mar 13, 2025

We just had a discussion at the build tools meeting with new ideas for the name "cache".

One idea was "prepare" / "prepare-step" / or something along these lines.
I was also thinking of "pre-build-step", or "pre_build" or something like this.

If anyone has another idea, would love to hear it. Cache is a bit too overloaded :)

@wolfv
Copy link
Copy Markdown
Contributor Author

wolfv commented Mar 13, 2025

These are ideas from my trusty assistant:

prepare
preprocess
common_build
shared_step
foundation
base_build
intermediate
scaffold
staging
preflight
common_prep
shared_build
core_build

@carterbox
Copy link
Copy Markdown
Contributor

carterbox commented Mar 13, 2025

What about something that includes the word "outputs" or "multioutput" so it's more obvious that you would only use this section when creating a multi-output recipe. For example:

recipe:
  name: ...
  version: ...

outputs_staging: ...

outputs:
  - package: ...
  - package: ...

"prepare_outputs", "pre_outputs", "multioutput_cache", "outputs_prebuild", "outputs_scaffold", "outputs_staging", ...

@jaimergp
Copy link
Copy Markdown
Member

jaimergp commented Mar 13, 2025

Yea, I like where @carterbox is going. Some more: before_outputs. Also, since we are talking about "preparing the workspace for the later builds", what about prepare_workspace, prepare_output, output_setup or similar?

@h-vetinari
Copy link
Copy Markdown

How about shared_build or perhaps outputs_shared_build...? Actually, if we want to emphasise the association with outputs, we could go one step further, and put it under outputs: directly:

recipe:
  name: ...
  version: ...

outputs:
  - shared:  # or common, shared_build, ...
  - package: ...
  - package: ...

@wolfv
Copy link
Copy Markdown
Contributor Author

wolfv commented Mar 14, 2025

I think moving it under outputs is a nice idea. Main difficulty I see is that people might expect to be able to create more than 1 of those which is not possible right now (maybe later we can allow naming them and having multiple, though). But we can error on that, too ...

@carterbox
Copy link
Copy Markdown
Contributor

The cache is not itself an output though, so we would need to choose a name which indicates as such and doesn't end with the word "build". Otherwise, we would have an outputs.build.build section.

recipe:
  name: ...
  version: ...

outputs:
  - definitely_not_an_output_but_used_to_make_outputs:
      source: ...
      requirements:
        build: ...
        host: ...
      build: ...
  - package: ...
  - package: ...

😆 But seriously... something like setup, staging, scaffold, precursor, bootstrap, mock.

@minrk
Copy link
Copy Markdown

minrk commented Sep 3, 2025

What is needed to move this forward? It's becoming increasingly relevant across conda-forge as v1 migrations proceed, a lot of recipes need this feature before they can move from v0 to v1, and discussion seems to have died off.

Main difficulty I see is that people might expect to be able to create more than 1 of those which is not possible right now (maybe later we can allow naming them and having multiple, though)

I don't think this is likely to be an issue, since doing multiple different non-output-producing things is already quite possible, and the error in the unlikely event folks try will be early and clear. Plus, as you said, it could be added later with a very minor change, which is a point in favor of this design in my mind. I like it being in the outputs list since it has basically all the properties of an output except for package.... It has the same structure of requirements, etc. which will be beneficial to things like conda-smithy that need to find all the requirements in a recipe, whatever they are for.

outputs:
  - staging_area: # or just 'staging'

feels pretty good to me in terms of indicating "this sets up where the output builds are run."

It seems the remaining question is largely about the level of explicit vs automatic run_exports from the staging requirements to the output(s). I personally like the general idea of transitive run_exports from staging still being possible, but explicit. I'm not sure the best way to spell that, though.

One idea:

outputs:
  - staging_area:
     requirements:
      build:
        - ${{ compiler("c") }}
      host:
        - libfoo
      inherit_run_exports:
        - from_packages:
             - package
           to_outputs:
             - output

Where requirements.inherit_run_exports (or transitive_run_exports or whatever) is a list of dicts with two keys:

  • from_packages: a list of host/build dependencies in the staging environments. If unspecified, defaults to all host/build dependencies.
  • to_outputs: a list of output names. If unspecified, defaults to all outputs.

This should give full flexibility in that any subset(s) of run-exports can be transitively applied to any subset(s) of outputs. I'm not sure what the baseline all-to-all should look like, because this spec suggests it should be an empty dict:

inherit_run_exports:
  - {}

which is clear when the spec is understood, but absolutely not clear just from reading it, and looks weird. So maybe a special scalar value for 'all' instead of defining what empty means:

inherit_run_exports:
  - from_packages: all
    to_outputs: all

for all-to-all. The list allows subsetting, like:

inherit_run_exports:
  - from_packages:
      - ${{ compiler("c") }}
    to_outputs: all
  - from_packages: all
    to_outputs:
      - the_main_one

@wolfv
Copy link
Copy Markdown
Contributor Author

wolfv commented Sep 4, 2025

Hi @minrk and sorry for the longer radio silence on this issue. I am excited that you and @isuruf separately reached out to push on this. It has been high on our TODO list for a while now and we definitely want to get to it.

We had some thoughts in the meantime about making an explicit "inherit" key.

That would make a recipe look something like this:

outputs:
  - cache:
      name: foo-cache   # required, string, follows rules of `PackageName`

    source:
      - url: https://foo.bar/source.tar.bz
        sha256: ...

    requirements:
      build:
        - ${{ compiler('c') }}
        - cmake
        - ninja
      host:
        - libzlib
        - libfoo
      # the `run` and `run_constraints` sections are not allowed here
      ignore_run_exports:
        by_name:
          - libfoo

    build:
      # only the script key is allowed here
      script: build_cache.sh

  - package:
      name: foo-headers
    
    # long form of newly added `inherit` key
    inherit:
      from: foo-cache
      run_exports: false

    build:
      files:
        - include/

  - package:
      name: foo
    
    # short form, inherits run exports by default
    inherit: foo-cache

    # will bundle everything else except files already present in `foo-headers`
    requirements:
      host:
        - ${{ pin_subpackage("foo-headers", exact=True) }}

In this case, inheritance would be optional, and inheriting from the run_exports could be controlled precisely as well. We could potentially also allow a cache output to inherit from another cache output. And multiple inheritance is also something that could be potentially added (e.g. inherit: [{ from: foobar }, { from: bizbar }]).

Unfortunately this would be a bit of a bigger refactor in rattler-build and we haven't really "dared" to do this, yet.

This is a longer writeup of the idea: https://gist.github.com/wolfv/1a160129eb7ba7ccaab51fdf25d7e89b

@h-vetinari
Copy link
Copy Markdown

I've been waiting with bated breath as well 😅

    # long form of newly added `inherit` key
    inherit:
      from: foo-cache
      run_exports: false

That looks like a very nice design from my POV! 🤩

@carterbox
Copy link
Copy Markdown
Contributor

I think I agree with this latest proposal to require an inherit keyword for outputs that use the cache instead of automatically using the cache for all packages. It's probably the appropriate level of explicitly opting-in to run_exports from the cache.

Imagine the situation where you have multiple runtime outputs built from the same cache (for example a main library with separate plugin and separate command line tool outputs). What is the recommended technique for inheriting a correct subset of the run_exports from the cache?

  1. ( 🚀 ) Ignore all run_exports from cache; redeclare build/host dependencies in each runtime output
  2. ( 👀 ) Accept all run_exports from cache; use ignore_run_exports.from to selectively remove unused run_exports for each output.
  3. ( 🎉 ) Accept all run_exports from cahce; supress none.
  4. ( 😕 ) Something else?

@minrk
Copy link
Copy Markdown

minrk commented Sep 4, 2025

explicit inherit looks good to me! Does that mean that any output can inherit the work directory from the end state of any other output? I could imagine that being useful and even simplifying things to a degree. Many of my multi-output packages have the form:

  • libfoo: one big package, install main component
  • libfoo-B: inherit from libfoo, install component B
  • libfoo-C: inherit from libfoo, install component C

then I actually wouldn't always need a cache at all, as all my outputs would be 'real'. If that sounds more complex from an implementation stand point, no need to consider it, but from an author standpoint, it actually sounds simpler. Then the spec for cache is extremely clear: it's an output that doesn't result in an exported package, and the starting point of an output with inherit is the endpoint of the specified output (package or no package). The name for not-packaged output entries could even be nopackage: instead of cache:

- nopackage:
     name: staging
  requirements:
    build:
    host:
- package:
    name: libfoo
  inherit: staging
- package:
    name: libfoo-dev
  inherit:
    from: staging
    run_exports: false
  build: build-dev
  requirements:
    run:
      - ${{ pin_subpackage("libfoo", exact=True) }}

Is inherit.run_exports only a bool (all or none), or can it subset with the same shape as ignore_run_exports (from_package and by_name)?

What is the recommended technique for inheriting a correct subset of the run_exports from the cache?

I think "(👀) inherit and selectively ignore" is probably ideal because it avoids recomputing the host env which could come up with a different solution if there are constraints on the kept RE packages due to other packages in the cache envs. Here's my reading of how that would look in the above spec:

- cache:
    name: staging
  requirements:
    build:
      - ${{ compiler("c") }}
    host:
      - libfoo-dev
- package:
    name: accept_all_ignore_one
  inherit:
    name: staging
    run_exports: true
  requirements:
    ignore_run_exports:
      from_package:
        - libfoo-dev

An alternative is the possibility of inherit.run_exports having the same shape as requirements.ignore_run_exports so that individual inherited REs could be opted into from_package or by_name:

- package:
    name: inherit_run_exports_select
  inherit:
    name: staging
    run_exports:
      from_package:
        - ${{ ( compiler("c") | split() )[0] }}

Co-authored-by: Michał Górny <mgorny@gentoo.org>
cep-XXXX.md Outdated
<tr><td> Status </td><td> In Discussion </td></tr>
<tr><td> Author(s) </td><td> Wolf Vollprecht &ltw.vollprecht@gmail.com&gt; </td></tr>
<tr><td> Created </td><td> Nov 27, 2024</td></tr>
<tr><td> Updated </td><td> </td></tr>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<tr><td> Updated </td><td> </td></tr>
<tr><td> Updated </td><td> Mar 26, 2026 </td></tr>

cep-XXXX.md Outdated
<tr><td> Author(s) </td><td> Wolf Vollprecht &ltw.vollprecht@gmail.com&gt; </td></tr>
<tr><td> Created </td><td> Nov 27, 2024</td></tr>
<tr><td> Updated </td><td> </td></tr>
<tr><td> Discussion </td><td> </td></tr>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<tr><td> Discussion </td><td> </td></tr>
<tr><td> Discussion </td><td> https://github.com/conda/ceps/pull/102 </td></tr>

<tr><td> Created </td><td> Nov 27, 2024</td></tr>
<tr><td> Updated </td><td> </td></tr>
<tr><td> Discussion </td><td> </td></tr>
<tr><td> Implementation </td><td> rattler-build </td></tr>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if you could add some permalinks (hash-pinned) here to some of the main modules in rattler-build.

cep-XXXX.md Outdated

Sometimes it is very useful to build some code once, and then split it into multiple build artifacts (such as shared library, header files, etc.). For this reason, `conda-build` has a special, implicit top-level build.

There are many downsides to the behavior of `conda-build`: it's very implicit, hard to understand and hard to debug (for example, if an output is defined with the same name as the top-level recipe, this output will get the same requirements attached as the top-level).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
There are many downsides to the behavior of `conda-build`: it's very implicit, hard to understand and hard to debug (for example, if an output is defined with the same name as the top-level recipe, this output will get the same requirements attached as the top-level).
There are many downsides to the behavior of `conda-build`: it's too implicit, hard to understand, and hard to debug. For example, if an output is defined with the same name as the top-level recipe, this output will get the same requirements attached as the top-level.

```yaml
outputs:
- staging:
name: foo-cache # required, string, follows rules of `PackageName`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which rules are these? Encoded by a particular CEP?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @wolfv

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I've changed this to say "same as package.name".

cep-XXXX.md Outdated
Comment on lines +83 to +84
> [!WARNING]
> When using `outputs` we are going to remove the implicit `build.script` pointing to `script.sh`. Going forward, the script name / content has to be set explicitly.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't sound like it belongs in the specification of the CEP. Perhaps in further work, if anywhere. I'd just remove it.

cep-XXXX.md Outdated

A recipe can have zero or more staging outputs.

A staging output looks as follows:
Copy link
Copy Markdown
Member

@jaimergp jaimergp Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be encoded in a more accurate way. It's not clear what "looks as" means. I recommend you use RFC language (MUST, MAY, SHOULD, etc) for the required keys. You may build on existing CEPs to simplify language, but the specification should be as specific (sorry to be redundant) as possible.

e.g.:

As per CEP XXX, outputs MUST be a list of dictionaries, each representing an artifact. We extend that schema with an alternative dictionary type, which MUST comply to this schema:

  • staging MUST be a required key, whose value follows the same schema as package in CEP XX`.
  • Everything else follows the schema described in CEP X, Y and Z.

The existing package outputs MAY replace the build key with a inherit key, whose value MUST comply to this schema:
...

Copy link
Copy Markdown
Member

@jaimergp jaimergp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ideas are sound and the proposal itself is solid, but the text right now does not conform to the quality standards we have set for the other CEPs. I know this was started before the recent wave of standardization, but I think we should strive to keep our own bar high. I suggest you rework the text by following the proposed CEP 0 template and fulfil the specification details with RFC2119. If you need help, I am happy to give a hand with the restructuring!

@jaimergp
Copy link
Copy Markdown
Member

Hey @wolfv, just a heads up @mgorny will open a PR against your branch with the necessary changes 🙏

mgorny and others added 12 commits March 31, 2026 19:56
Signed-off-by: Michał Górny <mgorny@quansight.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
Co-authored-by: jaimergp <jaimergp@users.noreply.github.com>
Co-authored-by: jaimergp <jaimergp@users.noreply.github.com>
Co-authored-by: jaimergp <jaimergp@users.noreply.github.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
Co-authored-by: jaimergp <jaimergp@users.noreply.github.com>
Formalize and clarify the CEP
Signed-off-by: Michał Górny <mgorny@quansight.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
The staging CEP, next part of updates
@jaimergp
Copy link
Copy Markdown
Member

jaimergp commented Apr 1, 2026

Let's fix the pre-commit lint and we are good to start the vote. Thanks @mgorny!

@jaimergp jaimergp dismissed their stale review April 1, 2026 16:02

Concerns addressed

mgorny and others added 2 commits April 1, 2026 18:52
Signed-off-by: Michał Górny <mgorny@quansight.com>
@baszalmstra
Copy link
Copy Markdown
Contributor

baszalmstra commented Apr 2, 2026

Dear @conda/steering-council,

The vote for this CEP has started. It will be open for two weeks, until April 16nd, 2026, 23:59 Anywhere on Earth. This time period has been chosen to make it eligible for time-out rules. As an Enhancement Proposal vote, it requires 60% affirmative votes to pass.

To vote, please mark the relevant checkbox under your username:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.