Skip to content

[native_assets_cli] BuildOutput extension: addDataAssetDirectories #2097

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Mar 20, 2025

Conversation

MichealReed
Copy link
Contributor

@MichealReed MichealReed commented Mar 12, 2025

This PR adds helper extensions to BuildOutputBuilder to assist with adding assets and directories to the package.

addFoundCodeAssets will recursively search the input config's target directory to find any libraries matching the given name. The prefix and extension are determined through use of the targetOS helper libraryFileName. Because the CodeAsset name does not always match the file that the library is contained in, we need to use a mapping where the key is used to find the file, and the name is the name it will become available to dart as, otherwise we see an error like below as the fallback does not seem to work in this case.

Invalid argument(s): Couldn't resolve native function 'add' in 'package:download_asset/native_add.dart' : No asset with id 'package:download_asset/native_add.dart' found. Available native assets: package:download_asset/native_add_windows_x64.dart. Attempted to fallback to process lookup.

example usage:

      await output.addFoundCodeAssets(
        input: input,
        assetMappings: [
          // asset to find : name to add it as
          { 'native_add_windows_x64' : 'native_add.dart'},
        ],
      );

addDataAssetDirectories takes in a list of paths or files and adds them as DataAsset dependencies, throwing an error if a given path is not found.

example usage:

await output.addDataAssetDirectories([
      'assets_a/file.txt', // adds the parent dir and file to dependencies
      'assets_b' // recursively adds uris
    ], input: input);

Tests have been added to data_assets/validation_test.dart and code_assets/validation_test.dart and the download example was updated to use addFoundCodeAssets.

Open Questions:

  • If we add a code asset then addDataAssetDirectories adds the folder that asset exists in, will its inclusion as a data asset cause conflict?
  • Is there a better way to derive name.dart for the CodeAsset without mappings?
  • BuildOutputBuilder seemed like a better place for the extension so these are accessible in the hook. Any thoughts?

closes #1346

@MichealReed MichealReed changed the title BuildOutput extensions addFoundCodeAssets and addDataAssetDirectories [native_assets_cli] BuildOutput extensions addFoundCodeAssets and addDataAssetDirectories Mar 12, 2025
Copy link
Collaborator

@dcharkes dcharkes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MichealReed!

  • If we add a code asset then addDataAssetDirectories adds the folder that asset exists in, will its inclusion as a data asset cause conflict?

Code assets and data assets live in a separate namespace, so that should not lead to issues.

  • Is there a better way to derive name.dart for the CodeAsset without mappings?

I'm inclined to not add support for addFoundCodeAssets. If users need to do a mapping from found files to asset id, they probably want to do it programmatically. Also, they might want to map asset id to file path instead. Or they might want to map targetOS+targetArch to both filename and asset id. So I'm not assuming that code assets will be so regular a helper function would make sense.

Do you have a use case where you want addFoundCodeAssets for?

  • BuildOutputBuilder seemed like a better place for the extension so these are accessible in the hook.

sgtm! 👍

final packageName = input.packageName;
final packageRoot = input.packageRoot;
for (final path in paths) {
final resolvedUri = packageRoot.resolve('$packageName/$path');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have $packageName/ here? Wouldn't we simply want to resolve $path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files created in the tests were showing up under $packageName, is there something else moving the assets to this path?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was kind of expecting this function to do what pkgs/native_assets_builder/test_data/simple_data_asset/hook/build.dart does, and that file being refactored to simply a call to this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can test_data/simple_data_asset be tested? It depends on a nonexistent dart:asset and ByteAsset.

Pushed a fix for name that mirrors that example, we still must resolve the relative $packageName/$path to find the file though. Do you want this to be refactored for compatibility with system absolute paths or relative only?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way to test for now is to run dart test in package:native_assets_builder.

we still must resolve the relative $packageName/$path to find the file though.

I don't understand, if I change the helper function to final resolvedUri = packageRoot.resolve(path);, then dart test in package:native_assets_builder succeeds.

Do you want this to be refactored for compatibility with system absolute paths or relative only?

I think relative paths in the package is fine for now. That's the most common use case. (For desktop applications that use an LLM that's installed on the system, I can imagine wanting to support absolute paths. So we could add such support in the future. We should be able to distinguish a string containing an absolute path from a relative path.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be good now, the data assets validation tests makeDataBuildInput was not resolving the packageRoot correctly, which caused the failure in the validation test. Now both simple_data_asset and the validation tests pass with resolve(path).

@MichealReed
Copy link
Contributor Author

MichealReed commented Mar 12, 2025

Thanks @MichealReed!

Gladly!

I'm inclined to not add support for addFoundCodeAssets. If users need to do a mapping from found files to asset id, they probably want to do it programmatically. Also, they might want to map asset id to file path instead. Or they might want to map targetOS+targetArch to both filename and asset id. So I'm not assuming that code assets will be so regular a helper function would make sense.

Do you have a use case where you want addFoundCodeAssets for?

addFoundCodeAssets is not for mapping, it's for recursively finding libraries in the build output. It needs mappings to map to the name ffi expects.

I think the changes in the download_asset hook demonstrate this sort of programmatic mapping. The generated file includes extensions so those must be stripped for the hash to validate and be normalized for library finding.

      await output.addFoundCodeAssets(
        input: input,
        assetMappings: [
          // asset to find : name to add it as
          { targetName : 'native_add.dart'},
        ],
      );

If we provide targetName directly (without mapping) as the CodeAsset name argument we see

Couldn't resolve native function 'add' in 'package:download_asset/native_add.dart' : 
No asset with id 'package:download_asset/native_add.dart' found. 

Available native assets: package:download_asset/native_add_windows_x64.dart.

Name is still somewhat ambiguous at the moment to me and from what I can tell, maps to the generated dart file. Meaning most variables must be stripped. Unless this reads exactly native_add.dart for the codeasset name, the program will not run and as mentioned the fallback method does not find add in native_add_windows_x64.dart. The mapping is just to map files of varying architecture and platform types or unexpected names to the expected library name from the dart side.

In another use case, there are sometimes libraries that expect other dynamic libraries be bundled with the executable and have nested ways to open the library. This allows users to find multiple libraries from a single build to be bundled with the output.

There are also cases like with cmake output libraries will be heavily nested, or libraries from child builds will be needed and users need a quick way to find these.

@dcharkes
Copy link
Collaborator

addFoundCodeAssets is not for mapping, it's for recursively finding libraries in the build output.

There are also cases like with cmake output libraries will be heavily nested, or libraries from child builds will be needed and users need a quick way to find these.

Ah got it! You want to simply invoke the cmake process and then read the file system of what it output. (Does CMake not have a way for it to tell us what dylibs it output? Such that we would parse that information instead of scanning the file system?)

If we want to infer information from the files on disk, I think we should make it more general.

It should then also work for (a) downloading dynamic libraries, (b) grabbing dynamic libraries from an external package manager directory, and (c) dynamic libraries checked in to the package git [please don't!].

To make it more general:

  1. Recognize whether it's a dylib or static lib. And if both with the same name are available, respect the CodeConfig.linkModePreference.
  2. Use a tool to recognize the target architecture instead of assuming it's the CodeConfig.targetArchitecture. (And filtering the dylibs/static-libs on whether the architecture matches the CodeConfig.targetArchitecture.) We are using such tools already in the tests in package:native_toolchain_c. However, then I don't want such extension to be defined in package:native_assets_cli then it should be in package:native_toolchain_c most likely.
  3. Use a tool to recognize the target OS. I don't know if this exists. Can we recognize whether a dylib is meant for Android or Linux? Both will be .so. Can we recognize whether a dylib is meant for iOS or MacOS?

If we go that route, I think assetMappings should be a callback function makeAssetId(CodeAssetInfo info) that has access to all the inferred information, so that users can write arbitrary Dart code in there to come up with an assetId (info) => '$packageName/src/${info.file.path.split(...).last}.dart'.

And we should also not limit it to the output directory, we should allow using an arbitrary directory. (For example allowing users to run an external package manager and then pointing to a directory with dylibs in there.)

This is a very different change from the data asset change, so I'd say that should be a separate PR if we go that way.

Also, in package:native_toolchain_cmake you can assume that you have the right targetArchitecture and the right targetOS and something respecting the link mode preference. So, maybe it makes more sense for now to just keep the extension private in that package and not try to solve it for the general case just yet. That would avoid having to detect architecture/os/link mode from the files. WDYT?

@MichealReed
Copy link
Contributor Author

MichealReed commented Mar 13, 2025

Ah got it! You want to simply invoke the cmake process and then read the file system of what it output. (Does CMake not have a way for it to tell us what dylibs it output? Such that we would parse that information instead of scanning the file system?)

I think it's infeasible for it to do so, one build process can have many libraries. A single build of Dawn contains many projects which produce static or dynamic libraries of their own. Sometimes we built it for the monolithic dynamic lib produced, sometimes we want to grab the statics or other dynamic libs for linking.

It should then also work for (a) downloading dynamic libraries, (b) grabbing dynamic libraries from an external package manager directory, and (c) dynamic libraries checked in to the package git [please don't!].

It should already cover these cases, just download/build/make appear the libraries to a folder and provide the folder and a list of a names and this helper would find them.

  1. Recognize whether it's a dylib or static lib. And if both with the same name are available, respect the CodeConfig.linkModePreference.

This already does a selective find to get the proper lib based on

input.config.code.targetOS.libraryFileName(
            searchName,
            linkMode,
          );
2. Use a tool to recognize the target architecture instead of assuming it's the `CodeConfig.targetArchitecture`. (And filtering the dylibs/static-libs on whether the architecture matches the `CodeConfig.targetArchitecture`.) We are using such tools already in the tests in `package:native_toolchain_c`. However, then I don't want such extension to be defined in `package:native_assets_cli` then it should be in `package:native_toolchain_c` most likely.
3. Use a tool to recognize the target OS. I don't know if this exists. Can we recognize whether a dylib is meant for Android or Linux? Both will be `.so`. Can we recognize whether a dylib is meant for iOS or MacOS?

Is the concern here that a monolithic output might be produced with many different os/architectures and this would find based on prefix and extension potentially the wrong one? Even CMake in its maturity does not detect this and opted for a hints and paths approach for more manual control over library finding. TargetOS and library naming convention should protect us in most cases here like seen in the updated download example. It may be over-engineering to try to handle this where advanced toolchains even do not, developers still with this helper would have the option to manually add CodeAssets.

it should be in package:native_toolchain_c most likely.

Maybe a bigger discussion about separation? Many of the helpers in toolchain_c are beneficial to other toolchains and it does not seem logic to include toolchain_c in projects that do not need a cbuilder.

If we go that route, I think assetMappings should be a callback function makeAssetId(CodeAssetInfo info) that has access to all the inferred information, so that users can write arbitrary Dart code in there to come up with an assetId (info) => '$packageName/src/${info.file.path.split(...).last}.dart'.

Then only one library could be found at a time. I could not find ffi.so and dep.so in one go as they will be added with the same name. Couldn't the user effectively accomplish this via string interpolation in the map or by providing a prebuilt variable to the map?

And we should also not limit it to the output directory, we should allow using an arbitrary directory. (For example allowing users to run an external package manager and then pointing to a directory with dylibs in there.)

This is why outputDirectory is nullable and null assigned as input.outputDirectory. A user can provide any directory as is and it will run the recursive find there or as default, search the input's out folder.

This is a very different change from the data asset change, so I'd say that should be a separate PR if we go that way.

Understood, can strip this out if there's no way we can land a version of this here. I think it already covers most cases mentioned and those not, would easily be recognizable and fixable by devs through errors.

@MichealReed
Copy link
Contributor Author

MichealReed commented Mar 13, 2025

Probably the only way to be certain about an architecture/operating system would be to inspect the binary of the file to identify through ELF, PE, and Mach-O with additional symbol inspection per platform to determine OS differences, primarily between iOS/MacOS/VisionOS and Linux, Android and Fuschia (if even needed, they may be compatible anyways, maybe try to load it?). Getting a comprehensive map of fallback symbols per platform would probably be the most intensive part of doing this, but it all could be done in dart versus relying on external tools. We may get lost in a maze of edge-cases trying to make the detection flawless though.

@dcharkes
Copy link
Collaborator

It should then also work for (a) downloading dynamic libraries, (b) grabbing dynamic libraries from an external package manager directory, and (c) dynamic libraries checked in to the package git [please don't!].

It should already cover these cases, just download/build/make appear the libraries to a folder and provide the folder and a list of a names and this helper would find them.

I'm thinking users might download/build for all target architectures and OSes in a single directory, like a tarball. That means it would pick up the wrong ones as well. Ditto someone might write a build script that outputs fat binaries, containing more than one architecture. So I don't believe it covers those cases in general. Therefore, I am hesitant to provide something in the API that everyone sees that covers only a subset of the use cases.

Maybe a bigger discussion about separation? Many of the helpers in toolchain_c are beneficial to other toolchains and it does not seem logic to include toolchain_c in projects that do not need a cbuilder.

Agreed!

And the tools abstraction itself should also be reusable for other toolchains than just C toolchains.

But that kind of requires a refactoring as well, as there are some issues with how it's currently set up.

I don't have the cycles to deal with that unfortunately atm. Maybe such packages should be community owned as well.

(The way to make progress on your package now is probably to either do an ugly src/ import of package:native_toolchain_c and pin the dependency. Or, maybe better, duplicate the subset of logic you need in your own package.)

Then only one library could be found at a time.

Why? Wouldn't you call the callback for every single library you find?

Understood, can strip this out if there's no way we can land a version of this here

Yes, let's land the data asset extension. 👍

Probably the only way to be certain about an architecture/operating system would be to inspect the binary of the file to identify through ELF, PE, and Mach-O with additional symbol inspection per platform to determine OS differences, primarily between iOS/MacOS/VisionOS and Linux, Android and Fuschia (if even needed, they may be compatible anyways, maybe try to load it?). Getting a comprehensive map of fallback symbols per platform would probably be the most intensive part of doing this, but it all could be done in dart versus relying on external tools. We may get lost in a maze of edge-cases trying to make the detection flawless though.

Agreed, therefore I'm leaning towards not adding this for "all dylibs and static libs in a directory". It feels like a lot of engineering effort to get it right for the general case. For your specific case you already know the target OS and target architecture is right, because you passed it in to CMake.

@MichealReed
Copy link
Contributor Author

I'm thinking users might download/build for all target architectures and OSes in a single directory, like a tarball. That means it would pick up the wrong ones as well. Ditto someone might write a build script that outputs fat binaries, containing more than one architecture. So I don't believe it covers those cases in general.

If libraries have the same name, they would be split into platform specific folders, they cannot have the same name in the same folder. If they are named ambiguously and it grabs the wrong library the first implementation, it will error, and a developer can provide platform specific paths to find the correct output directory. Fat libraries would work as well as the builder supports this. This already works for the download_asset example without conflict.

The generalization here is in line with the libraryFileName helper. Rejecting it under that premise is a bit dis-encouraging, but it's your show to run here.

identify through ELF, PE, and Mach-O

This is very viable though and the external tooling should be replaced with dart native code that does this. I've prototyped something here

https://gist.github.com/MichealReed/bc2aa5d059133ab17bf7571b353335fe

For your specific case you already know the target OS and target architecture is right, because you passed it in to CMake.

In other cases, this would be so too, builds will either be platform isolated with the temporary directory approach the builder currently uses, or they will be manually isolated into different build folders (build/platform/arch). Building everything for all platforms to the same folder is an anti-pattern that breaks other stuff and developers should avoid, not an edge case to cover.

Yes, let's land the data asset extension. 👍

Roger that, will remove this code over the weekend and we can finalize on Monday. Let me know if you have more questions or change your mind 👍

Copy link

PR Health

Breaking changes ✔️
Package Change Current Version New Version Needed Version Looking good?
Changelog Entry ✔️
Package Changed Files

Changes to files need to be accounted for in their respective changelogs.

API leaks ✔️

The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.

Package Leaked API symbols
License Headers ✔️
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files
no missing headers

All source files should start with a license header.

Unrelated files missing license headers
Files
pkgs/jni/lib/src/third_party/generated_bindings.dart
pkgs/objective_c/lib/src/ns_input_stream.dart

final packageName = input.packageName;
final packageRoot = input.packageRoot;
for (final path in paths) {
final resolvedUri = packageRoot.resolve('$packageName/$path');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was kind of expecting this function to do what pkgs/native_assets_builder/test_data/simple_data_asset/hook/build.dart does, and that file being refactored to simply a call to this function.

final packageName = input.packageName;
final packageRoot = input.packageRoot;
for (final path in paths) {
final resolvedUri = packageRoot.resolve('$packageName/$path');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way to test for now is to run dart test in package:native_assets_builder.

we still must resolve the relative $packageName/$path to find the file though.

I don't understand, if I change the helper function to final resolvedUri = packageRoot.resolve(path);, then dart test in package:native_assets_builder succeeds.

Do you want this to be refactored for compatibility with system absolute paths or relative only?

I think relative paths in the package is fine for now. That's the most common use case. (For desktop applications that use an LLM that's installed on the system, I can imagine wanting to support absolute paths. So we could add such support in the future. We should be able to distinguish a string containing an absolute path from a relative path.)

@MichealReed MichealReed changed the title [native_assets_cli] BuildOutput extensions addFoundCodeAssets and addDataAssetDirectories [native_assets_cli] BuildOutput extension: addDataAssetDirectories Mar 18, 2025
@coveralls
Copy link

coveralls commented Mar 19, 2025

Coverage Status

coverage: 86.652% (-0.3%) from 86.948%
when pulling 57d6d41 on MichealReed:dependency_helpers
into 0cd2e55 on dart-lang:main.

@dcharkes dcharkes merged commit 8132054 into dart-lang:main Mar 20, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[native_assets_cli] Helper to add all files in a directory as DataAssets
3 participants