Skip to content

Native namespace packages (PEP 420)#2361

Merged
IlyasMoutawwakil merged 26 commits into
mainfrom
namespace-command-loading
Oct 1, 2025
Merged

Native namespace packages (PEP 420)#2361
IlyasMoutawwakil merged 26 commits into
mainfrom
namespace-command-loading

Conversation

@IlyasMoutawwakil
Copy link
Copy Markdown
Member

@IlyasMoutawwakil IlyasMoutawwakil commented Sep 26, 2025

What does this PR do?

This PR proposes the use of native namespace packages (PEP 420) which only requires us to not have an init.py file in the namespace directories. I believe this is a very simple rule to follow and allows us to have all benefits of namespace packages while also supporting working in editable mode in multiple distributions.

It also updates how subcommands are loaded following the above mentioned changes.

As a proof of work, I made a couple changes (simply avoiding importing from namespaces) in optimum-onnx, and with that we can now test both in editable mode, cli works in editable mode as well.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@regisss
Copy link
Copy Markdown
Contributor

regisss commented Sep 26, 2025

Nice! If I understand correctly, all Optimum subpackages would have to convert their relative imports into absolute ones right?

@IlyasMoutawwakil
Copy link
Copy Markdown
Member Author

IlyasMoutawwakil commented Sep 26, 2025

@regisss no not really it's just for readability (I can revert it and do it in a separate PR),

The only condition here is that directories which we want to be extensible / to be a namespaces e.g. optimum, optimum.exporters, optimum.commands, etc should not have an __init__.py file, and that alone will tell the package manager that they are namespaces and not modules. It does introduce a breaking change, so for example from optimum.exporters import TasksManager will not work because optimum.exporters won't have its init file anymore, you would need to do from optimum.exporters.tasks import TasksManager, but imo that's a fair price to get things working out of the box (e.g editable mode in both) and following the "native" way of managing namespace packages.

@IlyasMoutawwakil IlyasMoutawwakil force-pushed the namespace-command-loading branch from 269f026 to 441a21a Compare September 26, 2025 09:26
IlyasMoutawwakil added a commit to huggingface/optimum-onnx that referenced this pull request Sep 26, 2025
- Remove legacy export of decoders
- Remove deprecated arguments: use_auth_token, fp16 vs dtype, for_ort,
some optimization arguments
- Add some missing optimization arguments (added clip optimization to
catch them) following [FusionOptions
args](https://github.com/microsoft/onnxruntime/blob/88f2652dee465faa485cf68cb39c4489f28945bb/onnxruntime/python/tools/transformers/fusion_options.py#L136)
- Fixed CI by adding a Hub token with access to optimum-internal-testing
and optimum orgs
- removed direct imports from namespaces packages (optimum.exporters,
optimum.commands, etc) for compatibility with native namespace packages
(see huggingface/optimum#2361).

Didn't remove support for loading and inference with legacy models
entirely as it would break many tests that still use them.
Comment on lines -32 to -36
# The table below is dynamically populated when loading subpackages
_OPTIMUM_CLI_SUBCOMMANDS = []


def optimum_cli_subcommand(parent_command: Optional[Type[BaseOptimumCLICommand]] = None):
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left only one way for command registration, wdyt ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using load_optimum_namespace_cli_commands defined below right?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes agreed, makes more sense to have one way imo, not sure if optimum_cli_subcommand / _OPTIMUM_CLI_SUBCOMMANDS was used by any subpackage but don't think so

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is used in optimum-quanto and optimum-nvidia

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dacorvo do you prefer we keep this mechanism ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if something has changed recently, but the legacy "registration" mechanism in optimum relied on files explicitly copied under a specific path. What I did instead was to use an explicit registration mechanism for subpackages: #1894. This is a cleaner solution IMHO, but I am open to any equivalent solution to allow the optimum-cli to be extended that does not require files to be installed at a specific path.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the legacy convention does not work with editable mode for subpackages, because it expects the aforementioned files to be present in a path rooted under sites-packages. I also suspect this legacy mechanism is why the optimum-cli is so slow to start (sometimes tenth of seconds to just display the help).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the legacy convention does not work with editable mode for subpackages, because it expects the aforementioned files to be present in a path rooted under sites-packages.

it's no longer the case the case with this PR, native namespace packages work with editable mode, you can even have both optimum and optimum-subpackage in editable mode (see the CI changes in this PR).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not require files to be installed at a specific path.

I think this was not a problem in the registration mechanism but rather in how the "namespacing" of optimum was implemented. with native namespaces, everything works out of the box, as long as we follow pep 420 (no init in namespaces, ie. optimum. optimum.exporters. optimum.commands.).

@IlyasMoutawwakil IlyasMoutawwakil marked this pull request as ready for review September 29, 2025 07:26
Comment thread optimum/__init__.py
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
__path__ = __import__("pkgutil").extend_path(__path__, __name__)
Copy link
Copy Markdown
Member Author

@IlyasMoutawwakil IlyasMoutawwakil Sep 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cli also starts faster now because it doesn't need to import pkgutil

@regisss
Copy link
Copy Markdown
Contributor

regisss commented Sep 30, 2025

The only condition here is that directories which we want to be extensible / to be a namespaces e.g. optimum, optimum.exporters, optimum.commands, etc should not have an __init__.py file, and that alone will tell the package manager that they are namespaces and not modules. It does introduce a breaking change, so for example from optimum.exporters import TasksManager will not work because optimum.exporters won't have its init file anymore, you would need to do from optimum.exporters.tasks import TasksManager, but imo that's a fair price to get things working out of the box (e.g editable mode in both) and following the "native" way of managing namespace packages.

So that means that the same should be done for Optimum subpackages? E.g. this init.py in optimum-habana should be removed?

@IlyasMoutawwakil
Copy link
Copy Markdown
Member Author

IlyasMoutawwakil commented Sep 30, 2025

@regisss no, optimum.habana can have an init file as it is not a namespace (no other subpackage is overloading it with modules)

optimum-habana will only have to import the tasks manager from optimum.exporters.tasks instead of optimum.exporters (if it uses it), that's it.

Copy link
Copy Markdown
Collaborator

@echarlaix echarlaix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks a lot @IlyasMoutawwakil !

Comment on lines -32 to -36
# The table below is dynamically populated when loading subpackages
_OPTIMUM_CLI_SUBCOMMANDS = []


def optimum_cli_subcommand(parent_command: Optional[Type[BaseOptimumCLICommand]] = None):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes agreed, makes more sense to have one way imo, not sure if optimum_cli_subcommand / _OPTIMUM_CLI_SUBCOMMANDS was used by any subpackage but don't think so

Comment thread optimum/commands/optimum_cli.py Outdated
if dist.metadata["Name"] == "optimum":
# NOTE: optimum no longer registers commands in the main package, we do this for testing purposes only
# optimum can't have an __init__.py file, so we use optimum.pipelines instead
dist_name = "optimum.pipelines"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this ?

Copy link
Copy Markdown
Member Author

@IlyasMoutawwakil IlyasMoutawwakil Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a test in tests/cli that puts a register_custom_command.py in optimum/commands/register and for it to work we need the location of optimum but since the package sees 'optimum' as a namespace it doesn't really have a location (it has multiple locations, one in every distribution) so we use optimum.pipelines.
this hack and the test i'm talking about are not really needed at this point, because now we do test command registration by installing optimum and optimum-onnx. I can remove both the hack and the custom command test wdyt ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed an even better way of discovering commands without loading the entirety of the model, and it works with commands in optimum as well (i.e. the custom command test).

Copy link
Copy Markdown
Contributor

@regisss regisss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Contributor

@dacorvo dacorvo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with removing the dynamic registration mechanism implemented in subpackages.py, as it will break the packages that were using it.
My understanding is that this pull-request more or less rewrites the legacy mechanism that requires each subpackage to have registration files installed under optimum.commands.register, which IMHO has serious limitations.
First, I am not convinced that with that solution the optimum-cli extension mechanism works in editable mode, as in the CI the subpackages are installed twice: first in normal then in editable mode. This means that when using the optimum-cli, the files under sites-packages for optimum commands will be used, and not the ones under the source tree (this is exactly the issue we are having in optimum-neuron for instance).
Second, with that convention of "sharing" a path (optimum.commands), it is unknown what happens if you update optimum after having installed your package. We had users reporting that the neuron commands were not registered anymore.
If you don't want to keep subpackage.py, which I still think was an improvement over the solution implemented here, then consider using python importlib.metadata.entry_points, which can be declared in each subpackage pyproject.toml and discovered dynamically by optimum.
You can see an example of such a mechanism in vLLM:
https://github.com/vllm-project/vllm/blob/a332b84578cdc0706e040f6a765954c8a289904f/vllm/plugins/__init__.py#L17
https://github.com/huggingface/optimum-neuron/blob/5fca5b2b793a2eb2cbd7c64b2d24ddb3288134de/pyproject.toml#L123
In the meantime, I would rather keep the existing subpackages.py mechanism.

@IlyasMoutawwakil IlyasMoutawwakil force-pushed the namespace-command-loading branch from b474435 to 8df084c Compare October 1, 2025 11:50
@IlyasMoutawwakil
Copy link
Copy Markdown
Member Author

IlyasMoutawwakil commented Oct 1, 2025

I disagree with removing the dynamic registration mechanism implemented in subpackages.py, as it will break the packages that were using it.

reverted the changes to that part, this PR now only concerns namespace packages that also use the optimum.commands registration mechanism.

First, I am not convinced that with that solution the optimum-cli extension mechanism works in editable mode, as in the CI the subpackages are installed twice: first in normal then in editable mode. This means that when using the optimum-cli, the files under sites-packages for optimum commands will be used, and not the ones under the source tree (this is exactly the issue we are having in optimum-neuron for instance).

When you install a package in editable mode after having installed it statically, the package under site-packages is removed. I added an explicit uninstallation command pip uninstall optimum optimum-onnx -y in my last commit to demonstrate that.
We can also go as far as creating two clean envs in the CI if you are still concerned.
You can also create a clean env locally and install optimum-onnx and optimum from this branch, both in editable mode and test it. I have opened this PR after having made sure that it works.

Second, with that convention of "sharing" a path (optimum.commands), it is unknown what happens if you update optimum after having installed your package. We had users reporting that the neuron commands were not registered anymore.
If you don't want to keep subpackage.py, which I still think was an improvement over the solution implemented here, then consider using python importlib.metadata.entry_points, which can be declared in each subpackage pyproject.toml and discovered dynamically by optimum.

The convention of "sharing" (optimum.commands), is the definition of namespace packages, the same way they share optimum.exporters, I'm not sure why one is okay with you (sharing optimum.exporters) but not the other (optimum.commands).
I'm confused because even in optimum-neuron, the registration mechanism used is through optimum.commands which is what we are fixing in this PR. Why haven't optimum-neuron switched to using subpackages.py ?

@dacorvo
Copy link
Copy Markdown
Contributor

dacorvo commented Oct 1, 2025

I disagree with removing the dynamic registration mechanism implemented in subpackages.py, as it will break the packages that were using it.

reverted the changes to that part, this PR now only concerns namespace packages that also use the optimum.commands registration mechanism.

Thanks

First, I am not convinced that with that solution the optimum-cli extension mechanism works in editable mode, as in the CI the subpackages are installed twice: first in normal then in editable mode. This means that when using the optimum-cli, the files under sites-packages for optimum commands will be used, and not the ones under the source tree (this is exactly the issue we are having in optimum-neuron for instance).

When you install a package in editable mode after having installed it statically, the package under site-packages is removed. I added an explicit uninstallation command pip uninstall optimum optimum-onnx -y in my last commit to demonstrate that. We can also go as far as creating two clean envs in the CI if you are still concerned. You can also create a clean env locally and install optimum-onnx and optimum from this branch, both in editable mode and test it. I have opened this PR after having made sure that it works.

With these explanations, I understand it should work

Second, with that convention of "sharing" a path (optimum.commands), it is unknown what happens if you update optimum after having installed your package. We had users reporting that the neuron commands were not registered anymore.
If you don't want to keep subpackage.py, which I still think was an improvement over the solution implemented here, then consider using python importlib.metadata.entry_points, which can be declared in each subpackage pyproject.toml and discovered dynamically by optimum.

The convention of "sharing" (optimum.commands), is the definition of namespace packages, the same way they share optimum.exporters, I'm not sure why one is okay with you (sharing optimum.exporters) but not the other (optimum.commands). I'm confused because even in optimum-neuron, the registration mechanism used is through optimum.commands which is what we are fixing in this PR. Why haven't optimum-neuron switched to using subpackages.py ?

I am not at all at ease with using namespaces to share code between packages, either exporters and commands. To be honest I think the python namespaces are not a good fit for what we are doing in optimum-xxx packages, ie just extend exporters and commands: to me this looks very much like a plugin/backend use case. The reason why we did not switch to subpackages.py is lazyness 🫣.

@IlyasMoutawwakil IlyasMoutawwakil force-pushed the namespace-command-loading branch from e025005 to 37e7e76 Compare October 1, 2025 17:55
Comment on lines +137 to +139
# we only load the command register module, which should be lightweight if it keeps all its relevant imports inside the run function
# e.g. https://github.com/huggingface/optimum-onnx/blob/671b84f78a244594dd21cb1a8a1f7abb8961ea60/optimum/commands/export/onnx.py#L254
# this way we avoid loading all subpackages and their dependencies at cli startup
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged the two loops and the command loading is more efficient now as it uses optimum.commands.register's specs to get a list of all paths using the namespace, from there we only load the register files, avoiding importing the entire subpakage.

@IlyasMoutawwakil IlyasMoutawwakil merged commit 3dde7f3 into main Oct 1, 2025
19 checks passed
@IlyasMoutawwakil IlyasMoutawwakil deleted the namespace-command-loading branch October 1, 2025 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants