Native namespace packages (PEP 420)#2361
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
Nice! If I understand correctly, all Optimum subpackages would have to convert their relative imports into absolute ones right? |
|
@regisss no not really it's just for readability (I can revert it and do it in a separate PR), The only condition here is that directories which we want to be extensible / to be a namespaces e.g. optimum, optimum.exporters, optimum.commands, etc should not have an |
269f026 to
441a21a
Compare
- Remove legacy export of decoders - Remove deprecated arguments: use_auth_token, fp16 vs dtype, for_ort, some optimization arguments - Add some missing optimization arguments (added clip optimization to catch them) following [FusionOptions args](https://github.com/microsoft/onnxruntime/blob/88f2652dee465faa485cf68cb39c4489f28945bb/onnxruntime/python/tools/transformers/fusion_options.py#L136) - Fixed CI by adding a Hub token with access to optimum-internal-testing and optimum orgs - removed direct imports from namespaces packages (optimum.exporters, optimum.commands, etc) for compatibility with native namespace packages (see huggingface/optimum#2361). Didn't remove support for loading and inference with legacy models entirely as it would break many tests that still use them.
| # The table below is dynamically populated when loading subpackages | ||
| _OPTIMUM_CLI_SUBCOMMANDS = [] | ||
|
|
||
|
|
||
| def optimum_cli_subcommand(parent_command: Optional[Type[BaseOptimumCLICommand]] = None): |
There was a problem hiding this comment.
left only one way for command registration, wdyt ?
There was a problem hiding this comment.
Using load_optimum_namespace_cli_commands defined below right?
There was a problem hiding this comment.
yes agreed, makes more sense to have one way imo, not sure if optimum_cli_subcommand / _OPTIMUM_CLI_SUBCOMMANDS was used by any subpackage but don't think so
There was a problem hiding this comment.
it is used in optimum-quanto and optimum-nvidia
There was a problem hiding this comment.
@dacorvo do you prefer we keep this mechanism ?
There was a problem hiding this comment.
I don't know if something has changed recently, but the legacy "registration" mechanism in optimum relied on files explicitly copied under a specific path. What I did instead was to use an explicit registration mechanism for subpackages: #1894. This is a cleaner solution IMHO, but I am open to any equivalent solution to allow the optimum-cli to be extended that does not require files to be installed at a specific path.
There was a problem hiding this comment.
FYI, the legacy convention does not work with editable mode for subpackages, because it expects the aforementioned files to be present in a path rooted under sites-packages. I also suspect this legacy mechanism is why the optimum-cli is so slow to start (sometimes tenth of seconds to just display the help).
There was a problem hiding this comment.
FYI, the legacy convention does not work with editable mode for subpackages, because it expects the aforementioned files to be present in a path rooted under sites-packages.
it's no longer the case the case with this PR, native namespace packages work with editable mode, you can even have both optimum and optimum-subpackage in editable mode (see the CI changes in this PR).
There was a problem hiding this comment.
not require files to be installed at a specific path.
I think this was not a problem in the registration mechanism but rather in how the "namespacing" of optimum was implemented. with native namespaces, everything works out of the box, as long as we follow pep 420 (no init in namespaces, ie. optimum. optimum.exporters. optimum.commands.).
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| __path__ = __import__("pkgutil").extend_path(__path__, __name__) |
There was a problem hiding this comment.
the cli also starts faster now because it doesn't need to import pkgutil
So that means that the same should be done for Optimum subpackages? E.g. this init.py in optimum-habana should be removed? |
|
@regisss no, optimum.habana can have an init file as it is not a namespace (no other subpackage is overloading it with modules) optimum-habana will only have to import the tasks manager from optimum.exporters.tasks instead of optimum.exporters (if it uses it), that's it. |
echarlaix
left a comment
There was a problem hiding this comment.
Great work, thanks a lot @IlyasMoutawwakil !
| # The table below is dynamically populated when loading subpackages | ||
| _OPTIMUM_CLI_SUBCOMMANDS = [] | ||
|
|
||
|
|
||
| def optimum_cli_subcommand(parent_command: Optional[Type[BaseOptimumCLICommand]] = None): |
There was a problem hiding this comment.
yes agreed, makes more sense to have one way imo, not sure if optimum_cli_subcommand / _OPTIMUM_CLI_SUBCOMMANDS was used by any subpackage but don't think so
| if dist.metadata["Name"] == "optimum": | ||
| # NOTE: optimum no longer registers commands in the main package, we do this for testing purposes only | ||
| # optimum can't have an __init__.py file, so we use optimum.pipelines instead | ||
| dist_name = "optimum.pipelines" |
There was a problem hiding this comment.
there's a test in tests/cli that puts a register_custom_command.py in optimum/commands/register and for it to work we need the location of optimum but since the package sees 'optimum' as a namespace it doesn't really have a location (it has multiple locations, one in every distribution) so we use optimum.pipelines.
this hack and the test i'm talking about are not really needed at this point, because now we do test command registration by installing optimum and optimum-onnx. I can remove both the hack and the custom command test wdyt ?
There was a problem hiding this comment.
I just pushed an even better way of discovering commands without loading the entirety of the model, and it works with commands in optimum as well (i.e. the custom command test).
dacorvo
left a comment
There was a problem hiding this comment.
I disagree with removing the dynamic registration mechanism implemented in subpackages.py, as it will break the packages that were using it.
My understanding is that this pull-request more or less rewrites the legacy mechanism that requires each subpackage to have registration files installed under optimum.commands.register, which IMHO has serious limitations.
First, I am not convinced that with that solution the optimum-cli extension mechanism works in editable mode, as in the CI the subpackages are installed twice: first in normal then in editable mode. This means that when using the optimum-cli, the files under sites-packages for optimum commands will be used, and not the ones under the source tree (this is exactly the issue we are having in optimum-neuron for instance).
Second, with that convention of "sharing" a path (optimum.commands), it is unknown what happens if you update optimum after having installed your package. We had users reporting that the neuron commands were not registered anymore.
If you don't want to keep subpackage.py, which I still think was an improvement over the solution implemented here, then consider using python importlib.metadata.entry_points, which can be declared in each subpackage pyproject.toml and discovered dynamically by optimum.
You can see an example of such a mechanism in vLLM:
https://github.com/vllm-project/vllm/blob/a332b84578cdc0706e040f6a765954c8a289904f/vllm/plugins/__init__.py#L17
https://github.com/huggingface/optimum-neuron/blob/5fca5b2b793a2eb2cbd7c64b2d24ddb3288134de/pyproject.toml#L123
In the meantime, I would rather keep the existing subpackages.py mechanism.
b474435 to
8df084c
Compare
reverted the changes to that part, this PR now only concerns namespace packages that also use the optimum.commands registration mechanism.
When you install a package in editable mode after having installed it statically, the package under site-packages is removed. I added an explicit uninstallation command
The convention of "sharing" (optimum.commands), is the definition of namespace packages, the same way they share optimum.exporters, I'm not sure why one is okay with you (sharing optimum.exporters) but not the other (optimum.commands). |
Thanks
With these explanations, I understand it should work
I am not at all at ease with using namespaces to share code between packages, either |
… only import the commands
e025005 to
37e7e76
Compare
| # we only load the command register module, which should be lightweight if it keeps all its relevant imports inside the run function | ||
| # e.g. https://github.com/huggingface/optimum-onnx/blob/671b84f78a244594dd21cb1a8a1f7abb8961ea60/optimum/commands/export/onnx.py#L254 | ||
| # this way we avoid loading all subpackages and their dependencies at cli startup |
There was a problem hiding this comment.
I merged the two loops and the command loading is more efficient now as it uses optimum.commands.register's specs to get a list of all paths using the namespace, from there we only load the register files, avoiding importing the entire subpakage.
What does this PR do?
This PR proposes the use of native namespace packages (PEP 420) which only requires us to not have an init.py file in the namespace directories. I believe this is a very simple rule to follow and allows us to have all benefits of namespace packages while also supporting working in editable mode in multiple distributions.
It also updates how subcommands are loaded following the above mentioned changes.
As a proof of work, I made a couple changes (simply avoiding importing from namespaces) in optimum-onnx, and with that we can now test both in editable mode, cli works in editable mode as well.
Before submitting
Who can review?