Commit a237a53

dudeperf3ct, avishniakov, safoinme, and strickvl authored and committed

Huggingface Model Deployer (zenml-io#2376)

* Initial implementation of huggingface model deployer
* Add missing step init
* Simplify modify_endpoint_name function and fix docstrings
* Formatting logger
* Add License to new files
* Enhancements as per PR review comments
* Add logging message to catch KeyError
* Remove duplicate variable
* Reorder lines for clarity
* Add docs for huggingface model deployer
* Fix CI errors
* Fix get_model_info function arguments
* More CI fixes
* Add minimal supported version for Inference Endpoint API in huggingface_hub
* Relax 'adlfs' package requirement in azure integrations
* update TOC (zenml-io#2406)
* Relax 's3fs' version in s3 integration
* Bugs fixed running a test deployment pipeline
* Add deployment pipelines to huggingface integration test
* Remove not required check on service running in tests
* Address PR comments on documentation and suggested renaming in code
* Add partial test for huggingface_deployment
* Fix typo in test function
* Update pyproject.toml (this should allow the dependencies to resolve)
* Update pyproject.toml
* Relax gcfs
* Update model deployers table
* Fix lint issue

---------

Co-authored-by: Andrei Vishniakov <[email protected]>
Co-authored-by: Safoine El Khabich <[email protected]>
Co-authored-by: Alex Strick van Linschoten <[email protected]>

1 parent d19d0fb · commit a237a53

File tree

24 files changed: +1486 -7 lines changed
docs/book/stacks-and-components/component-guide/model-deployers/huggingface.md

Lines changed: 154 additions & 0 deletions

@@ -0,0 +1,154 @@
---
description: Deploying models to Hugging Face Inference Endpoints :hugging_face:.
---

# Hugging Face :hugging_face:

Hugging Face Inference Endpoints provides a secure production solution to easily deploy any `transformers`, `sentence-transformers`, and `diffusers` models on dedicated and autoscaling infrastructure managed by Hugging Face. An Inference Endpoint is built from a model on the [Hub](https://huggingface.co/models).

Because the infrastructure is managed for you, you can deploy models without dealing with containers and GPUs yourself.

## When to use it?

You should use the Hugging Face Model Deployer:

* if you want to deploy [Transformers, Sentence-Transformers, or Diffusion models](https://huggingface.co/docs/inference-endpoints/supported_tasks) on dedicated and secure infrastructure.
* if you prefer a fully-managed production solution for inference without the need to handle containers and GPUs.
* if your goal is to turn your models into production-ready APIs with minimal infrastructure or MLOps involvement.
* if cost-effectiveness is crucial and you want to pay only for the raw compute resources you use.
* if enterprise security is a priority and you need to deploy models into secure offline endpoints accessible only via a direct connection to your Virtual Private Cloud (VPC).

If you are looking for an easier way to deploy your models locally, you can use the [MLflow Model Deployer](mlflow.md) flavor instead.

## How to deploy it?

The Hugging Face Model Deployer flavor is provided by the Hugging Face ZenML integration, so you need to install it on your local machine to be able to deploy your models. You can do this by running the following command:

```bash
zenml integration install huggingface -y
```

To register the Hugging Face model deployer with ZenML, you need to run the following command:

```bash
zenml model-deployer register <MODEL_DEPLOYER_NAME> --flavor=huggingface --token=<YOUR_HF_TOKEN> --namespace=<YOUR_HF_NAMESPACE>
```

Here,

* the `token` parameter is the Hugging Face authentication token. It can be managed through [Hugging Face settings](https://huggingface.co/settings/tokens).
* the `namespace` parameter is used for listing and creating the inference endpoints. It can be set to a username, an organization name, or `*`, depending on where the inference endpoints should be created (see the example after this list).
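
For example, a registration using hypothetical placeholder values (a personal access token and the wildcard namespace) could look like this:

```bash
# `hf_xxx` is a placeholder for your own Hugging Face access token.
zenml model-deployer register hf-endpoints --flavor=huggingface --token=hf_xxx --namespace="*"
```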

We can now use the model deployer in our stack:

```bash
zenml stack update <CUSTOM_STACK_NAME> --model-deployer=<MODEL_DEPLOYER_NAME>
```

See the [huggingface_model_deployer_step](https://sdkdocs.zenml.io/latest/integration_code_docs/integrations-huggingface/#zenml.integrations.huggingface.steps.huggingface_deployer.huggingface_model_deployer_step) for an example of using the Hugging Face Model Deployer to deploy a model inside a ZenML pipeline step.
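
As a rough sketch, a deployment pipeline built around this step might look as follows. This assumes the step is re-exported from `zenml.integrations.huggingface.steps`, accepts a `service_config` argument, and that `HuggingFaceServiceConfig` lives in `zenml.integrations.huggingface.services`; consult the SDK docs linked above for the exact signature.

```python
from zenml import pipeline
from zenml.integrations.huggingface.services import HuggingFaceServiceConfig
from zenml.integrations.huggingface.steps import huggingface_model_deployer_step


@pipeline(enable_cache=False)
def huggingface_deployment_pipeline(model_name: str = "zenml-test"):
    # Minimal service configuration; see the Configuration section below
    # for the full list of attributes. The values here are placeholders.
    service_config = HuggingFaceServiceConfig(model_name=model_name)
    # `service_config` is an assumed parameter name; check the SDK docs.
    huggingface_model_deployer_step(service_config=service_config)
```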

## Configuration

Within the `HuggingFaceServiceConfig` you can configure the following (a configuration sketch follows this list):

* `model_name`: the name of the model in ZenML.
* `endpoint_name`: the name of the inference endpoint. ZenML adds a `zenml-` prefix and the first 8 characters of the service UUID as a suffix to this name.
* `repository`: the repository name in the user's namespace (`{username}/{model_id}`) or in the organization namespace (`{organization}/{model_id}`) from the Hugging Face Hub.
* `framework`: the machine learning framework used for the model (e.g. `"custom"`, `"pytorch"`).
* `accelerator`: the hardware accelerator to be used for inference (e.g. `"cpu"`, `"gpu"`).
* `instance_size`: the size of the instance to be used for hosting the model (e.g. `"large"`, `"xxlarge"`).
* `instance_type`: Inference Endpoints offers a selection of curated CPU and GPU instances (e.g. `"c6i"`, `"g5.12xlarge"`).
* `region`: the cloud region in which the Inference Endpoint will be created (e.g. `"us-east-1"` or `"eu-west-1"` for the `aws` vendor, `"eastus"` for Microsoft Azure).
* `vendor`: the cloud provider or vendor where the Inference Endpoint will be hosted (e.g. `"aws"`).
* `token`: the Hugging Face authentication token. It can be managed through [Hugging Face settings](https://huggingface.co/settings/tokens). The same token used while registering the Hugging Face model deployer can be passed here.
* `account_id`: (Optional) the account ID used to link a VPC to a private Inference Endpoint (if applicable).
* `min_replica`: (Optional) the minimum number of replicas (instances) to keep running for the Inference Endpoint. Defaults to `0`.
* `max_replica`: (Optional) the maximum number of replicas (instances) to scale to for the Inference Endpoint. Defaults to `1`.
* `revision`: (Optional) the specific revision of the Hugging Face repository to deploy on the Inference Endpoint.
* `task`: a supported [Machine Learning Task](https://huggingface.co/docs/inference-endpoints/supported_tasks) (e.g. `"text-classification"`, `"text-generation"`).
* `custom_image`: (Optional) a custom Docker image to use for the Inference Endpoint.
* `namespace`: the namespace where the Inference Endpoint will be created. The same namespace used while registering the Hugging Face model deployer can be passed here.
* `endpoint_type`: (Optional) the type of the Inference Endpoint, which can be `"protected"`, `"public"` (default), or `"private"`.
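
For illustration, a config for a hypothetical text-classification model on AWS might be assembled like this. The repository, instance, and namespace values are placeholders, and the keyword arguments are assumed to mirror the attribute names listed above:

```python
from zenml.integrations.huggingface.services import HuggingFaceServiceConfig

# All values below are hypothetical placeholders.
service_config = HuggingFaceServiceConfig(
    model_name="sentiment-model",          # name of the model in ZenML
    repository="my-org/sentiment-model",   # Hub repository {organization}/{model_id}
    framework="pytorch",
    accelerator="cpu",
    instance_size="large",
    instance_type="c6i",
    region="us-east-1",
    vendor="aws",
    task="text-classification",
    namespace="my-org",
    endpoint_type="public",
)
```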

For more information and a full list of configurable attributes of the Hugging Face Model Deployer, check out the [SDK Docs](https://sdkdocs.zenml.io/latest/integration_code_docs/integrations-huggingface/#zenml.integrations.huggingface.model_deployers) and the Hugging Face endpoint [code](https://github.com/huggingface/huggingface_hub/blob/5e3b603ccc7cd6523d998e75f82848215abf9415/src/huggingface_hub/hf_api.py#L6957).

### Run inference on a provisioned inference endpoint

The following code example shows how to run inference against a provisioned inference endpoint:

```python
from typing import Annotated
from zenml import step, pipeline
from zenml.integrations.huggingface.model_deployers import HuggingFaceModelDeployer
from zenml.integrations.huggingface.services import HuggingFaceDeploymentService


# Load a prediction service deployed in another pipeline
@step(enable_cache=False)
def prediction_service_loader(
    pipeline_name: str,
    pipeline_step_name: str,
    running: bool = True,
    model_name: str = "default",
) -> HuggingFaceDeploymentService:
    """Get the prediction service started by the deployment pipeline.

    Args:
        pipeline_name: name of the pipeline that deployed the Hugging Face
            inference endpoint
        pipeline_step_name: the name of the step that deployed the Hugging
            Face inference endpoint
        running: when this flag is set, the step only returns a running service
        model_name: the name of the model that is deployed
    """
    # get the Hugging Face model deployer stack component
    model_deployer = HuggingFaceModelDeployer.get_active_model_deployer()

    # fetch existing services with same pipeline name, step name and model name
    existing_services = model_deployer.find_model_server(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
        model_name=model_name,
        running=running,
    )

    if not existing_services:
        raise RuntimeError(
            f"No Hugging Face inference endpoint deployed by step "
            f"'{pipeline_step_name}' in pipeline '{pipeline_name}' with name "
            f"'{model_name}' is currently running."
        )

    return existing_services[0]


# Use the service for inference
@step
def predictor(
    service: HuggingFaceDeploymentService,
    data: str,
) -> Annotated[str, "predictions"]:
    """Run an inference request against a prediction service."""
    prediction = service.predict(data)
    return prediction


@pipeline
def huggingface_deployment_inference_pipeline(
    pipeline_name: str,
    pipeline_step_name: str = "huggingface_model_deployer_step",
):
    inference_data = ...
    model_deployment_service = prediction_service_loader(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
    )
    predictions = predictor(model_deployment_service, inference_data)
```
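
You could then run this inference pipeline against an endpoint provisioned by an earlier deployment run, for example (the deployment pipeline name below is hypothetical):

```python
huggingface_deployment_inference_pipeline(
    pipeline_name="huggingface_deployment_pipeline",  # hypothetical name
    pipeline_step_name="huggingface_model_deployer_step",
)
```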

For more information and a full list of configurable attributes of the Hugging Face Model Deployer, check out the [SDK Docs](https://sdkdocs.zenml.io/latest/integration_code_docs/integrations-huggingface/#zenml.integrations.huggingface.model_deployers).

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>

docs/book/stacks-and-components/component-guide/model-deployers/model-deployers.md

Lines changed: 2 additions & 0 deletions

@@ -44,6 +44,7 @@ integrations:
 | [MLflow](mlflow.md) | `mlflow` | `mlflow` | Deploys ML Model locally |
 | [BentoML](bentoml.md) | `bentoml` | `bentoml` | Build and Deploy ML models locally or for production grade (Cloud, K8s) |
 | [Seldon Core](seldon.md) | `seldon` | `seldon Core` | Built on top of Kubernetes to deploy models for production grade environment |
+| [Hugging Face](huggingface.md) | `huggingface` | `huggingface` | Deploys ML model on Hugging Face Inference Endpoints |
 | [Custom Implementation](custom.md) | _custom_ | | Extend the Artifact Store abstraction and provide your own implementation |

 {% hint style="info" %}
@@ -85,6 +86,7 @@ zenml model-deployer register seldon --flavor=seldon \
 ...
 zenml stack register seldon_stack -m default -a aws -o default -d seldon
 ```
+
 2. Implements the continuous deployment logic necessary to deploy models in a way that updates an existing model server
 that is already serving a previous version of the same model instead of creating a new model server for every new
 model version. Every model server that the Model Deployer provisions externally to deploy a model is represented
docs/book/toc.md

Lines changed: 1 addition & 0 deletions

@@ -125,6 +125,7 @@
 * [MLflow](stacks-and-components/component-guide/model-deployers/mlflow.md)
 * [Seldon](stacks-and-components/component-guide/model-deployers/seldon.md)
 * [BentoML](stacks-and-components/component-guide/model-deployers/bentoml.md)
+* [Hugging Face](stacks-and-components/component-guide/model-deployers/huggingface.md)
 * [Develop a Custom Model Deployer](stacks-and-components/component-guide/model-deployers/custom.md)
 * [Step Operators](stacks-and-components/component-guide/step-operators/step-operators.md)
 * [Amazon SageMaker](stacks-and-components/component-guide/step-operators/sagemaker.md)

docs/mocked_libs.json

Lines changed: 2 additions & 0 deletions

@@ -106,6 +106,8 @@
   "great_expectations.types",
   "hvac",
   "hvac.exceptions",
+  "huggingface_hub",
+  "huggingface_hub.utils",
   "kfp",
   "kfp.compiler",
   "kfp.v2",

pyproject.toml

Lines changed: 1 addition & 0 deletions

@@ -448,5 +448,6 @@ module = [
     "mlstacks.*",
     "matplotlib.*",
     "IPython.*",
+    "huggingface_hub.*"
 ]
 ignore_missing_imports = true

src/zenml/integrations/huggingface/__init__.py

Lines changed: 19 additions & 0 deletions

@@ -12,9 +12,14 @@
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
 """Initialization of the Huggingface integration."""
+from typing import List, Type

 from zenml.integrations.constants import HUGGINGFACE
 from zenml.integrations.integration import Integration
+from zenml.stack import Flavor
+
+HUGGINGFACE_MODEL_DEPLOYER_FLAVOR = "huggingface"
+HUGGINGFACE_SERVICE_ARTIFACT = "hf_deployment_service"


 class HuggingfaceIntegration(Integration):
@@ -31,6 +36,20 @@ class HuggingfaceIntegration(Integration):
     def activate(cls) -> None:
         """Activates the integration."""
         from zenml.integrations.huggingface import materializers  # noqa
+        from zenml.integrations.huggingface import services
+
+    @classmethod
+    def flavors(cls) -> List[Type[Flavor]]:
+        """Declare the stack component flavors for the Huggingface integration.
+
+        Returns:
+            List of stack component flavors for this integration.
+        """
+        from zenml.integrations.huggingface.flavors import (
+            HuggingFaceModelDeployerFlavor,
+        )
+
+        return [HuggingFaceModelDeployerFlavor]


 HuggingfaceIntegration.check_installation()
src/zenml/integrations/huggingface/flavors/__init__.py

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""Hugging Face integration flavors."""

from zenml.integrations.huggingface.flavors.huggingface_model_deployer_flavor import (  # noqa
    HuggingFaceModelDeployerConfig,
    HuggingFaceModelDeployerFlavor,
    HuggingFaceBaseConfig,
)

__all__ = [
    "HuggingFaceModelDeployerConfig",
    "HuggingFaceModelDeployerFlavor",
    "HuggingFaceBaseConfig",
]
src/zenml/integrations/huggingface/flavors/huggingface_model_deployer_flavor.py

Lines changed: 131 additions & 0 deletions

@@ -0,0 +1,131 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""Hugging Face model deployer flavor."""

from typing import TYPE_CHECKING, Any, Dict, Optional, Type

from pydantic import BaseModel

from zenml.integrations.huggingface import HUGGINGFACE_MODEL_DEPLOYER_FLAVOR
from zenml.model_deployers.base_model_deployer import (
    BaseModelDeployerConfig,
    BaseModelDeployerFlavor,
)
from zenml.utils.secret_utils import SecretField

if TYPE_CHECKING:
    from zenml.integrations.huggingface.model_deployers.huggingface_model_deployer import (
        HuggingFaceModelDeployer,
    )


class HuggingFaceBaseConfig(BaseModel):
    """Hugging Face Inference Endpoint configuration."""

    endpoint_name: str = "zenml-"
    repository: Optional[str] = None
    framework: Optional[str] = None
    accelerator: Optional[str] = None
    instance_size: Optional[str] = None
    instance_type: Optional[str] = None
    region: Optional[str] = None
    vendor: Optional[str] = None
    token: Optional[str] = None
    account_id: Optional[str] = None
    min_replica: int = 0
    max_replica: int = 1
    revision: Optional[str] = None
    task: Optional[str] = None
    custom_image: Optional[Dict[str, Any]] = None
    namespace: Optional[str] = None
    endpoint_type: str = "public"


class HuggingFaceModelDeployerConfig(
    BaseModelDeployerConfig, HuggingFaceBaseConfig
):
    """Configuration for the Hugging Face model deployer.

    Attributes:
        token: Hugging Face token used for authentication
        namespace: Hugging Face namespace used to list endpoints
    """

    token: str = SecretField()

    # The namespace to list endpoints for. Set to `"*"` to list all endpoints
    # from all namespaces (i.e. personal namespace and all orgs the user
    # belongs to).
    namespace: str


class HuggingFaceModelDeployerFlavor(BaseModelDeployerFlavor):
    """Hugging Face Endpoint model deployer flavor."""

    @property
    def name(self) -> str:
        """Name of the flavor.

        Returns:
            The name of the flavor.
        """
        return HUGGINGFACE_MODEL_DEPLOYER_FLAVOR

    @property
    def docs_url(self) -> Optional[str]:
        """A url to point at docs explaining this flavor.

        Returns:
            A flavor docs url.
        """
        return self.generate_default_docs_url()

    @property
    def sdk_docs_url(self) -> Optional[str]:
        """A url to point at SDK docs explaining this flavor.

        Returns:
            A flavor SDK docs url.
        """
        return self.generate_default_sdk_docs_url()

    @property
    def logo_url(self) -> str:
        """A url to represent the flavor in the dashboard.

        Returns:
            The flavor logo.
        """
        return "https://public-flavor-logos.s3.eu-central-1.amazonaws.com/model_registry/huggingface.png"

    @property
    def config_class(self) -> Type[HuggingFaceModelDeployerConfig]:
        """Returns `HuggingFaceModelDeployerConfig` config class.

        Returns:
            The config class.
        """
        return HuggingFaceModelDeployerConfig

    @property
    def implementation_class(self) -> Type["HuggingFaceModelDeployer"]:
        """Implementation class for this flavor.

        Returns:
            The implementation class.
        """
        from zenml.integrations.huggingface.model_deployers.huggingface_model_deployer import (
            HuggingFaceModelDeployer,
        )

        return HuggingFaceModelDeployer
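
As a quick illustration of the base config defined in this file: every endpoint field is either optional or defaulted, so a sketch like the following (with placeholder values) should validate against the pydantic model above.

```python
from zenml.integrations.huggingface.flavors import HuggingFaceBaseConfig

# Placeholder values; all fields shown here are Optional in the model.
config = HuggingFaceBaseConfig(
    repository="my-org/my-model",  # hypothetical Hub repository
    framework="pytorch",
    accelerator="cpu",
    region="us-east-1",
    vendor="aws",
    task="text-classification",
)
print(config.endpoint_name)  # "zenml-" (default prefix)
print(config.endpoint_type)  # "public" (default)
```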
