Description
I am following this tutorial: https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/examples/getting-started-session-based/03-serving-session-based-model-torch-backend.ipynb
After creating the "executor_model" model, I tried to run the Triton Inference Server with:
docker run --gpus=1 --rm --net=host -v /home/***/workspace/data/models:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 24.03 (build 86102629)
Triton Server Version 2.44.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
I0425 00:38:39.764967 1 pinned_memory_manager.cc:275] Pinned memory pool is created at '0x205000000' with size 268435456
I0425 00:38:39.765025 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0425 00:38:39.770824 1 model_lifecycle.cc:469] loading: 1_predictpytorchtriton:1
I0425 00:38:39.770870 1 model_lifecycle.cc:469] loading: executor_model:1
I0425 00:38:39.770891 1 model_lifecycle.cc:469] loading: 0_transformworkflowtriton:1
I0425 00:38:40.340567 1 libtorch.cc:2467] TRITONBACKEND_Initialize: pytorch
I0425 00:38:40.340593 1 libtorch.cc:2477] Triton TRITONBACKEND API version: 1.19
I0425 00:38:40.340603 1 libtorch.cc:2483] 'pytorch' TRITONBACKEND API version: 1.19
I0425 00:38:40.340635 1 libtorch.cc:2516] TRITONBACKEND_ModelInitialize: 1_predictpytorchtriton (version 1)
W0425 00:38:40.342571 1 libtorch.cc:318] skipping model configuration auto-complete for '1_predictpytorchtriton': not supported for pytorch backend
I0425 00:38:40.343279 1 libtorch.cc:347] Optimized execution is enabled for model instance '1_predictpytorchtriton'
I0425 00:38:40.343301 1 libtorch.cc:366] Cache Cleaning is disabled for model instance '1_predictpytorchtriton'
I0425 00:38:40.343304 1 libtorch.cc:383] Inference Mode is enabled for model instance '1_predictpytorchtriton'
I0425 00:38:40.343350 1 libtorch.cc:2560] TRITONBACKEND_ModelInstanceInitialize: 1_predictpytorchtriton_0_0 (GPU device 0)
I0425 00:38:40.431502 1 model_lifecycle.cc:835] successfully loaded '1_predictpytorchtriton'
I0425 00:38:40.653469 157 pb_stub.cc:290] I0425 00:38:40.653469 156 pb_stub.cc:290] Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'nvtabular'
At:
/models/0_transformworkflowtriton/1/model.py(32): <module>
<frozen importlib._bootstrap>(241): _call_with_frames_removed
<frozen importlib._bootstrap_external>(883): exec_module
<frozen importlib._bootstrap>(703): _load_unlocked
<frozen importlib._bootstrap>(1006): _find_and_load_unlocked
<frozen importlib._bootstrap>(1027): _find_and_load
Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'merlin'
At:
/models/executor_model/1/model.py(31): <module>
<frozen importlib._bootstrap>(241): _call_with_frames_removed
<frozen importlib._bootstrap_external>(883): exec_module
<frozen importlib._bootstrap>(703): _load_unlocked
<frozen importlib._bootstrap>(1006): _find_and_load_unlocked
<frozen importlib._bootstrap>(1027): _find_and_load
E0425 00:38:40.655660 1 model_lifecycle.cc:638] failed to load 'executor_model' version 1: Internal: ModuleNotFoundError: No module named 'merlin'
At:
/models/executor_model/1/model.py(31): <module>
<frozen importlib._bootstrap>(241): _call_with_frames_removed
<frozen importlib._bootstrap_external>(883): exec_module
<frozen importlib._bootstrap>(703): _load_unlocked
<frozen importlib._bootstrap>(1006): _find_and_load_unlocked
<frozen importlib._bootstrap>(1027): _find_and_load
E0425 00:38:40.655687 1 model_lifecycle.cc:638] failed to load '0_transformworkflowtriton' version 1: Internal: ModuleNotFoundError: No module named 'nvtabular'
At:
/models/0_transformworkflowtriton/1/model.py(32): <module>
<frozen importlib._bootstrap>(241): _call_with_frames_removed
<frozen importlib._bootstrap_external>(883): exec_module
<frozen importlib._bootstrap>(703): _load_unlocked
<frozen importlib._bootstrap>(1006): _find_and_load_unlocked
<frozen importlib._bootstrap>(1027): _find_and_load
I0425 00:38:40.655692 1 model_lifecycle.cc:773] failed to load 'executor_model'
I0425 00:38:40.655723 1 model_lifecycle.cc:773] failed to load '0_transformworkflowtriton'
I0425 00:38:40.655779 1 server.cc:607]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0425 00:38:40.655820 1 server.cc:634]
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config
|
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0425 00:38:40.655868 1 server.cc:677]
+---------------------------+---------+-------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------------+---------+-------------------------------------------------------------------------+
| 0_transformworkflowtriton | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'nvtabular' |
| | | |
| | | At: |
| | | /models/0_transformworkflowtriton/1/model.py(32): <module> |
| | | <frozen importlib._bootstrap>(241): _call_with_frames_removed |
| | | <frozen importlib._bootstrap_external>(883): exec_module |
| | | <frozen importlib._bootstrap>(703): _load_unlocked |
| | | <frozen importlib._bootstrap>(1006): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(1027): _find_and_load |
| 1_predictpytorchtriton | 1 | READY |
| executor_model | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'merlin' |
| | | |
| | | At: |
| | | /models/executor_model/1/model.py(31): <module> |
| | | <frozen importlib._bootstrap>(241): _call_with_frames_removed |
| | | <frozen importlib._bootstrap_external>(883): exec_module |
| | | <frozen importlib._bootstrap>(703): _load_unlocked |
| | | <frozen importlib._bootstrap>(1006): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(1027): _find_and_load |
+---------------------------+---------+-------------------------------------------------------------------------+
I0425 00:38:40.679675 1 metrics.cc:877] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3070 Ti Laptop GPU
I0425 00:38:40.689920 1 metrics.cc:770] Collecting CPU metrics
I0425 00:38:40.690196 1 tritonserver.cc:2538]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value
|
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton
|
| server_version | 2.44.0
|
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models
|
| model_control_mode | MODE_NONE
|
| strict_model_config | 0
|
| rate_limit | OFF
|
| pinned_memory_pool_byte_size | 268435456
|
| cuda_memory_pool_byte_size{0} | 67108864
|
| min_supported_compute_capability | 6.0
|
| strict_readiness | 1
|
| exit_timeout | 30
|
| cache_enabled | 0
|
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0425 00:38:40.690214 1 server.cc:307] Waiting for in-flight requests to complete.
I0425 00:38:40.690219 1 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0425 00:38:40.690350 1 server.cc:338] All models are stopped, unloading models
I0425 00:38:40.690365 1 server.cc:347] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0425 00:38:40.690430 1 libtorch.cc:2594] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0425 00:38:40.691390 1 libtorch.cc:2539] TRITONBACKEND_ModelFinalize: delete model state
I0425 00:38:40.691732 1 model_lifecycle.cc:620] successfully unloaded '1_predictpytorchtriton' version 1
I0425 00:38:41.690694 1 server.cc:347] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
W0425 00:38:41.693252 1 metrics.cc:631] Unable to get power limit for GPU 0. Status:Success, value:0.000000
error: creating server: Internal - failed to load all models
W0425 00:38:42.701892 1 metrics.cc:631] Unable to get power limit for GPU 0. Status:Success, value:0.000000
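The two load failures indicate that the stock Triton image does not ship the Merlin Python packages (`merlin`, `nvtabular`) that the Python-backend models in the ensemble import at startup. A rough sketch of how this could be diagnosed and worked around, assuming Docker is available and that a Merlin serving container matching your training environment exists on NGC (the `merlin-pytorch` image name is what the Merlin docs reference; the exact tag below is an assumption):

```shell
# Check whether the stock Triton image can import the required packages;
# this is expected to fail with ModuleNotFoundError, matching the server log.
docker run --rm nvcr.io/nvidia/tritonserver:24.03-py3 \
    python3 -c "import nvtabular, merlin"

# One likely fix: serve from a Merlin container instead, which bundles
# Triton together with the merlin/nvtabular Python packages.
# (Tag is an assumption - pick one matching your training environment.)
docker run --gpus=1 --rm --net=host \
    -v /home/***/workspace/data/models:/models \
    nvcr.io/nvidia/merlin/merlin-pytorch:24.03 \
    tritonserver --model-repository=/models
```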
Triton Information
What version of Triton are you using?
tritonserver:24.03-py3
Are you using the Triton container or did you build it yourself?
I am using the official container: nvcr.io/nvidia/tritonserver:24.03-py3
To Reproduce
Steps to reproduce the behavior: follow the tutorial at https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/examples/getting-started-session-based/03-serving-session-based-model-torch-backend.ipynb and launch the server with the `docker run` command shown above.
Expected behavior
The server should load all three models and reply to the client with:
<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '188'}>
bytearray(b'[{"name":"0_transformworkflowtriton","version":"1","state":"READY"},{"name":"1_predictpytorchtriton","version":"1","state":"READY"},{"name":"executor_model","version":"1","state":"READY"}]')
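For reference, the expected response body is the JSON repository index (what Triton's `POST v2/repository/index` endpoint returns). A minimal standard-library sketch that parses the payload shown above and confirms every model in the ensemble is READY:

```python
import json

# Repository-index payload, copied from the expected response above.
payload = bytearray(
    b'[{"name":"0_transformworkflowtriton","version":"1","state":"READY"},'
    b'{"name":"1_predictpytorchtriton","version":"1","state":"READY"},'
    b'{"name":"executor_model","version":"1","state":"READY"}]'
)

models = json.loads(payload.decode())
assert len(models) == 3, "all three ensemble models should be listed"
assert all(m["state"] == "READY" for m in models), "every model should be READY"
print(sorted(m["name"] for m in models))
# -> ['0_transformworkflowtriton', '1_predictpytorchtriton', 'executor_model']
```

In the failing run above, `0_transformworkflowtriton` and `executor_model` would instead report an UNAVAILABLE state.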