Cannot load model #1813
Comments
I tried version 1.11, but I got more errors ;(

Posting inference.py here again just in case (abridged):

```python
class InvertedResBlock(nn.Module):
    ...

class Generator(nn.Module):
    ...

def model_fn(model_dir):
    ...

def transform_fn(model, request_body, request_content_type='image/', response_content_type='image/'):
    ...
```
@LiJell For 1.9 it is not clear from the logs why it's failing to load the model; I wonder if there is any further pointer in the log traces showing the exact point where it fails. I am guessing it could be a path issue: are you following this doc, and does your model artifact live in an S3 bucket? With 1.11, it seems to be failing on importing nvgpu: `packages/ts/metrics/system_metrics.py", line 61, in gpu_utilization import nvgpu`.
Thank you for your reply! I have already tried installing nvgpu and importing it, but I will try again since I might have made a mistake. Actually, I uploaded the model.pt file to a Jupyter notebook and converted it into tar.gz format in the notebook, which belongs to an S3 bucket, but I will follow the doc and upload it the way it should be done. Thank you again!!
Hi @HamidShojanazeri, there was an improvement after restructuring the model.tar.gz file, so maybe there was a mistake there. By the way, the nvgpu error still comes up depending on the framework version. Thank you!!
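For anyone hitting the same packaging problem: the restructuring above follows the archive layout described in the SageMaker PyTorch inference docs. Below is a minimal sketch of building such an archive; the model.pt filename and the code/ prefix follow the documented convention, so adjust them to your own files:

```python
import tarfile

# Expected layout:
#   model.tar.gz
#   |-- model.pt                 <- serialized model weights
#   `-- code/
#       |-- inference.py         <- model_fn / transform_fn handlers
#       `-- requirements.txt     <- optional extra pip dependencies
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model.pt", arcname="model.pt")
    tar.add("inference.py", arcname="code/inference.py")
    tar.add("requirements.txt", arcname="code/requirements.txt")
```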
@HamidShojanazeri Hi, the model loads well now!!! But I still have a problem with inference.py, so I think it will be fine once I fix that properly. Thank you for your help :) @HamidShojanazeri @agunapal
I'm seeing this same error:
Building with:
I tried adding nvgpu to my requirements.txt file, but that doesn't seem to make a difference. I'm not sure what metrics the error is referring to.
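For reference, the requirements file for that workaround would just name the module; per the inference toolkit's convention it needs to sit at code/requirements.txt inside model.tar.gz so the container installs it at model load time (placement is an assumption based on the documented layout):

```text
# code/requirements.txt
nvgpu
```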
I've also tried with framework version 1.10 and had the same error.
@lxning I also still have this error with framework version 1.9. Any advice on a workaround, or on what might be missing?
I am a beginner, but I would like to share my experience. I still do not understand why the nvgpu error occurred, but @HamidShojanazeri said it is a Docker container issue. Even though I installed nvgpu and imported it just like you did, CloudWatch said "there is no module named nvgpu" until the framework version matched mine. Cheers!
@LiJell TorchServe doesn't own the SageMaker Docker container. Please file a ticket with AWS SageMaker if you are using TorchServe via SageMaker.
Okay!! Thank you for your help!! I will ask this question in the right place.
🐛 Describe the bug
I am trying to deploy a locally pretrained model via SageMaker to create an endpoint and use it.
I deployed the model:
```python
from sagemaker.pytorch import PyTorchModel

pytorch_model = PyTorchModel(model_data='model.tar.gz',
                             role=role,
                             entry_point='inference.py',
                             framework_version="1.9.0",
                             py_version="py38")

predictor = pytorch_model.deploy(instance_type='ml.g4dn.xlarge', initial_instance_count=1)
```
and predicted on some data:
```python
from PIL import Image

data = Image.open('./samples/inputs/1.jpg')
result = predictor.predict(data)

img = Image.open(result)
img.show()
```
As a result I got an error:
```
ModelError                                Traceback (most recent call last)
/tmp/ipykernel_4268/3704626012.py in <cell line: 4>()
      2
      3 data = Image.open('./samples/inputs/1.jpg')
----> 4 result = predictor.predict(data)
      5
      6 img = Image.open(result)

~/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant, inference_id)
    159             data, initial_args, target_model, target_variant, inference_id
    160         )
--> 161         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    162         return self._handle_response(response)
    163

~/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    506             )
    507             # The "self" in this scope is referring to the BaseClient.
--> 508             return self._make_api_call(operation_name, kwargs)
    509
    510         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    913             error_code = parsed_response.get("Error", {}).get("Code")
    914             error_class = self.exceptions.from_code(error_code)
--> 915             raise error_class(parsed_response, operation_name)
    916         else:
    917             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.".
```
I skimmed through the logs via CloudWatch and am still struggling with this. I need help.
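One thing that stands out in the repro is that predictor.predict() is handed a PIL Image object rather than encoded bytes. Below is a minimal sketch of invoking the endpoint with raw bytes and explicit content types, using the SageMaker Python SDK v2 serializer classes; the content types here are assumptions chosen to match the transform_fn signature, not a confirmed fix:

```python
import io
from PIL import Image
from sagemaker.serializers import IdentitySerializer
from sagemaker.deserializers import BytesDeserializer

# Send raw JPEG bytes so that request_body inside transform_fn is
# well-defined, and read the response back as raw bytes.
predictor.serializer = IdentitySerializer(content_type="image/jpeg")
predictor.deserializer = BytesDeserializer()

with open('./samples/inputs/1.jpg', 'rb') as f:
    payload = f.read()

result_bytes = predictor.predict(payload)
img = Image.open(io.BytesIO(result_bytes))
img.show()
```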
Error logs
Installation instructions
I am using SageMaker.
Model Packaging
```python
from sagemaker.pytorch import PyTorchModel

pytorch_model = PyTorchModel(model_data='model.tar.gz',
                             role=role,
                             entry_point='inference.py',
                             framework_version="1.9.0",
                             py_version="py38")
```
config.properties
No response
Versions
framework_version="1.9.0",
py_version="py38"
TorchServe version: 0.4.2
Working on a conda_pytorch_p38 SageMaker notebook instance
Repro instructions
The inference file that I wrote (abridged):
```python
class ConvNormLReLU(nn.Sequential):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, pad_mode="reflect", groups=1, bias=False):
        ...

class InvertedResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, expansion_ratio=2):
        super(InvertedResBlock, self).__init__()
        ...

class Generator(nn.Module):
    def __init__(self, ):
        super().__init__()
        ...

def model_fn(model_dir):
    """Load the model and return it.

    Providing this function is optional.
    There is a default_model_fn available, which will load the model
    compiled using SageMaker Neo. You can override the default here.
    The model_fn only needs to be defined if your model needs extra
    steps to load, and can otherwise be left undefined.
    """
    ...

def transform_fn(model, request_body, request_content_type='image/', response_content_type='image/'):
    """Run prediction and return the output.

    The function
    1. Pre-processes the input request
    2. Runs prediction
    3. Post-processes the prediction output.
    """
    image_format = "png"  # @param ["jpeg", "png"]
    # preprocess
    img_in = Image.open(io.BytesIO(request_body)).convert("RGB")
    ...
```
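Since the skeleton above elides the handler bodies, here is a minimal sketch of how the two SageMaker handlers typically fit together for an image-in/image-out model. The model.pt filename, the [-1, 1] normalization, and the call to the Generator class defined above are assumptions for illustration, not the author's actual code:

```python
import io
import os

import torch
from PIL import Image
from torchvision import transforms

def model_fn(model_dir):
    """Load the serialized generator from model_dir and return it."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = Generator()  # defined above; assumed to match the saved weights
    state = torch.load(os.path.join(model_dir, "model.pt"), map_location=device)
    model.load_state_dict(state)
    model.to(device).eval()
    return model

def transform_fn(model, request_body, request_content_type='image/jpeg',
                 response_content_type='image/png'):
    """Decode image bytes, run the generator, re-encode the output image."""
    device = next(model.parameters()).device

    # Pre-process: bytes -> PIL -> normalized batch tensor (range assumed).
    img_in = Image.open(io.BytesIO(request_body)).convert("RGB")
    x = transforms.ToTensor()(img_in).unsqueeze(0).to(device) * 2 - 1

    # Predict.
    with torch.no_grad():
        y = model(x)

    # Post-process: tensor back to [0, 1], then encode as PNG bytes.
    y = (y.squeeze(0).clamp(-1, 1) + 1) / 2
    img_out = transforms.ToPILImage()(y.cpu())
    buf = io.BytesIO()
    img_out.save(buf, format="PNG")
    return buf.getvalue()
```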
Possible Solution
No response