
Serve TensorRT or torch2trt model #1243

@pallashadow

Description


TensorRT can decrease latency dramatically on some models, especially when batch size = 1.

torch2trt is a PyTorch-to-TensorRT converter that uses the TensorRT Python API. It can convert a model to TensorRT in a single line of code and run it with PyTorch inputs/outputs; see https://github.com/NVIDIA-AI-IOT/torch2trt.
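For context, the conversion itself looks roughly like this (a minimal sketch following the torch2trt README; the alexnet model, input shape, and the alexnet.torch2trt file name are just placeholders):

import torch
from torch2trt import torch2trt
from torchvision.models.alexnet import alexnet

# Build an eager PyTorch model and an example input on the GPU
model = alexnet(pretrained=True).eval().cuda()
x = torch.ones((1, 3, 224, 224)).cuda()

# One-line conversion: traces the model and builds a TensorRT engine
model_trt = torch2trt(model, [x])

# Save the converted weights; the .torch2trt extension is what the handler below keys on
torch.save(model_trt.state_dict(), "alexnet.torch2trt")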

I am wondering:

  1. Is there any risk in serving a TensorRT or torch2trt model with TorchServe?
  2. Will there be official support for serving TensorRT models?

Describe the solution

It seems that TorchServe can serve a torch2trt model pretty well, simply by rewriting the handler like this:

import logging

import torch
from torch2trt import TRTModule

from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class Yolov5FaceHandler(BaseHandler):
    def initialize(self, context):
        serialized_file = context.manifest["model"]["serializedFile"]
        if serialized_file.split(".")[-1] == "torch2trt":  # serializedFile ends with .torch2trt instead of .pt
            # Swap in the torch2trt loader so BaseHandler.initialize() loads the TRT model
            self._load_torchscript_model = self._load_torch2trt_model
        super().initialize(context)

    def _load_torch2trt_model(self, torch2trt_path):
        logger.info("Loading torch2trt model")
        model_trt = TRTModule()
        model_trt.load_state_dict(torch.load(torch2trt_path))
        return model_trt
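Packaging should work the same as for an eager or TorchScript model: assuming the usual workflow, the saved .torch2trt file is passed to torch-model-archiver via --serialized-file (with this handler passed via --handler), so that context.manifest["model"]["serializedFile"] points at it and no other TorchServe changes are needed.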

Describe alternative solutions

Maybe this feature could be added to ts/torch_handler/base_handler.py?
Or there could be a new example handler for it.

Labels: enhancement (New feature or request)