
Serve TensorRT or torch2trt model #1243

@pallashadow

Description


TensorRT can decrease latency dramatically on some models, especially when batch size = 1.

torch2trt is a PyTorch-to-TensorRT converter that uses the TensorRT Python API. It can convert a model to TensorRT in a single line of code and run it with PyTorch inputs/outputs; see https://github.com/NVIDIA-AI-IOT/torch2trt.
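For context, the conversion itself looks roughly like this (a minimal sketch following the torch2trt README; the alexnet model, input shape, and the alexnet.torch2trt file name are just placeholders):

import torch
from torch2trt import torch2trt
from torchvision.models.alexnet import alexnet

# Build an eager PyTorch model and an example input on the GPU
model = alexnet(pretrained=True).eval().cuda()
x = torch.ones((1, 3, 224, 224)).cuda()

# One-line conversion: traces the model and builds a TensorRT engine
model_trt = torch2trt(model, [x])

# Save the converted weights; the .torch2trt extension is what the handler below keys on
torch.save(model_trt.state_dict(), "alexnet.torch2trt")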

I am wondering:

  1. Is there any risk in serving a TensorRT or torch2trt model with TorchServe?
  2. Will there be official support for serving TensorRT models?

Describe the solution

It seems that TorchServe can serve a torch2trt model pretty well, simply by rewriting the handler like this:

import logging

import torch
from torch2trt import TRTModule

from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class Yolov5FaceHandler(BaseHandler):
    def initialize(self, context):
        serialized_file = context.manifest["model"]["serializedFile"]
        if serialized_file.split(".")[-1] == "torch2trt":  # serializedFile ends with .torch2trt instead of .pt
            # Swap in the torch2trt loader so BaseHandler.initialize() loads the TRT model
            self._load_torchscript_model = self._load_torch2trt_model
        super().initialize(context)

    def _load_torch2trt_model(self, torch2trt_path):
        logger.info("Loading torch2trt model")
        model_trt = TRTModule()
        model_trt.load_state_dict(torch.load(torch2trt_path))
        return model_trt
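Packaging should work the same as for an eager or TorchScript model: assuming the usual workflow, the saved .torch2trt file is passed to torch-model-archiver via --serialized-file (with this handler passed via --handler), so that context.manifest["model"]["serializedFile"] points at it and no other TorchServe changes are needed.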

Describe alternative solutions

Maybe this feature could be added to ts/torch_handler/base_handler.py?
Or there could be a new example handler for it.

Labels: enhancement (New feature or request)