TensorrtExecutionProvider documentation #1395

@IlyasMoutawwakil

System Info

main, docs

Who can help?

@fxmarty

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

The method described in the docs for building the TRT engine is outdated (first mentioned here). I tested the dynamic shapes method in optimum-benchmark here.

Expected behavior

We can update the docs with this snippet:

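import torch
from optimum.onnxruntime import ORTModelForCausalLM

# Cache the built engine on disk and declare the dynamic shape range
# (batch size x sequence length) for each model input.
provider_options = {
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "tmp/trt_cache_gpt2_example",
    "trt_profile_min_shapes": "input_ids:1x16,attention_mask:1x16",
    "trt_profile_max_shapes": "input_ids:1x64,attention_mask:1x64",
    "trt_profile_opt_shapes": "input_ids:1x32,attention_mask:1x32",
}

ort_model = ORTModelForCausalLM.from_pretrained(
    "gpt2",
    export=True,
    use_cache=False,
    provider="TensorrtExecutionProvider",
    provider_options=provider_options,
)

# Start from 16 tokens and generate exactly 48 more, so every sequence
# length stays within the 16 to 64 range declared in the profile.
ort_model.generate(
    input_ids=torch.tensor([[1] * 16]).to("cuda"),
    max_new_tokens=64 - 16,
    min_new_tokens=64 - 16,
    pad_token_id=0,
    eos_token_id=0,
)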

Though it's still not clear to me what the effect of trt_profile_opt_shapes is.
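
For what it's worth, my understanding of TensorRT optimization profiles is that the min and max shapes bound the inputs the engine will accept, while the opt shape is the one the builder tunes its kernels for, so it should probably match the most common input size. I have not verified how ONNX Runtime forwards this. Below is a minimal sketch of a helper that builds the three profile strings from plain dicts and checks that min <= opt <= max holds for every dimension. The names `format_profile` and `build_trt_profile` are hypothetical and not part of optimum or ONNX Runtime; only the three `trt_profile_*` keys come from the snippet above.

```python
def format_profile(shapes):
    """Serialize {"input_ids": (1, 16)} into "input_ids:1x16"."""
    return ",".join(
        f"{name}:{'x'.join(str(dim) for dim in dims)}"
        for name, dims in shapes.items()
    )


def build_trt_profile(min_shapes, opt_shapes, max_shapes):
    """Hypothetical helper: validate the shapes and return the profile options."""
    if not (min_shapes.keys() == opt_shapes.keys() == max_shapes.keys()):
        raise ValueError("min, opt and max must describe the same inputs")
    for name in min_shapes:
        lo, opt, hi = min_shapes[name], opt_shapes[name], max_shapes[name]
        if not (len(lo) == len(opt) == len(hi)):
            raise ValueError(f"{name}: all three shapes must have the same rank")
        # Each dimension has to satisfy min <= opt <= max.
        for a, b, c in zip(lo, opt, hi):
            if not (a <= b <= c):
                raise ValueError(f"{name}: expected min <= opt <= max, got {lo}, {opt}, {hi}")
    return {
        "trt_profile_min_shapes": format_profile(min_shapes),
        "trt_profile_opt_shapes": format_profile(opt_shapes),
        "trt_profile_max_shapes": format_profile(max_shapes),
    }


inputs = ("input_ids", "attention_mask")
profile = build_trt_profile(
    min_shapes={name: (1, 16) for name in inputs},
    opt_shapes={name: (1, 32) for name in inputs},
    max_shapes={name: (1, 64) for name in inputs},
)
```

The resulting dict can be merged into `provider_options` next to the engine cache settings, e.g. `provider_options = {"trt_engine_cache_enable": True, **profile}`.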


Labels: documentation, onnxruntime
