Commit b8db91e

TensorRT 10.13 OSS Release (#4531)

Signed-off-by: Kevin Chen <[email protected]>
1 parent 9161ba7

File tree

188 files changed: +2743, -1047 lines


.github/workflows/label_issue.yml

Lines changed: 10 additions & 10 deletions (whitespace-only; removed and added lines carry identical text)

```diff
@@ -2,11 +2,11 @@ name: Label New Issues
 
 on:
   issues:
-    types: [opened]
+    types: [opened]
 
 permissions:
-  issues: write
-  contents: read
+  issues: write
+  contents: read
 
 jobs:
   label-issue:
@@ -21,7 +21,7 @@ jobs:
           ref: v1.2.1
 
       - name: AI Label Issue
-        uses: ./.github/actions/goggles_action/actions/llm_label
+        uses: ./.github/actions/goggles_action/actions/llm_label
        with:
           ACTION_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           LLM_MODEL_NAME: ${{ secrets.LLM_MODEL_NAME }}
@@ -39,9 +39,9 @@ jobs:
           ACTIONS_STEP_VERBOSE: false
           EXCLUDED_LABELS: "Investigating,internal-bug-tracked,stale,triaged,wontfix"
           LLM_SYSTEM_PROMPT: |
-            You are an expert GitHub issue labeler. Your task is to analyze the provided issue title, issue body, and a list of available labels with their descriptions.
-            Based on this information, select the single most appropriate label from the list that best captures the primary issue or request.
-            Prefer selecting only one label that represents the main topic or problem. Only suggest multiple labels if the issue genuinely spans multiple distinct areas that are equally important.
-            Respond with ONLY the chosen label name (e.g., 'bug', 'feature-request') or comma-separated names if multiple are truly needed.
-            If no labels seem appropriate, respond with 'NONE'.
-            Do not add any other text, explanation, or markdown formatting.
+            You are an expert GitHub issue labeler. Your task is to analyze the provided issue title, issue body, and a list of available labels with their descriptions.
+            Based on this information, select the single most appropriate label from the list that best captures the primary issue or request.
+            Prefer selecting only one label that represents the main topic or problem. Only suggest multiple labels if the issue genuinely spans multiple distinct areas that are equally important.
+            Respond with ONLY the chosen label name (e.g., 'bug', 'feature-request') or comma-separated names if multiple are truly needed.
+            If no labels seem appropriate, respond with 'NONE'.
+            Do not add any other text, explanation, or markdown formatting.
```

CHANGELOG.md

Lines changed: 15 additions & 1 deletion

```diff
@@ -1,5 +1,19 @@
 # TensorRT OSS Release Changelog
 
+## 10.13.0 GA - 2025-7-24
+- Plugin changes
+  - Fixed a division-by-zero error in geluPlugin that occurred when the bias is omitted.
+  - Completed the transition away from static plugin field/attribute member variables in standard plugins. They are no longer needed: TRT does not access field information after plugin creators are destructed (deregistered from the plugin registry), nor does it access such information without a creator instance.
+- Sample changes
+  - Deprecated the `yolov3_onnx` sample due to the unstable URL of the YOLO weights.
+  - Updated the `1_run_onnx_with_tensorrt` and `2_construct_network_with_layer_apis` samples to use `cuda-python` instead of `PyCUDA` for latest GPU/CUDA support.
+- Parser changes
+  - Decreased memory usage when importing models with external weights.
+  - Added `loadModelProto`, `loadInitializer` and `parseModelProto` APIs to `IParser`. These APIs are meant to be used to load user initializers when parsing ONNX models.
+  - Added `loadModelProto`, `loadInitializer` and `refitModelProto` APIs to `IParserRefitter`. These APIs are meant to be used to load user initializers when refitting ONNX models.
+  - Deprecated `IParser::parseWithWeightDescriptors`.
+
 ## 10.12.0 GA - 2025-6-10
 - Plugin changes
   - Migrated `IPluginV2`-descendent version 1 of `cropAndResizeDynamic`, to version 2, which implements `IPluginV3`.
@@ -30,7 +44,7 @@
   - Added [Image-to-Image](demo/Diffusion#generate-an-image-with-stable-diffusion-v35-large-with-controlnet-guided-by-an-image-and-a-text-prompt) support for Stable Diffusion v3.5-large ControlNet models.
   - Enabled download of [pre-exported ONNX models](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-tensorrt) for the Stable Diffusion v3.5-large pipeline.
 - Sample changes
-  - Added two refactored python samples [1_run_onnx_with_tensorrt](samples/python/refactored/1_run_onnx_with_tensorrt) and [2_construct_network_with_layer_apis](samples/python/refactored/2_construct_network_with_layer_apis)
+  - Added two refactored python samples [1_run_onnx_with_tensorrt](samples/python/refactored/1_run_onnx_with_tensorrt) and [2_construct_network_with_layer_apis](samples/python/refactored/2_construct_network_with_layer_apis)
 - Parser changes
   - Added support for integer-typed base tensors for `Pow` operations
   - Added support for custom `MXFP8` quantization operations
```
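The parser items in the 10.13.0 entry above describe a deferred-load workflow: load the model proto, attach user-owned initializers, then parse. Below is a minimal sketch of how that flow might look through the TensorRT Python bindings. The snake_case method names (`load_model_proto`, `load_initializer`, `parse_model_proto`), their signatures, and the file/tensor names are assumptions inferred from the C++ API names in the changelog, not confirmed API; consult the 10.13 API reference before relying on them.

```python
# Hypothetical sketch of the 10.13 deferred-initializer parsing flow.
# Assumed: snake_case bindings for loadModelProto/loadInitializer/parseModelProto.
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # hypothetical model path
    onnx_bytes = f.read()

# Load the serialized proto without parsing it yet (assumed binding).
parser.load_model_proto(onnx_bytes)

# Hand the parser a user-owned initializer by name (assumed binding); the
# caller keeps ownership rather than the parser copying the weight out of
# the ONNX file, which is what reduces import-time memory usage.
weight = np.load("conv1_weight.npy")  # hypothetical external weight
parser.load_initializer("conv1.weight", weight)

# Parse the previously loaded proto (assumed binding).
parser.parse_model_proto()
```

The sample item above also swaps `PyCUDA` for `cuda-python`. For reference, this is a minimal sketch of the device-buffer handling the refactored samples would need under `cuda-python`'s runtime bindings; `cudart.cudaMalloc`, `cudaMemcpy`, and `cudaFree` are real cuda-python calls, while the array shape and variable names are illustrative only.

```python
# Allocate, fill, and free a device buffer with cuda-python instead of PyCUDA.
import numpy as np
from cuda import cudart

def check(err):
    # Every cuda-python runtime call returns its error code first.
    if err != cudart.cudaError_t.cudaSuccess:
        raise RuntimeError(f"CUDA error: {err}")

host_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # illustrative shape

err, d_input = cudart.cudaMalloc(host_input.nbytes)  # device allocation
check(err)
check(cudart.cudaMemcpy(d_input, host_input.ctypes.data, host_input.nbytes,
                        cudart.cudaMemcpyKind.cudaMemcpyHostToDevice)[0])

# ... run TensorRT inference with d_input bound to an engine I/O tensor ...

check(cudart.cudaFree(d_input)[0])
```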

README.md

Lines changed: 9 additions & 9 deletions

````diff
@@ -32,7 +32,7 @@ To build the TensorRT-OSS components, you will first need the following software
 
 **TensorRT GA build**
 
-- TensorRT v10.12.0.36
+- TensorRT v10.13.0.35
 - Available from direct download links listed below
 
 **System Packages**
@@ -86,24 +86,24 @@ To build the TensorRT-OSS components, you will first need the following software
 
 Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
 
-- [TensorRT 10.12.0.36 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/tars/TensorRT-10.12.0.36.Linux.x86_64-gnu.cuda-11.8.tar.gz)
-- [TensorRT 10.12.0.36 for CUDA 12.9, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/tars/TensorRT-10.12.0.36.Linux.x86_64-gnu.cuda-12.9.tar.gz)
-- [TensorRT 10.12.0.36 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/zip/TensorRT-10.12.0.36.Windows.win10.cuda-11.8.zip)
-- [TensorRT 10.12.0.36 for CUDA 12.9, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/zip/TensorRT-10.12.0.36.Windows.win10.cuda-12.9.zip)
+- [TensorRT 10.13.0.35 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.13.0/tars/TensorRT-10.13.0.35.Linux.x86_64-gnu.cuda-11.8.tar.gz)
+- [TensorRT 10.13.0.35 for CUDA 12.9, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.13.0/tars/TensorRT-10.13.0.35.Linux.x86_64-gnu.cuda-12.9.tar.gz)
+- [TensorRT 10.13.0.35 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.13.0/zip/TensorRT-10.13.0.35.Windows.win10.cuda-11.8.zip)
+- [TensorRT 10.13.0.35 for CUDA 12.9, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.13.0/zip/TensorRT-10.13.0.35.Windows.win10.cuda-12.9.zip)
 
 **Example: Ubuntu 20.04 on x86-64 with cuda-12.9**
 
 ```bash
 cd ~/Downloads
-tar -xvzf TensorRT-10.12.0.36.Linux.x86_64-gnu.cuda-12.9.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-10.12.0.36
+tar -xvzf TensorRT-10.13.0.35.Linux.x86_64-gnu.cuda-12.9.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-10.13.0.35
 ```
 
 **Example: Windows on x86-64 with cuda-12.9**
 
 ```powershell
-Expand-Archive -Path TensorRT-10.12.0.36.Windows.win10.cuda-12.9.zip
-$env:TRT_LIBPATH="$pwd\TensorRT-10.12.0.36\lib"
+Expand-Archive -Path TensorRT-10.13.0.35.Windows.win10.cuda-12.9.zip
+$env:TRT_LIBPATH="$pwd\TensorRT-10.13.0.35\lib"
 ```
 
 ## Setting Up The Build Environment
````

VERSION

Lines changed: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-10.12.0.36
+10.13.0.35
```

cmake/modules/InstallUtils.cmake

Lines changed: 80 additions & 0 deletions (new file)

```cmake
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

include_guard()
include(GNUInstallDirs)

# Install one or more targets, including PDB files on Windows/MSVC.
# Usage:
#   installLibraries(
#       TARGETS target1 [target2 ...]
#       [COMPONENT component]                   # Optional component name for packaging
#       [CONFIGURATIONS config1 [config2 ...]]  # Optional configurations to install
#   )
function(installLibraries)
    cmake_parse_arguments(
        ARG                      # Prefix for parsed args
        "OPTIONAL"               # Options (flags)
        "COMPONENT"              # Single value args
        "TARGETS;CONFIGURATIONS" # Multi-value args
        ${ARGN}
    )

    # Validate required arguments
    if(NOT ARG_TARGETS)
        message(FATAL_ERROR "installLibraries() requires TARGETS argument")
    endif()

    # Prepare optional arguments for the regular install command
    if(ARG_COMPONENT)
        set(component_arg COMPONENT ${ARG_COMPONENT})
    endif()

    if(ARG_CONFIGURATIONS)
        set(config_arg CONFIGURATIONS ${ARG_CONFIGURATIONS})
    endif()

    if(ARG_OPTIONAL)
        set(optional_arg OPTIONAL)
    endif()

    # Install the libraries
    install(
        TARGETS ${ARG_TARGETS}
        ${optional_arg}
        ${component_arg}
        ${config_arg}
    )

    # Install PDB files for MSVC builds
    if(MSVC)
        foreach(target ${ARG_TARGETS})
            # Get target type (SHARED_LIBRARY, STATIC_LIBRARY, EXECUTABLE)
            get_target_property(target_type ${target} TYPE)

            # For shared libraries and executables, PDBs are placed alongside the binaries
            if(target_type STREQUAL "SHARED_LIBRARY" OR target_type STREQUAL "EXECUTABLE")
                # Use a generator expression to get the PDB file path
                install(
                    FILES "$<TARGET_PDB_FILE:${target}>"
                    DESTINATION ${CMAKE_INSTALL_BINDIR}
                    ${component_arg}
                    CONFIGURATIONS Debug RelWithDebInfo
                    OPTIONAL
                )
            endif()
        endforeach()
    endif()
endfunction()
```

demo/Diffusion/README.md

Lines changed: 2 additions & 6 deletions

````diff
@@ -7,7 +7,7 @@ This demo application ("demoDiffusion") showcases the acceleration of Stable Dif
 ### Clone the TensorRT OSS repository
 
 ```bash
-git clone git@github.com:NVIDIA/TensorRT.git -b release/10.11 --single-branch
+git clone git@github.com:NVIDIA/TensorRT.git -b release/10.13 --single-branch
 cd TensorRT
 ```
 
@@ -49,7 +49,7 @@ onnx 1.15.0
 onnx-graphsurgeon 0.5.2
 onnxruntime 1.16.3
 polygraphy 0.49.9
-tensorrt 10.12.0.36
+tensorrt 10.13.0.35
 tokenizers 0.13.3
 torch 2.2.0
 transformers 4.42.2
@@ -460,7 +460,3 @@ Custom override paths to pre-built engine files can be provided using `--custom-
 - To accelerate engine building time use `--timing-cache <path to cache file>`. The cache file will be created if it does not already exist. Note that performance may degrade if cache files are used across multiple GPU targets. It is recommended to use timing caches only during development. To achieve the best performance in deployment, please build engines without timing cache.
 - Specify new directories for storing onnx and engine files when switching between versions, LoRAs, ControlNets, etc. This can be done using `--onnx-dir <new onnx dir>` and `--engine-dir <new engine dir>`.
 - Inference performance can be improved by enabling [CUDA graphs](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs) using `--use-cuda-graph`. Enabling CUDA graphs requires fixed input shapes, so this flag must be combined with `--build-static-batch` and cannot be combined with `--build-dynamic-shape`.
-
-### Known Issues
-
-The Stable Diffusion XL pipeline is currently not supported on RTX 5090 due to memory constraints. This issue will be resolved in an upcoming release.
````

demo/Diffusion/demo_diffusion/dd_argparse.py

Lines changed: 30 additions & 21 deletions

```diff
@@ -310,27 +310,36 @@ def process_pipeline_args(args: argparse.Namespace) -> Tuple[Dict[str, Any], Dic
     # int8 support
     if args.int8 and not any(args.version.startswith(prefix) for prefix in ("xl", "1.4", "1.5", "2.1")):
         raise ValueError("int8 quantization is only supported for SDXL, SD1.4, SD1.5 and SD2.1 pipelines.")
-
-    # fp8 support
-    if args.fp8 and not (
-        any(args.version.startswith(prefix) for prefix in ("xl", "1.4", "1.5", "2.1", "3.5-large")) or is_flux
-    ):
-        raise ValueError(
-            "fp8 quantization is only supported for SDXL, SD1.4, SD1.5, SD2.1, SD3.5-large and FLUX pipelines."
-        )
-
-    if args.fp8 and hasattr(args, "controlnet_type"):
-        if args.version != "xl-1.0":
+
+    # fp8 support validation
+    if args.fp8:
+        # Check version compatibility
+        supported_versions = ("xl", "1.4", "1.5", "2.1", "3.5-large")
+        if not (any(args.version.startswith(prefix) for prefix in supported_versions) or is_flux):
+            raise ValueError(
+                "fp8 quantization is only supported for SDXL, SD1.4, SD1.5, SD2.1, SD3.5-large and FLUX pipelines."
+            )
+
+        # Check controlnet compatibility
+        if hasattr(args, "controlnet_type") and args.version != "xl-1.0":
             raise ValueError("fp8 controlnet quantization is only supported for SDXL.")
 
-    if args.fp8 and args.int8:
-        raise ValueError("Cannot apply both int8 and fp8 quantization, please choose only one.")
-
-    if args.fp8 and sm_version < 89:
-        raise ValueError(
-            f"Cannot apply FP8 quantization for GPU with compute capability {sm_version / 10.0}. Only Ada and Hopper are supported."
-        )
-
+        # Check for conflicting quantization
+        if args.int8:
+            raise ValueError("Cannot apply both int8 and fp8 quantization, please choose only one.")
+
+        # Check GPU compute capability
+        if sm_version < 89:
+            raise ValueError(
+                f"Cannot apply FP8 quantization for GPU with compute capability {sm_version / 10.0}. A minimum compute capability of 8.9 is required."
+            )
+
+        # Check SD3.5-large specific requirement
+        if args.version == "3.5-large" and not args.download_onnx_models:
+            raise ValueError(
+                "Native FP8 quantization is not supported for SD3.5-large. Please pass --download-onnx-models."
+            )
+
     # TensorRT ModelOpt quantization level
     if args.quantization_level == 0.0:
         def override_quant_level(level: float, dtype_str: str):
@@ -352,8 +361,8 @@ def override_quant_level(level: float, dtype_str: str):
                 "Transformer ONNX model for Quantization level 3 is not available for download. Please export the quantized Transformer model natively with the removal of --download-onnx-models."
             )
     if args.fp4:
-        # FP4 precision is only supported for Flux Pipelines
-        assert is_flux, "FP4 precision is only supported for Flux pipelines"
+        # FP4 precision is only supported for the Flux pipeline
+        assert is_flux, "FP4 precision is only supported for the Flux pipeline"
 
     # Handle LoRA
     # FLUX canny and depth official LoRAs are not supported because they modify the transformer architecture, conflicting with refit
```
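The refactored fp8 gate above compares `sm_version` against 89, i.e. compute capability 8.9 (Ada) or newer. As a point of reference, this is a minimal sketch of deriving that value with `cuda-python` (the library this release's samples moved to); `dd_argparse.py` computes `sm_version` elsewhere, so the query shown here is illustrative, not the demo's actual code.

```python
# Illustrative only: derive sm_version = major * 10 + minor with cuda-python,
# mirroring the `sm_version < 89` FP8 gate above (8.9 = Ada, 9.0 = Hopper).
from cuda import cudart

err, props = cudart.cudaGetDeviceProperties(0)  # query device 0
assert err == cudart.cudaError_t.cudaSuccess
sm_version = props.major * 10 + props.minor
if sm_version < 89:
    raise ValueError(
        f"Cannot apply FP8 quantization for GPU with compute capability {sm_version / 10.0}. "
        "A minimum compute capability of 8.9 is required."
    )
```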

demo/Diffusion/demo_diffusion/pipeline/stable_diffusion_35_pipeline.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -279,7 +279,7 @@ def _initialize_models(self, framework_model_dir, int8, fp8, fp4):
             "fp16": self.fp16,
             "tf32": self.tf32,
             "text_maxlen": self.models["t5"].text_maxlen + self.models["clip_g"].text_maxlen,
-            "build_strongly_typed": False,
+            "build_strongly_typed": not self.controlnets,
             "weight_streaming": self.weight_streaming,
             "do_classifier_free_guidance": self.do_classifier_free_guidance,
         }
```

demo/Diffusion/demo_diffusion/pipeline/stable_diffusion_pipeline.py

Lines changed: 6 additions & 1 deletion

```diff
@@ -14,6 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+import gc
 import inspect
 import json
 import os
@@ -48,10 +49,10 @@
     CLIPModel,
     CLIPWithProjModel,
     SDLoraLoader,
+    UNet2DConditionControlNetModel,
     UNetModel,
     UNetXLModel,
     UNetXLModelControlNet,
-    UNet2DConditionControlNetModel,
     VAEEncoderModel,
     VAEModel,
     get_clip_embedding_dim,
@@ -641,6 +642,10 @@ def forward_loop(model):
             if torch_fallback[model_name]:
                 self.torch_models[model_name] = obj.get_model(torch_inference=self.torch_inference)
 
+        # Release temp GPU memory during onnx export to avoid OOM.
+        gc.collect()
+        torch.cuda.empty_cache()
+
     def calculateMaxDeviceMemory(self):
         max_device_memory = 0
         for model_name, engine in self.engine.items():
```

demo/Diffusion/requirements.txt

Lines changed: 1 addition & 1 deletion

```diff
@@ -9,7 +9,7 @@ ftfy
 matplotlib
 nvtx
 onnx==1.17.0
-onnxconverter-common
+onnxconverter-common==1.14.0
 onnxruntime==1.19.2
 opencv-python-headless==4.8.0.74
 scipy
```
