Commit b8db91e

TensorRT 10.13 OSS Release (#4531)

Signed-off-by: Kevin Chen <[email protected]>
1 parent 9161ba7

File tree

188 files changed: +2743, -1047 lines


.github/workflows/label_issue.yml

Lines changed: 10 additions & 10 deletions (whitespace-only; removed and added lines carry identical text)

```diff
@@ -2,11 +2,11 @@ name: Label New Issues
 
 on:
   issues:
-    types: [opened]
+    types: [opened]
 
 permissions:
-  issues: write
-  contents: read
+  issues: write
+  contents: read
 
 jobs:
   label-issue:
@@ -21,7 +21,7 @@ jobs:
           ref: v1.2.1
 
       - name: AI Label Issue
-        uses: ./.github/actions/goggles_action/actions/llm_label
+        uses: ./.github/actions/goggles_action/actions/llm_label
        with:
           ACTION_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           LLM_MODEL_NAME: ${{ secrets.LLM_MODEL_NAME }}
@@ -39,9 +39,9 @@ jobs:
           ACTIONS_STEP_VERBOSE: false
           EXCLUDED_LABELS: "Investigating,internal-bug-tracked,stale,triaged,wontfix"
           LLM_SYSTEM_PROMPT: |
-            You are an expert GitHub issue labeler. Your task is to analyze the provided issue title, issue body, and a list of available labels with their descriptions.
-            Based on this information, select the single most appropriate label from the list that best captures the primary issue or request.
-            Prefer selecting only one label that represents the main topic or problem. Only suggest multiple labels if the issue genuinely spans multiple distinct areas that are equally important.
-            Respond with ONLY the chosen label name (e.g., 'bug', 'feature-request') or comma-separated names if multiple are truly needed.
-            If no labels seem appropriate, respond with 'NONE'.
-            Do not add any other text, explanation, or markdown formatting.
+            You are an expert GitHub issue labeler. Your task is to analyze the provided issue title, issue body, and a list of available labels with their descriptions.
+            Based on this information, select the single most appropriate label from the list that best captures the primary issue or request.
+            Prefer selecting only one label that represents the main topic or problem. Only suggest multiple labels if the issue genuinely spans multiple distinct areas that are equally important.
+            Respond with ONLY the chosen label name (e.g., 'bug', 'feature-request') or comma-separated names if multiple are truly needed.
+            If no labels seem appropriate, respond with 'NONE'.
+            Do not add any other text, explanation, or markdown formatting.
```

CHANGELOG.md

Lines changed: 15 additions & 1 deletion

```diff
@@ -1,5 +1,19 @@
 # TensorRT OSS Release Changelog
 
+## 10.13.0 GA - 2025-7-24
+- Plugin changes
+  - Fixed a division-by-zero error in geluPlugin that occurred when the bias is omitted.
+  - Completed the transition away from static plugin field/attribute member variables in standard plugins. They are no longer needed: TRT does not access field information after plugin creators are destructed (deregistered from the plugin registry), nor does it access such information without a creator instance.
+- Sample changes
+  - Deprecated the `yolov3_onnx` sample due to the unstable URL of the YOLO weights.
+  - Updated the `1_run_onnx_with_tensorrt` and `2_construct_network_with_layer_apis` samples to use `cuda-python` instead of `PyCUDA` for latest GPU/CUDA support.
+- Parser changes
+  - Decreased memory usage when importing models with external weights.
+  - Added `loadModelProto`, `loadInitializer` and `parseModelProto` APIs to `IParser`. These APIs are meant to be used to load user initializers when parsing ONNX models.
+  - Added `loadModelProto`, `loadInitializer` and `refitModelProto` APIs to `IParserRefitter`. These APIs are meant to be used to load user initializers when refitting ONNX models.
+  - Deprecated `IParser::parseWithWeightDescriptors`.
+
 ## 10.12.0 GA - 2025-6-10
 - Plugin changes
   - Migrated `IPluginV2`-descendent version 1 of `cropAndResizeDynamic`, to version 2, which implements `IPluginV3`.
@@ -30,7 +44,7 @@
   - Added [Image-to-Image](demo/Diffusion#generate-an-image-with-stable-diffusion-v35-large-with-controlnet-guided-by-an-image-and-a-text-prompt) support for Stable Diffusion v3.5-large ControlNet models.
   - Enabled download of [pre-exported ONNX models](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-tensorrt) for the Stable Diffusion v3.5-large pipeline.
 - Sample changes
-  - Added two refactored python samples [1_run_onnx_with_tensorrt](samples/python/refactored/1_run_onnx_with_tensorrt) and [2_construct_network_with_layer_apis](samples/python/refactored/2_construct_network_with_layer_apis)
+  - Added two refactored python samples [1_run_onnx_with_tensorrt](samples/python/refactored/1_run_onnx_with_tensorrt) and [2_construct_network_with_layer_apis](samples/python/refactored/2_construct_network_with_layer_apis)
 - Parser changes
   - Added support for integer-typed base tensors for `Pow` operations
   - Added support for custom `MXFP8` quantization operations
```
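The parser items in the 10.13.0 entry above describe a deferred-load workflow: load the model proto, attach user-owned initializers, then parse. Below is a minimal sketch of how that flow might look through the TensorRT Python bindings. The snake_case method names (`load_model_proto`, `load_initializer`, `parse_model_proto`), their signatures, and the file/tensor names are assumptions inferred from the C++ API names in the changelog, not confirmed API; consult the 10.13 API reference before relying on them.

```python
# Hypothetical sketch of the 10.13 deferred-initializer parsing flow.
# Assumed: snake_case bindings for loadModelProto/loadInitializer/parseModelProto.
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # hypothetical model path
    onnx_bytes = f.read()

# Load the serialized proto without parsing it yet (assumed binding).
parser.load_model_proto(onnx_bytes)

# Hand the parser a user-owned initializer by name (assumed binding); the
# caller keeps ownership rather than the parser copying the weight out of
# the ONNX file, which is what reduces import-time memory usage.
weight = np.load("conv1_weight.npy")  # hypothetical external weight
parser.load_initializer("conv1.weight", weight)

# Parse the previously loaded proto (assumed binding).
parser.parse_model_proto()
```

The sample item above also swaps `PyCUDA` for `cuda-python`. For reference, this is a minimal sketch of the device-buffer handling the refactored samples would need under `cuda-python`'s runtime bindings; `cudart.cudaMalloc`, `cudaMemcpy`, and `cudaFree` are real cuda-python calls, while the array shape and variable names are illustrative only.

```python
# Allocate, fill, and free a device buffer with cuda-python instead of PyCUDA.
import numpy as np
from cuda import cudart

def check(err):
    # Every cuda-python runtime call returns its error code first.
    if err != cudart.cudaError_t.cudaSuccess:
        raise RuntimeError(f"CUDA error: {err}")

host_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # illustrative shape

err, d_input = cudart.cudaMalloc(host_input.nbytes)  # device allocation
check(err)
check(cudart.cudaMemcpy(d_input, host_input.ctypes.data, host_input.nbytes,
                        cudart.cudaMemcpyKind.cudaMemcpyHostToDevice)[0])

# ... run TensorRT inference with d_input bound to an engine I/O tensor ...

check(cudart.cudaFree(d_input)[0])
```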

README.md

Lines changed: 9 additions & 9 deletions

````diff
@@ -32,7 +32,7 @@ To build the TensorRT-OSS components, you will first need the following software
 
 **TensorRT GA build**
 
-- TensorRT v10.12.0.36
+- TensorRT v10.13.0.35
 - Available from direct download links listed below
 
 **System Packages**
@@ -86,24 +86,24 @@ To build the TensorRT-OSS components, you will first need the following software
 
 Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
 
-- [TensorRT 10.12.0.36 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/tars/TensorRT-10.12.0.36.Linux.x86_64-gnu.cuda-11.8.tar.gz)
-- [TensorRT 10.12.0.36 for CUDA 12.9, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/tars/TensorRT-10.12.0.36.Linux.x86_64-gnu.cuda-12.9.tar.gz)
-- [TensorRT 10.12.0.36 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/zip/TensorRT-10.12.0.36.Windows.win10.cuda-11.8.zip)
-- [TensorRT 10.12.0.36 for CUDA 12.9, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/zip/TensorRT-10.12.0.36.Windows.win10.cuda-12.9.zip)
+- [TensorRT 10.13.0.35 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.13.0/tars/TensorRT-10.13.0.35.Linux.x86_64-gnu.cuda-11.8.tar.gz)
+- [TensorRT 10.13.0.35 for CUDA 12.9, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.13.0/tars/TensorRT-10.13.0.35.Linux.x86_64-gnu.cuda-12.9.tar.gz)
+- [TensorRT 10.13.0.35 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.13.0/zip/TensorRT-10.13.0.35.Windows.win10.cuda-11.8.zip)
+- [TensorRT 10.13.0.35 for CUDA 12.9, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.13.0/zip/TensorRT-10.13.0.35.Windows.win10.cuda-12.9.zip)
 
 **Example: Ubuntu 20.04 on x86-64 with cuda-12.9**
 
 ```bash
 cd ~/Downloads
-tar -xvzf TensorRT-10.12.0.36.Linux.x86_64-gnu.cuda-12.9.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-10.12.0.36
+tar -xvzf TensorRT-10.13.0.35.Linux.x86_64-gnu.cuda-12.9.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-10.13.0.35
 ```
 
 **Example: Windows on x86-64 with cuda-12.9**
 
 ```powershell
-Expand-Archive -Path TensorRT-10.12.0.36.Windows.win10.cuda-12.9.zip
-$env:TRT_LIBPATH="$pwd\TensorRT-10.12.0.36\lib"
+Expand-Archive -Path TensorRT-10.13.0.35.Windows.win10.cuda-12.9.zip
+$env:TRT_LIBPATH="$pwd\TensorRT-10.13.0.35\lib"
 ```
 
 ## Setting Up The Build Environment
````

VERSION

Lines changed: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-10.12.0.36
+10.13.0.35
```

cmake/modules/InstallUtils.cmake

Lines changed: 80 additions & 0 deletions (new file)

```cmake
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

include_guard()
include(GNUInstallDirs)

# Install one or more targets, including PDB files on Windows/MSVC.
# Usage:
#   installLibraries(
#       TARGETS target1 [target2 ...]
#       [COMPONENT component]                   # Optional component name for packaging
#       [CONFIGURATIONS config1 [config2 ...]]  # Optional configurations to install
#   )
function(installLibraries)
    cmake_parse_arguments(
        ARG                      # Prefix for parsed args
        "OPTIONAL"               # Options (flags)
        "COMPONENT"              # Single value args
        "TARGETS;CONFIGURATIONS" # Multi-value args
        ${ARGN}
    )

    # Validate required arguments
    if(NOT ARG_TARGETS)
        message(FATAL_ERROR "installLibraries() requires TARGETS argument")
    endif()

    # Prepare optional arguments for the regular install command
    if(ARG_COMPONENT)
        set(component_arg COMPONENT ${ARG_COMPONENT})
    endif()

    if(ARG_CONFIGURATIONS)
        set(config_arg CONFIGURATIONS ${ARG_CONFIGURATIONS})
    endif()

    if(ARG_OPTIONAL)
        set(optional_arg OPTIONAL)
    endif()

    # Install the libraries
    install(
        TARGETS ${ARG_TARGETS}
        ${optional_arg}
        ${component_arg}
        ${config_arg}
    )

    # Install PDB files for MSVC builds
    if(MSVC)
        foreach(target ${ARG_TARGETS})
            # Get target type (SHARED_LIBRARY, STATIC_LIBRARY, EXECUTABLE)
            get_target_property(target_type ${target} TYPE)

            # For shared libraries and executables, PDBs are placed alongside the binaries
            if(target_type STREQUAL "SHARED_LIBRARY" OR target_type STREQUAL "EXECUTABLE")
                # Use a generator expression to get the PDB file path
                install(
                    FILES "$<TARGET_PDB_FILE:${target}>"
                    DESTINATION ${CMAKE_INSTALL_BINDIR}
                    ${component_arg}
                    CONFIGURATIONS Debug RelWithDebInfo
                    OPTIONAL
                )
            endif()
        endforeach()
    endif()
endfunction()
```

demo/Diffusion/README.md

Lines changed: 2 additions & 6 deletions

````diff
@@ -7,7 +7,7 @@ This demo application ("demoDiffusion") showcases the acceleration of Stable Dif
 ### Clone the TensorRT OSS repository
 
 ```bash
-git clone git@github.com:NVIDIA/TensorRT.git -b release/10.11 --single-branch
+git clone git@github.com:NVIDIA/TensorRT.git -b release/10.13 --single-branch
 cd TensorRT
 ```
 
@@ -49,7 +49,7 @@ onnx 1.15.0
 onnx-graphsurgeon 0.5.2
 onnxruntime 1.16.3
 polygraphy 0.49.9
-tensorrt 10.12.0.36
+tensorrt 10.13.0.35
 tokenizers 0.13.3
 torch 2.2.0
 transformers 4.42.2
@@ -460,7 +460,3 @@ Custom override paths to pre-built engine files can be provided using `--custom-
 - To accelerate engine building time use `--timing-cache <path to cache file>`. The cache file will be created if it does not already exist. Note that performance may degrade if cache files are used across multiple GPU targets. It is recommended to use timing caches only during development. To achieve the best performance in deployment, please build engines without timing cache.
 - Specify new directories for storing onnx and engine files when switching between versions, LoRAs, ControlNets, etc. This can be done using `--onnx-dir <new onnx dir>` and `--engine-dir <new engine dir>`.
 - Inference performance can be improved by enabling [CUDA graphs](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs) using `--use-cuda-graph`. Enabling CUDA graphs requires fixed input shapes, so this flag must be combined with `--build-static-batch` and cannot be combined with `--build-dynamic-shape`.
-
-### Known Issues
-
-The Stable Diffusion XL pipeline is currently not supported on RTX 5090 due to memory constraints. This issue will be resolved in an upcoming release.
````

demo/Diffusion/demo_diffusion/dd_argparse.py

Lines changed: 30 additions & 21 deletions

```diff
@@ -310,27 +310,36 @@ def process_pipeline_args(args: argparse.Namespace) -> Tuple[Dict[str, Any], Dic
     # int8 support
     if args.int8 and not any(args.version.startswith(prefix) for prefix in ("xl", "1.4", "1.5", "2.1")):
         raise ValueError("int8 quantization is only supported for SDXL, SD1.4, SD1.5 and SD2.1 pipelines.")
-
-    # fp8 support
-    if args.fp8 and not (
-        any(args.version.startswith(prefix) for prefix in ("xl", "1.4", "1.5", "2.1", "3.5-large")) or is_flux
-    ):
-        raise ValueError(
-            "fp8 quantization is only supported for SDXL, SD1.4, SD1.5, SD2.1, SD3.5-large and FLUX pipelines."
-        )
-
-    if args.fp8 and hasattr(args, "controlnet_type"):
-        if args.version != "xl-1.0":
+
+    # fp8 support validation
+    if args.fp8:
+        # Check version compatibility
+        supported_versions = ("xl", "1.4", "1.5", "2.1", "3.5-large")
+        if not (any(args.version.startswith(prefix) for prefix in supported_versions) or is_flux):
+            raise ValueError(
+                "fp8 quantization is only supported for SDXL, SD1.4, SD1.5, SD2.1, SD3.5-large and FLUX pipelines."
+            )
+
+        # Check controlnet compatibility
+        if hasattr(args, "controlnet_type") and args.version != "xl-1.0":
             raise ValueError("fp8 controlnet quantization is only supported for SDXL.")
 
-    if args.fp8 and args.int8:
-        raise ValueError("Cannot apply both int8 and fp8 quantization, please choose only one.")
-
-    if args.fp8 and sm_version < 89:
-        raise ValueError(
-            f"Cannot apply FP8 quantization for GPU with compute capability {sm_version / 10.0}. Only Ada and Hopper are supported."
-        )
-
+        # Check for conflicting quantization
+        if args.int8:
+            raise ValueError("Cannot apply both int8 and fp8 quantization, please choose only one.")
+
+        # Check GPU compute capability
+        if sm_version < 89:
+            raise ValueError(
+                f"Cannot apply FP8 quantization for GPU with compute capability {sm_version / 10.0}. A minimum compute capability of 8.9 is required."
+            )
+
+        # Check SD3.5-large specific requirement
+        if args.version == "3.5-large" and not args.download_onnx_models:
+            raise ValueError(
+                "Native FP8 quantization is not supported for SD3.5-large. Please pass --download-onnx-models."
+            )
+
     # TensorRT ModelOpt quantization level
     if args.quantization_level == 0.0:
         def override_quant_level(level: float, dtype_str: str):
@@ -352,8 +361,8 @@ def override_quant_level(level: float, dtype_str: str):
                 "Transformer ONNX model for Quantization level 3 is not available for download. Please export the quantized Transformer model natively with the removal of --download-onnx-models."
             )
     if args.fp4:
-        # FP4 precision is only supported for Flux Pipelines
-        assert is_flux, "FP4 precision is only supported for Flux pipelines"
+        # FP4 precision is only supported for the Flux pipeline
+        assert is_flux, "FP4 precision is only supported for the Flux pipeline"
 
     # Handle LoRA
     # FLUX canny and depth official LoRAs are not supported because they modify the transformer architecture, conflicting with refit
```
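The refactored fp8 gate above compares `sm_version` against 89, i.e. compute capability 8.9 (Ada) or newer. As a point of reference, this is a minimal sketch of deriving that value with `cuda-python` (the library this release's samples moved to); `dd_argparse.py` computes `sm_version` elsewhere, so the query shown here is illustrative, not the demo's actual code.

```python
# Illustrative only: derive sm_version = major * 10 + minor with cuda-python,
# mirroring the `sm_version < 89` FP8 gate above (8.9 = Ada, 9.0 = Hopper).
from cuda import cudart

err, props = cudart.cudaGetDeviceProperties(0)  # query device 0
assert err == cudart.cudaError_t.cudaSuccess
sm_version = props.major * 10 + props.minor
if sm_version < 89:
    raise ValueError(
        f"Cannot apply FP8 quantization for GPU with compute capability {sm_version / 10.0}. "
        "A minimum compute capability of 8.9 is required."
    )
```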

demo/Diffusion/demo_diffusion/pipeline/stable_diffusion_35_pipeline.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -279,7 +279,7 @@ def _initialize_models(self, framework_model_dir, int8, fp8, fp4):
             "fp16": self.fp16,
             "tf32": self.tf32,
             "text_maxlen": self.models["t5"].text_maxlen + self.models["clip_g"].text_maxlen,
-            "build_strongly_typed": False,
+            "build_strongly_typed": not self.controlnets,
             "weight_streaming": self.weight_streaming,
             "do_classifier_free_guidance": self.do_classifier_free_guidance,
         }
```

demo/Diffusion/demo_diffusion/pipeline/stable_diffusion_pipeline.py

Lines changed: 6 additions & 1 deletion

```diff
@@ -14,6 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+import gc
 import inspect
 import json
 import os
@@ -48,10 +49,10 @@
     CLIPModel,
     CLIPWithProjModel,
     SDLoraLoader,
+    UNet2DConditionControlNetModel,
     UNetModel,
     UNetXLModel,
     UNetXLModelControlNet,
-    UNet2DConditionControlNetModel,
     VAEEncoderModel,
     VAEModel,
     get_clip_embedding_dim,
@@ -641,6 +642,10 @@ def forward_loop(model):
             if torch_fallback[model_name]:
                 self.torch_models[model_name] = obj.get_model(torch_inference=self.torch_inference)
 
+        # Release temp GPU memory during onnx export to avoid OOM.
+        gc.collect()
+        torch.cuda.empty_cache()
+
     def calculateMaxDeviceMemory(self):
         max_device_memory = 0
         for model_name, engine in self.engine.items():
```

demo/Diffusion/requirements.txt

Lines changed: 1 addition & 1 deletion

```diff
@@ -9,7 +9,7 @@ ftfy
 matplotlib
 nvtx
 onnx==1.17.0
-onnxconverter-common
+onnxconverter-common==1.14.0
 onnxruntime==1.19.2
 opencv-python-headless==4.8.0.74
 scipy
```
