🐛 [Bug] Compilation failure for SSD300 model with dynamic batch #1555

gs-olive · 2022-12-16T02:33:54Z

Bug Description

When converting the SSD300 object detection network from TorchScript to Torch-TRT, the following error is encountered:

WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IShuffleLayer %454 : Tensor = aten::reshape(%446, %451): reshape dimension with more than one -1 wildcard. Reshaping [(# 0 (SHAPE input_0)),16,38,38] to [-1,4,-1].)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IShuffleLayer %454 : Tensor = aten::reshape(%446, %451): reshape dimension with more than one -1 wildcard. Reshaping [(# 0 (SHAPE input_0)),16,38,38] to [-1,4,-1].)

The error arises from an input to aten::reshape that contains both the dynamic batch dimension and the aten::reshape wildcard. Both are encoded as -1, so TensorRT sees two wildcards in the target shape and rejects the reshape.
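The collision can be illustrated without TensorRT. The following is a minimal sketch in plain Python; the helper names and the "batch" placeholder are hypothetical and only mimic the single-wildcard rule reported in the log above, not the actual TensorRT shape analyzer.

```python
def count_minus_one(shape):
    """Count entries equal to -1, regardless of what they are meant to signify."""
    return sum(1 for d in shape if d == -1)


def check_single_wildcard(target):
    """Mimic a single-wildcard rule: reject targets with more than one -1."""
    if count_minus_one(target) > 1:
        raise ValueError(
            f"reshape dimension with more than one -1 wildcard: {target}"
        )
    return target


# Target shape as intended: a dynamic batch, a static 4, and one wildcard.
intended = ["batch", 4, -1]

# The same target once the dynamic batch has also been encoded as -1.
encoded = [-1 if d == "batch" else d for d in intended]
```

With a concrete batch size, for example [8, 4, -1], the check passes; the encoded target [-1, 4, -1] is rejected even though only one of its entries was ever meant as a wildcard.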

This bug has also been demonstrated to impact some ResNet50 implementations with dynamic batch sizes.

Bug Source

The source of the error is this line:

TensorRT/core/conversion/converters/impl/shuffle.cpp

Line 90 in c63a5a5

shuffle->setReshapeDimensions(util::toDims(new_shape));

The input new_shape is [-1, 4, -1], which reads as a shape with two "implicit" dimensions. That is not the intent: the first -1 is the dynamic batch dimension, and only the second is an implicit dimension. Thus, the desired behavior is a reshape from:
$$[-1, 16, 38, 38] \Longrightarrow [-1, 4, 5776].$$
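The value 5776 follows from the per-sample element count, since the batch dimension passes through unchanged. A quick arithmetic check in Python (the variable names are illustrative only):

```python
from math import prod

in_static = [16, 38, 38]  # per-sample dims of the reshape input
target_static = [4]       # known per-sample dims of the reshape target

per_sample_elements = prod(in_static)  # 16 * 38 * 38 = 23104
inferred, remainder = divmod(per_sample_elements, prod(target_static))

# The implicit dimension only exists if the division is exact.
assert remainder == 0
print(inferred)  # 5776
```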

The code that would need to change is here:

TensorRT/core/conversion/converters/impl/shuffle.cpp

Lines 76 to 83 in c63a5a5

    
    for (size_t i = 0; i < new_shape.size(); i++) {
      if (in_shape[i] == -1)
        nbDynamicDims++;
    }
    if (nbDynamicDims > 1) {
      TORCHTRT_THROW_ERROR(
          "Resize is currently not supported when target shape contains more than one dynamic dimension");
    }

A potential challenge here is determining which dimension in the reshape input dimensions corresponds to the batch dimension, and which corresponds to the implicit dimension.
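One possible heuristic, sketched below in plain Python, is to treat a -1 in the target as the batch dimension when the input shape also has -1 at the same index, and as the implicit dimension otherwise. This is an assumption for illustration, not what shuffle.cpp does, and the function name is hypothetical. It would mislabel reshapes that merge or move the batch dimension, such as flattening the whole tensor to [-1].

```python
def classify_target_dims(in_shape, new_shape):
    """Label each target dim as 'static', 'batch', or 'inferred'.

    Heuristic (an assumption): a -1 in new_shape is the dynamic batch
    dimension if in_shape has -1 at the same index; any other -1 is the
    reshape wildcard.
    """
    labels = []
    for i, dim in enumerate(new_shape):
        if dim != -1:
            labels.append("static")
        elif i < len(in_shape) and in_shape[i] == -1:
            labels.append("batch")
        else:
            labels.append("inferred")
    return labels


labels = classify_target_dims([-1, 16, 38, 38], [-1, 4, -1])
print(labels)  # ['batch', 'static', 'inferred']
```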

Potential Resolution

Consider using a different value than -1 to represent dynamic dimensions, for example INT32_MIN, or some other value which cannot represent any reasonable shape in the original tensor.
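A minimal sketch of the sentinel idea in plain Python, assuming dynamic dimensions are tagged with INT32_MIN while -1 stays reserved for the reshape wildcard. The constant and helper names are hypothetical; this shows only the bookkeeping, not a change to the converter.

```python
DYNAMIC_DIM = -(2 ** 31)  # INT32_MIN, cannot be a real tensor extent
WILDCARD = -1             # reserved for the aten::reshape wildcard


def count_dynamic(shape):
    """Number of dimensions marked as dynamic (e.g. the batch dimension)."""
    return sum(1 for d in shape if d == DYNAMIC_DIM)


def count_wildcards(shape):
    """Number of reshape wildcards that still need to be inferred."""
    return sum(1 for d in shape if d == WILDCARD)


in_shape = [DYNAMIC_DIM, 16, 38, 38]
new_shape = [DYNAMIC_DIM, 4, WILDCARD]

# With distinct markers, the target has one dynamic dim and one wildcard,
# so a single-wildcard check no longer trips on the batch dimension.
print(count_dynamic(new_shape), count_wildcards(new_shape))  # 1 1
```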

To Reproduce

Steps to reproduce the behavior:

1. Run torch_tensorrt.compile with the SSD300 model as input, using fp32 precision.
2. Choose dynamic input sizes {"min": [1, 3, 300, 300], "opt": [16, 3, 300, 300], "max": [16, 3, 300, 300]}, and enable truncate_long_and_double with an 8 GB workspace.
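The steps above might look roughly like the following sketch. It assumes the SSD300 model has already been converted to a TorchScript module (how it is loaded and scripted is not stated in this issue), that the TorchScript path is selected with ir="ts", and that the 8 GB workspace is passed as workspace_size in bytes. The function name reproduce is hypothetical.

```python
# Settings taken from the reproduction steps in this issue.
DYNAMIC_SHAPES = {
    "min": [1, 3, 300, 300],
    "opt": [16, 3, 300, 300],
    "max": [16, 3, 300, 300],
}
WORKSPACE_BYTES = 8 * 1024 ** 3  # 8 GB, assuming binary gigabytes


def reproduce(scripted_ssd300):
    """Compile a TorchScript SSD300 module with a dynamic batch dimension.

    Imports are local so this file can be read or imported without a GPU stack.
    """
    import torch
    import torch_tensorrt

    dynamic_input = torch_tensorrt.Input(
        min_shape=DYNAMIC_SHAPES["min"],
        opt_shape=DYNAMIC_SHAPES["opt"],
        max_shape=DYNAMIC_SHAPES["max"],
        dtype=torch.float32,
    )
    return torch_tensorrt.compile(
        scripted_ssd300,
        ir="ts",  # assumption: TorchScript frontend, as in this report
        inputs=[dynamic_input],
        enabled_precisions={torch.float32},
        truncate_long_and_double=True,
        workspace_size=WORKSPACE_BYTES,
    )
```

On the affected version, calling reproduce with a scripted SSD300 model would be expected to emit the reshape error shown in the bug description.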

Expected behavior

Model should successfully compile to Torch-TRT. Specifically, internal reshape dimensions with dynamic batch should resolve correctly.

Environment

Torch-TensorRT Version: 1.4.0.dev0+2ef6c3a5
PyTorch Version: 1.14.0.dev20221114+cu116
CPU Architecture: Intel Xeon CPU
OS: Ubuntu 20.04
How you installed PyTorch: pip
Build command you used: python setup.py develop
Are you using local sources or building from archives: local
Python version: 3.8.13
CUDA version: 11.6

gs-olive · 2022-12-19T18:16:19Z

Updates

The same dynamic-batch error does not show up when using the FX path for SSD300 or ResNet50. Notably, for both models, full compilation in TensorRT is supported in both TS and FX.

github-actions · 2023-03-20T00:02:32Z

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.

gs-olive · 2023-05-16T01:01:23Z

Fixed by #1851, when using allow_shape_tensors=True as a compilation argument.
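As a rough sketch, the flag would be passed alongside the usual compile arguments. Only allow_shape_tensors=True comes from the comment above; the wrapper function, the ir="ts" choice, and the way the input spec is passed in are assumptions for illustration.

```python
# Extra keyword argument named in the fix comment.
FIX_KWARGS = {"allow_shape_tensors": True}


def compile_with_shape_tensors(scripted_model, dynamic_input, **compile_kwargs):
    """Compile a TorchScript module with shape tensors enabled.

    `dynamic_input` is expected to be a torch_tensorrt.Input with
    min/opt/max shapes; other compile options go in `compile_kwargs`.
    """
    import torch_tensorrt  # local import: needs a working TensorRT install

    return torch_tensorrt.compile(
        scripted_model,
        ir="ts",  # assumption: TorchScript frontend
        inputs=[dynamic_input],
        **FIX_KWARGS,
        **compile_kwargs,
    )
```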

gs-olive added the bug label Dec 16, 2022

gs-olive self-assigned this Dec 16, 2022

gs-olive mentioned this issue Dec 19, 2022

Create RFC for aten::size operator support with Dynamic Shape #1562

Closed

github-actions bot added the No Activity label Mar 20, 2023

gs-olive removed the No Activity label Mar 27, 2023

gs-olive closed this as completed May 16, 2023
