Description
Is your feature request related to a problem? Please describe.
```
%28 : int = aten::size(%22, %27)
...
%31 : Tensor[] = tensorrt::execute_engine(%30, %trt_engine_0x55cc735470f0)
%32 : Tensor, %33 : Tensor = prim::ListUnpack(%31)
%34 : int = prim::Constant[value=15]()
%35 : int = prim::Constant[value=7]()
%36 : int = prim::Constant[value=0]()
%37 : int[] = prim::ListConstruct(%28, %34, %34, %35)
...
%41 : Tensor[] = tensorrt::execute_engine(%40, %trt_engine_0x55cc73547290)
%bar : Tensor = aten::reshape(%foo, %37)
```
In the snippet above, the reshape node %bar is forced to fall back by min_block_size. Because %bar falls back, its input %37 : int[] = prim::ListConstruct is forced to fall back as well, along with %28 : int = aten::size. Both of these ops are supported by evaluators and (at least with static shapes) should be resolved to constants during conversion. As it stands, they create unnecessary breaks between TRT regions, which will hurt performance.
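To make the mechanism concrete, here is a toy sketch (not the real Torch-TensorRT partitioner) of how min_block_size demotes a short run of convertible ops: a TRT segment with fewer nodes than the threshold is rewritten as a fallback segment, which is what happens to the lone reshape here. Segment shapes and names are invented for illustration.

```python
# Toy illustration of min_block_size demotion: any TRT segment
# shorter than min_block_size is rewritten to run in PyTorch
# (fallback). Segments are (target, [node names]) pairs; this is a
# stand-in for the real partitioning data structures.
def apply_min_block_size(segments, min_block_size):
    return [("fallback", names) if target == "trt" and len(names) < min_block_size
            else (target, names)
            for target, names in segments]
```

With min_block_size=3, a single-node TRT segment holding only aten::reshape is demoted, and its producers (the ListConstruct and aten::size above) then follow it into fallback.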
Describe the solution you'd like
Could evaluatable nodes be resolved to constants in the TorchScript graph before partitioning, so that they no longer affect the partition?
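A minimal sketch of what such a pre-partitioning pass might do, assuming static input shapes. Nodes are plain dicts standing in for torch._C.Node (this is not the real graph API); aten::size and prim::ListConstruct are folded into prim::Constant whenever all of their inputs are already known:

```python
# Hypothetical constant-resolution pass over a toy node list.
# "static_shapes" maps value names to known input shapes.
def fold_evaluatable(nodes, static_shapes):
    consts = {}   # value name -> concrete value
    folded = []
    for n in nodes:
        kind, outs, ins = n["kind"], n["outputs"], n["inputs"]
        if kind == "prim::Constant":
            consts[outs[0]] = n["attrs"]["value"]
            folded.append(n)
        elif (kind == "aten::size" and ins[0] in static_shapes
              and ins[1] in consts):
            # size of a statically-shaped value at a constant dim
            value = static_shapes[ins[0]][consts[ins[1]]]
            consts[outs[0]] = value
            folded.append({"kind": "prim::Constant", "outputs": outs,
                           "inputs": [], "attrs": {"value": value}})
        elif kind == "prim::ListConstruct" and all(i in consts for i in ins):
            value = [consts[i] for i in ins]
            consts[outs[0]] = value
            folded.append({"kind": "prim::Constant", "outputs": outs,
                           "inputs": [], "attrs": {"value": value}})
        else:
            folded.append(n)
    return folded
```

After such a pass, %28 and %37 become constants, so only the reshape itself remains as a fallback node and the TRT regions on either side are no longer split by shape bookkeeping.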
Describe alternatives you've considered
Could fallback nodes with no dependencies on active TRT nodes be consolidated so they no longer break TRT regions? (In this case %37 is used only by the reshape node %bar and could be moved to just before that use.)
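This alternative could be sketched as a reordering pass: fallback nodes whose inputs are untouched by any TRT node are sunk down to just before their first consumer, so adjacent TRT runs merge. Again, the node/segment representation below is a toy stand-in for the real partitioning structures, and node names mirror the IR above:

```python
# Group a node list into contiguous same-target segments.
def segment(nodes):
    segs = []
    for n in nodes:
        if segs and segs[-1][0] == n["target"]:
            segs[-1][1].append(n["name"])
        else:
            segs.append((n["target"], [n["name"]]))
    return segs

# Sink fallback nodes that consume no TRT output down to their first
# consumer, pulling their own (also-sunk) producers along in order.
def sink_free_fallback(nodes):
    trt_out = {o for n in nodes if n["target"] == "trt" for o in n["outputs"]}
    pending = [n for n in nodes
               if n["target"] == "fallback" and not set(n["inputs"]) & trt_out]
    rest = [n for n in nodes if n not in pending]
    placed = []
    for n in rest:
        needed = set(n["inputs"])
        take, changed = [], True
        while changed:                      # transitive producers of n
            changed = False
            for m in pending:
                if m not in take and set(m["outputs"]) & needed:
                    take.append(m)
                    needed |= set(m["inputs"])
                    changed = True
        placed.extend(sorted(take, key=pending.index))  # keep original order
        pending = [m for m in pending if m not in take]
        placed.append(n)
    return placed + pending
```

On a graph shaped like the one in this issue, the naive order yields five segments (fallback/trt/fallback/trt/fallback), while sinking %28 and %37 next to %bar merges the two engines into a single TRT region.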