Closed
Description
Is your feature request related to a problem? Please describe.
Engine caching is currently weight-agnostic, but this comes at the cost of building lower-performance engines.
Describe the solution you'd like
If we know that the weights will be identical, we can cache higher-performance engines. The caching system would need to distinguish between these two kinds of cached engines and select the right one based on user settings.
TensorRT provides a builder flag, kREFIT_IDENTICAL, for this workflow.
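
To make the request concrete, here is a rough sketch of how a cache could keep both variants apart. Everything in it is an assumption for illustration: `EngineCache`, its methods, and the `assume_identical_weights` setting are hypothetical names, not the existing caching API, and the engines are stand-in byte strings rather than real TensorRT plans. The idea is that the weight-agnostic entry is keyed by the graph only, while the identical-weights entry is also keyed by a hash of the weights.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

CacheKey = Tuple[str, str, str]


def _digest(data: bytes) -> str:
    """Stable content hash used for both the graph and the weights."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class EngineCache:
    """Hypothetical in-memory cache holding up to two engine variants per graph."""

    _store: Dict[CacheKey, bytes] = field(default_factory=dict)

    @staticmethod
    def _key(graph: bytes, weights: bytes, refit_identical: bool) -> CacheKey:
        if refit_identical:
            # Engine built assuming identical weights: the weights are part of the key.
            return ("refit_identical", _digest(graph), _digest(weights))
        # Weight-agnostic engine: the key ignores the weights entirely.
        return ("weight_agnostic", _digest(graph), "")

    def save(self, graph: bytes, weights: bytes, engine: bytes,
             refit_identical: bool) -> None:
        self._store[self._key(graph, weights, refit_identical)] = engine

    def load(self, graph: bytes, weights: bytes,
             assume_identical_weights: bool) -> Optional[bytes]:
        if assume_identical_weights:
            hit = self._store.get(self._key(graph, weights, True))
            if hit is not None:
                return hit
        # Fall back to the weight-agnostic engine (an assumed policy, not a requirement).
        return self._store.get(self._key(graph, weights, False))


cache = EngineCache()
cache.save(b"graph", b"weights-v1", b"agnostic-engine", refit_identical=False)
cache.save(b"graph", b"weights-v1", b"identical-engine", refit_identical=True)
fast = cache.load(b"graph", b"weights-v1", assume_identical_weights=True)
fallback = cache.load(b"graph", b"weights-v2", assume_identical_weights=True)
```

In this sketch, a lookup with changed weights misses the identical-weights entry and falls back to the weight-agnostic one. Whether to fall back or to rebuild in that case is a policy decision that the user setting could also control.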
Describe alternatives you've considered
Additional context