Commit 8679731
Add FuseConsecutiveRescalesPass to fuse redundant RESCALE pairs (#17830)
Summary:
Add `FuseConsecutiveRescalesPass` to eliminate redundant INT32→INT8→INT32 RESCALE round-trips between chained arithmetic ops in the TOSA lowering pipeline.
`InsertRescaleInt32Pass` wraps each add/sub/mul with input RESCALEs (INT8→INT32) and output RESCALEs (INT32→INT8). When ops are chained (e.g., add→add or mul→add), the output RESCALE of op1 feeds directly into the input RESCALE of op2, creating a wasteful round-trip. Each unnecessary RESCALE decomposes into Add+Mul NPU instructions (~1,130 cycles each on Ethos-U55-128), and in quantized models RESCALE overhead accounts for 25-50% of total NPU cycles.
The pass detects consecutive RESCALE pairs (R1: INT32→INT8/INT16, R2: INT8/INT16→INT32) and handles two cases:
- **Identity** (composed scale ≈ 1.0, matching zero points): Removes both RESCALEs and directly wires R1's input to R2's users. This eliminates the entire round-trip. Bypassing the intermediate INT8/INT16 clamp can cause up to ~120 INT8 steps of output difference, handled via `qtol=1` in tests.
- **Non-identity**: Leaves the pair unchanged. Fusing it into a single INT32→INT32 RESCALE would be semantically correct (the TOSA reference model handles it), but Vela's NPU compiler produces all-zero outputs for INT32→INT32 RESCALE. Root cause: `EthosU55Constraints::SupportsRescale()` returns `false` for dtypes wider than 16 bits, so `RewriteRescale()` converts the RESCALE into a MUL whose aggressive right-shift zeros out the values.
Multi-user R1 nodes (e.g., residual connections, branching) are handled by fusing each R1→R2 pair individually while preserving R1 for non-RESCALE users.
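The detection and rewiring described above can be sketched on a toy graph. This is a minimal illustration, not the pass's actual implementation: the `Node` class, its field names, the `rel_tol` threshold, and the exact zero-point matching rule (R1's output zero point equals R2's input zero point, and R1's input zero point equals R2's output zero point) are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: nodes compare by identity, as graph nodes do
class Node:
    name: str
    op: str
    inputs: list = field(default_factory=list)
    # The fields below only matter for op == "rescale" (hypothetical names).
    scale: float = 1.0
    in_dtype: str = ""
    out_dtype: str = ""
    in_zp: int = 0
    out_zp: int = 0


def users_of(node, graph):
    """Return every node in the graph that consumes `node`."""
    return [n for n in graph if any(i is node for i in n.inputs)]


def is_identity_pair(r1, r2, rel_tol=1e-6):
    """R1 narrows INT32 -> INT8/INT16, R2 widens back, and the pair is a no-op."""
    return (
        r1.op == "rescale"
        and r2.op == "rescale"
        and r1.in_dtype == "int32"
        and r1.out_dtype in ("int8", "int16")
        and r2.in_dtype == r1.out_dtype
        and r2.out_dtype == "int32"
        and math.isclose(r1.scale * r2.scale, 1.0, rel_tol=rel_tol)
        and r1.out_zp == r2.in_zp  # assumed matching rule
        and r1.in_zp == r2.out_zp  # assumed matching rule
    )


def fuse_consecutive_rescales(graph):
    """Remove identity R1 -> R2 pairs; leave non-identity pairs untouched."""
    for r1 in list(graph):
        if r1.op != "rescale":
            continue
        fused = False
        for r2 in users_of(r1, graph):
            if not is_identity_pair(r1, r2):
                continue
            source = r1.inputs[0]
            # Wire R1's input directly to every user of R2.
            for user in users_of(r2, graph):
                user.inputs = [source if i is r2 else i for i in user.inputs]
            graph.remove(r2)
            fused = True
        # Keep R1 while any non-RESCALE (or unfused) user still needs it.
        if fused and not users_of(r1, graph):
            graph.remove(r1)
    return graph


add1 = Node("add1", "add")
r1 = Node("r1", "rescale", [add1], scale=0.5, in_dtype="int32", out_dtype="int8")
r2 = Node("r2", "rescale", [r1], scale=2.0, in_dtype="int8", out_dtype="int32")
add2 = Node("add2", "add", [r2])
skip = Node("skip", "output", [r1])  # a non-RESCALE user of R1
graph = fuse_consecutive_rescales([add1, r1, r2, add2, skip])
print([n.name for n in graph])  # ['add1', 'r1', 'add2', 'skip']
```

In this example `add2` ends up reading directly from `add1`, while `r1` survives because `skip` still consumes its INT8 output. Without the `skip` node, both RESCALEs would be removed.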
## Context
This pass runs unconditionally in the TOSA pipeline immediately after `InsertRescaleInt32Pass` (see `arm_pass_manager.py`). Identity pairs are the most common case between chained ops with similar quantization scales, so this optimization still eliminates the majority of redundant RESCALEs.
The stacked diff D95243636 adds a follow-on `EliminateRescaleBeforeMulPass` that absorbs residual INT32→INT32 RESCALEs before MUL ops.
## Vela INT32→INT32 RESCALE Limitation (Follow-up)
The Vela NPU compiler (Ethos-U55) cannot handle INT32→INT32 RESCALE:
- `EthosU55Constraints::SupportsRescale()` rejects types > 16 bits
- `GraphIrOptimiser::RewriteRescale()` decomposes rejected RESCALEs into MUL ops with explicit OFM scaling
- For INT32→INT32, the MUL's right-shift (typically 20-40 bits) zeros out the result
- `EliminateTosaRescale()` only handles Conv/MatMul patterns, not standalone RESCALEs
- Python-side `rewrite_rescale()` has no code path for INT32→INT32 with non-Conv predecessor
A follow-up Vela patch can fix `RewriteRescale()` to properly handle INT32→INT32 RESCALE (likely after the INT16 conversion step). Once Vela is fixed, this pass can be updated to also fuse non-identity pairs.
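To see why a large right-shift can wipe out a result, consider generic fixed-point scaling, where a scale is applied as an integer multiplier followed by a right-shift. The sketch below is plain integer arithmetic and is not Vela's code; the function name `apply_fixed_point` and the chosen values are hypothetical, and which multiplier Vela actually pairs with the shift is not established here.

```python
def apply_fixed_point(value: int, multiplier: int, shift: int) -> int:
    """Compute round(value * multiplier / 2**shift) using integers only."""
    rounding = 1 << (shift - 1)
    return (value * multiplier + rounding) >> shift


# A multiplier normalised to the shift (scale ~= 1.0) preserves the value.
print(apply_fixed_point(1000, 1 << 30, 30))  # 1000

# The same 30-bit shift with a multiplier that is not matched to it leaves
# nothing: any |value * multiplier| below 2**29 rounds to zero.
print(apply_fixed_point(1000, 1, 30))  # 0
```

With shifts in the 20-40 bit range, typical INT32 accumulator magnitudes fall well below the rounding threshold unless the multiplier compensates, which is consistent with the all-zero outputs described above.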
## Numerical Analysis
| Source | Magnitude | Mitigation |
| ------ | --------- | ---------- |
| **A**: Fixed-point decomposition non-associativity | ~1 INT8 step | Handled via `qtol=1` in tests |
| **B**: INT8 clamping bypass on identity removal | up to 120 INT8 steps | Bounded; handled via `qtol=1` in tests |
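Source **B** can be illustrated with a float-reference model of RESCALE. This is a simplified sketch under stated assumptions: the helper names are hypothetical, zero points are set to 0, and Python's `round` stands in for the rounding an actual implementation would use. It shows the mechanism only and does not reproduce the 120-step bound in the table.

```python
INT8 = (-128, 127)
INT32 = (-(2**31), 2**31 - 1)


def rescale(value, scale, in_zp, out_zp, lo, hi):
    """Reference RESCALE: remove input zero point, scale, round, re-offset, clamp."""
    return max(lo, min(hi, round((value - in_zp) * scale) + out_zp))


def round_trip(acc, s1, s2):
    """INT32 -> INT8 -> INT32, as produced by an unfused R1 -> R2 pair."""
    narrowed = rescale(acc, s1, 0, 0, *INT8)
    return rescale(narrowed, s2, 0, 0, *INT32)


# Inside the INT8 range the identity pair really is a no-op.
print(round_trip(100, 0.5, 2.0))  # 100

# Outside it, the intermediate clamp saturates at 127, so the unfused graph
# yields 254 while the fused graph passes the accumulator (400) through as-is.
print(round_trip(400, 0.5, 2.0))  # 254
```

The fused and unfused graphs therefore agree whenever the intermediate value fits in INT8 and diverge only when it would have saturated.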
Reviewed By: 3l1
Differential Revision: D94483331
1 parent 3604d3e commit 8679731
File tree
7 files changed, +1069 -19 lines changed
- backends/arm
  - _passes
  - test
    - models
    - passes