Commit a5994ac
[XNNPACK] Serialize weights as fp16 rather than fp32 (#9753)
### Summary
Previously we used the FP32_STATIC_WEIGHTS flag in XNNPACK to coerce fp32
weights into fp16 for linear and conv. This allowed us to mimic fp16
computation because the weights would be converted and packed as fp16 at
runtime. However, it means we lose the benefit of a smaller .pte
file because the weights are serialized as fp32 rather than fp16.
Additionally, we still have to load the weights as fp32, since they are
converted at runtime, which hurts performance. This change serializes the
weights as fp16 instead.
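As a rough illustration of why the serialized dtype matters for file size, the sketch below packs the same weights as fp32 and as fp16 using Python's standard `struct` module. It is not the XNNPACK serializer; the helper names (`serialize_fp32`, `serialize_fp16`, `load_fp16`) and the sample weights are hypothetical and exist only for this example.

```python
import struct

def serialize_fp32(weights):
    # 4 bytes per value ('f' = IEEE 754 single precision, little-endian).
    return struct.pack(f"<{len(weights)}f", *weights)

def serialize_fp16(weights):
    # 2 bytes per value ('e' = IEEE 754 half precision, little-endian).
    return struct.pack(f"<{len(weights)}e", *weights)

def load_fp16(blob):
    # Read the half-precision values back; no fp32 copy is stored on disk.
    count = len(blob) // 2
    return list(struct.unpack(f"<{count}e", blob))

# Hypothetical weights, chosen only to show the size difference.
weights = [0.5, -1.25, 3.0, 0.1]

fp32_blob = serialize_fp32(weights)
fp16_blob = serialize_fp16(weights)
restored = load_fp16(fp16_blob)

print(len(fp32_blob), len(fp16_blob))
print(restored)
```

Values that fp16 can represent exactly (such as 0.5) round-trip unchanged, while others (such as 0.1) come back rounded to the nearest half-precision value.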
### Test plan
```
python -m unittest backends.xnnpack.test.ops.test_linear.TestLinear.test_fp16_linear
python -m unittest backends.xnnpack.test.ops.test_linear.TestLinear
python -m unittest backends.xnnpack.test.ops.test_conv2d.TestConv2d
```
Llama 3.2 with bf16 weights:
Before:
```
-rw-r--r-- 1 maxren staff 5468937344 Mar 28 17:00 llama3_2_fp16_direct_convert_runtime.pte
```
After:
```
-rw-r--r-- 1 maxren staff 2997443712 Mar 28 16:57 llama3_2_fp16_direct_convert_runtime.pte
```
1 parent 16e5901 commit a5994ac
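The two `.pte` sizes in the test plan can be compared with a short calculation. This is only arithmetic on the byte counts from the `ls` listings; the variable names are illustrative.

```python
# Byte counts copied from the "Before" and "After" ls listings.
before_bytes = 5_468_937_344
after_bytes = 2_997_443_712

saved_bytes = before_bytes - after_bytes
reduction = saved_bytes / before_bytes  # fraction of the original size removed

print(f"saved {saved_bytes / 2**30:.2f} GiB ({reduction:.1%} smaller)")
```

The file shrinks by roughly 45%, somewhat less than the 50% that halving every weight would give, which suggests that part of the file is not affected by this change.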
4 files changed, +22 -21 lines changed
- backends/xnnpack
  - operators
  - test/ops