Summary:
Pull Request resolved: #2210
# context
* the new op `permute_multi_embedding` outperforms the original op `permute_pooled_embs_auto_grad`
* this diff switches callers over to the new op
* benchmark results: D58907223
# benchmark
* [traces](https://drive.google.com/drive/folders/1v_kD9n1jOkGUmYyix3-dUYiBDE_C3Hiv?usp=drive_link)
* previous prod
{F1747994738}
* new prod
{F1747994032}
* metrics
|Operator|GPU runtime|GPU memory|notes|
|---|---|---|---|
|**[previous prod] permute_pooled_embs**|4.9 ms|1.5 K|GPU-bound, does **NOT** allow duplicates, PT2-incompatible `pin_and_move`|
|**[new prod] permute_multi_embedding**|2.0 ms|1.0 K|both CPU and GPU runtime/memory improved, **ALLOWS** duplicates, PT2-friendly|
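
For readers unfamiliar with the "duplicates" note in the table, here is a minimal conceptual sketch in plain Python of what permuting pooled embedding segments with a repeated index means. It is an illustration only: `permute_segments`, `lengths`, and `order` are hypothetical names, and this is not the signature or the implementation of either op.

```python
from typing import List


def permute_segments(row: List[float], lengths: List[int], order: List[int]) -> List[float]:
    """Reorder contiguous segments of `row`; an index may repeat in `order`."""
    if sum(lengths) != len(row):
        raise ValueError("segment lengths must sum to the row length")
    # Start offset of each segment, computed as a running sum of the lengths.
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    out: List[float] = []
    for i in order:
        # Copy segment i; a repeated index simply copies the segment again.
        out.extend(row[offsets[i]:offsets[i + 1]])
    return out


# One row holding three pooled features with dims 2, 1, and 3 (hypothetical data).
row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Feature 2 is requested twice, which is the "duplicates" case.
print(permute_segments(row, [2, 1, 3], [2, 0, 2]))
# prints [4.0, 5.0, 6.0, 1.0, 2.0, 4.0, 5.0, 6.0]
```

An op that disallows duplicates would reject an `order` such as `[2, 0, 2]`; the sketch just copies the segment a second time.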
Reviewed By: dstaay-fb
Differential Revision: D53590566