CUDA: fix crash on large batch size for MoE models #22298
| Job run time |
|---|
| 7m 54s |
| 3m 6s |
| 2m 33s |
| 4m 9s |
| 2m 48s |
| 2m 45s |
| 2m 16s |
| 5m 28s |
| 1m 21s |
| 45m 15s |
| 9m 41s |
| 3m 28s |
| 13m 33s |
| 14m 16s |
| 1m 8s |
| 4m 39s |
| 1m 17s |
| 2m 59s |
| 2m 50s |
| 3m 28s |
| 18m 23s |
| 11m 19s |
| 5m 54s |
| 11m 19s |
| 2m 27s |
| 1m 56s |
| 7m 2s |
| 3m 21s |
| 11m 15s |
| 3m 15s |
| 8m 12s |
| 4m 3s |
| 5m 20s |
| 7m 26s |
| 15m 5s |
| 5m 36s |
| 4m 55s |
| -1s |
| **Total: 4h 21m 41s** |