Hello genius developers, how do you debug your inference framework during development?
Do you only watch high-level metrics such as TTFT, TPOT, or GPU/CPU utilization?
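For context on the metrics above: TTFT (time to first token) is the delay from sending a request until the first token arrives, and TPOT (time per output token) is the average gap between subsequent tokens. A minimal sketch of measuring both from any token stream (the `generate_stream` iterable is a hypothetical stand-in for a real inference client):

```python
import time

def measure_ttft_tpot(generate_stream):
    """Measure TTFT and TPOT for a token-streaming generator.

    `generate_stream` is any iterable that yields tokens as they are
    produced (hypothetical; stands in for a real inference client).
    """
    start = time.perf_counter()
    token_times = []
    for _ in generate_stream:
        token_times.append(time.perf_counter())
    # TTFT: delay until the first token arrives.
    ttft = token_times[0] - start
    # TPOT: average gap between consecutive tokens after the first.
    if len(token_times) > 1:
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0
    return ttft, tpot
```

This only captures end-to-end latency as seen by the client; attributing it to specific kernels is where a profiler comes in.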
Or do you rely on a profiler like Nsight to optimize your kernels?
Do you need a framework that splits the inference pipeline into separate modules, each equipped with low-intrusion observation and debugging hooks? As a rough picture: with such a framework, you could start from a checkpoint taken right after the model is loaded, then try different KV-cache mechanisms while observing and debugging each one.
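To make the rough picture above concrete, here is a minimal sketch of what such a framework could look like: the pipeline is split into named stages, observers attach without touching stage code, and the KV-cache mechanism is a swappable component. All class and function names here are hypothetical, not from any existing framework:

```python
import time
from typing import Any, Callable

class KVCache:
    """Base interface for a swappable KV-cache mechanism."""
    def store(self, layer: int, kv: Any) -> None: ...
    def fetch(self, layer: int) -> Any: ...

class DictKVCache(KVCache):
    """Trivial dict-backed cache; a paged or quantized variant
    would implement the same interface."""
    def __init__(self):
        self._data = {}
    def store(self, layer, kv):
        self._data[layer] = kv
    def fetch(self, layer):
        return self._data.get(layer)

class Pipeline:
    """Inference pipeline split into named stages with
    low-intrusion observation hooks."""
    def __init__(self, kv_cache: KVCache):
        self.kv_cache = kv_cache
        self.stages: list[tuple[str, Callable]] = []
        self.observers: list[Callable] = []

    def add_stage(self, name: str, fn: Callable):
        self.stages.append((name, fn))

    def observe(self, fn: Callable):
        """Register a hook called as fn(stage_name, elapsed_s, output)."""
        self.observers.append(fn)

    def run(self, x):
        for name, fn in self.stages:
            t0 = time.perf_counter()
            x = fn(x, self.kv_cache)
            for obs in self.observers:
                obs(name, time.perf_counter() - t0, x)
        return x
```

"Starting from a checkpoint after the model is loaded" would then just mean constructing the `Pipeline` once with loaded weights and swapping in a different `KVCache` subclass per experiment, with the same observers comparing each run.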
Thank you for letting me know. I'm looking forward to meeting up on Sunday.