Hello genius developers, how do you debug your inference framework during development?
Do you only watch high-level metrics such as TTFT, TPOT, or GPU/CPU utilization?
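For context on the metrics above: TTFT (time to first token) is the delay from sending a request until the first token arrives, and TPOT (time per output token) is the average gap between subsequent tokens. A minimal sketch of measuring both from any token stream (the `generate_stream` iterable is a hypothetical stand-in for a real inference client):

```python
import time

def measure_ttft_tpot(generate_stream):
    """Measure TTFT and TPOT for a token-streaming generator.

    `generate_stream` is any iterable that yields tokens as they are
    produced (hypothetical; stands in for a real inference client).
    """
    start = time.perf_counter()
    token_times = []
    for _ in generate_stream:
        token_times.append(time.perf_counter())
    # TTFT: delay until the first token arrives.
    ttft = token_times[0] - start
    # TPOT: average gap between consecutive tokens after the first.
    if len(token_times) > 1:
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0
    return ttft, tpot
```

This only captures end-to-end latency as seen by the client; attributing it to specific kernels is where a profiler comes in.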
Or do you rely on a profiler like Nsight to optimize your kernels?
Do you need a framework that splits the inference pipeline into separate modules, each equipped with low-intrusion observation and debugging hooks? As a rough picture: with such a framework, you could start from a checkpoint taken right after the model is loaded, then try different KV-cache mechanisms while observing and debugging each one.
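To make the rough picture above concrete, here is a minimal sketch of what such a framework could look like: the pipeline is split into named stages, observers attach without touching stage code, and the KV-cache mechanism is a swappable component. All class and function names here are hypothetical, not from any existing framework:

```python
import time
from typing import Any, Callable

class KVCache:
    """Base interface for a swappable KV-cache mechanism."""
    def store(self, layer: int, kv: Any) -> None: ...
    def fetch(self, layer: int) -> Any: ...

class DictKVCache(KVCache):
    """Trivial dict-backed cache; a paged or quantized variant
    would implement the same interface."""
    def __init__(self):
        self._data = {}
    def store(self, layer, kv):
        self._data[layer] = kv
    def fetch(self, layer):
        return self._data.get(layer)

class Pipeline:
    """Inference pipeline split into named stages with
    low-intrusion observation hooks."""
    def __init__(self, kv_cache: KVCache):
        self.kv_cache = kv_cache
        self.stages: list[tuple[str, Callable]] = []
        self.observers: list[Callable] = []

    def add_stage(self, name: str, fn: Callable):
        self.stages.append((name, fn))

    def observe(self, fn: Callable):
        """Register a hook called as fn(stage_name, elapsed_s, output)."""
        self.observers.append(fn)

    def run(self, x):
        for name, fn in self.stages:
            t0 = time.perf_counter()
            x = fn(x, self.kv_cache)
            for obs in self.observers:
                obs(name, time.perf_counter() - t0, x)
        return x
```

"Starting from a checkpoint after the model is loaded" would then just mean constructing the `Pipeline` once with loaded weights and swapping in a different `KVCache` subclass per experiment, with the same observers comparing each run.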
Thank you for letting me know. I'm looking forward to meeting up on Sunday.