PUMA aims to be a lightweight, high-performance inference engine for running AI models locally. It is a hobby project, built for fun.
Run make build to build the puma binary.
Run ./puma help to list all available commands. For example, ./puma version prints the version of the binary.
We use llama.cpp as the default backend for quick prototyping, and plan to implement our own backend in the future.