A high-performance Go binding for llama.cpp using purego for cross-platform compatibility without CGO.
- Pure Go: No CGO required, uses purego for C interop
- Cross-Platform: Supports macOS (CPU/Metal), Linux (CPU/NVIDIA/AMD), Windows (CPU/NVIDIA/AMD)
- Performance: Direct bindings to llama.cpp shared libraries
- Compatibility: Version-synchronized with llama.cpp releases
- Easy Integration: Simple Go API for LLM inference
- GPU Acceleration: Supports Metal, CUDA, HIP, Vulkan, OpenCL, SYCL, and other backends
Gollama.cpp uses a platform-specific architecture with build tags to ensure optimal compatibility and performance across all operating systems.
- CPU: Intel x64, Apple Silicon (ARM64)
- GPU: Metal (Apple Silicon)
- Status: Full feature support with purego
- Build Tags: Uses the `!windows` build tag
- CPU: x86_64, ARM64
- GPU: NVIDIA (CUDA/Vulkan), AMD (HIP/ROCm/Vulkan), Intel (SYCL/Vulkan)
- Status: Full feature support with purego
- Build Tags: Uses the `!windows` build tag
- CPU: x86_64, ARM64
- GPU: NVIDIA (CUDA/Vulkan), AMD (HIP/Vulkan), Intel (SYCL/Vulkan), Qualcomm Adreno (OpenCL) - planned
- Status: Build compatibility implemented, runtime support in development
- Build Tags: Uses the `windows` build tag with syscall-based library loading
- Current State:
  - ✅ Compiles without errors on Windows
  - ✅ Cross-compilation from other platforms works
  - 🚧 Runtime functionality being implemented
  - 🚧 GPU acceleration being added
Our platform abstraction layer uses Go build tags (sketched below) to provide:
- Unix-like systems (`!windows`): Uses purego for dynamic library loading
- Windows (`windows`): Uses native Windows syscalls (`LoadLibraryW`, `FreeLibrary`, `GetProcAddress`)
- Cross-compilation: Supports building for any platform from any platform
- Automatic detection: Runtime platform capability detection
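The two snippets below sketch how such a build-tag split can look. They are simplified illustrations, not the library's actual source: the file names and the `loadLibrary`/`getSymbol` helpers are hypothetical, while `purego.Dlopen`/`purego.Dlsym` and `syscall.LoadLibrary`/`syscall.GetProcAddress` are the real underlying APIs.

```go
// loader_unix.go (hypothetical file name) — purego-based loading on Unix-like systems.
//go:build !windows

package gollama

import "github.com/ebitengine/purego"

// loadLibrary opens a llama.cpp shared library (libllama.so / libllama.dylib).
func loadLibrary(path string) (uintptr, error) {
	return purego.Dlopen(path, purego.RTLD_NOW|purego.RTLD_GLOBAL)
}

// getSymbol resolves an exported C symbol from the loaded library.
func getSymbol(handle uintptr, name string) (uintptr, error) {
	return purego.Dlsym(handle, name)
}
```

```go
// loader_windows.go (hypothetical file name) — syscall-based loading on Windows.
//go:build windows

package gollama

import "syscall"

// loadLibrary opens llama.dll via the native Windows loader (LoadLibraryW).
func loadLibrary(path string) (uintptr, error) {
	h, err := syscall.LoadLibrary(path)
	return uintptr(h), err
}

// getSymbol resolves an exported symbol via GetProcAddress.
func getSymbol(handle uintptr, name string) (uintptr, error) {
	return syscall.GetProcAddress(syscall.Handle(handle), name)
}
```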
go get github.com/dianlight/gollama.cpp
The Go module automatically downloads pre-built llama.cpp libraries from the official ggml-org/llama.cpp releases on first use. No manual compilation required!
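As a quick smoke test, the minimal program below only initializes and frees the backend; on its first run it should trigger the automatic library download described above. It uses only calls already shown in this README (`Backend_init`, `Backend_free`).

```go
package main

import (
	"fmt"

	"github.com/dianlight/gollama.cpp"
)

func main() {
	// The first call downloads and loads the pre-built llama.cpp library if needed.
	gollama.Backend_init()
	defer gollama.Backend_free()

	fmt.Println("gollama.cpp backend initialized")
}
```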
Our CI system tests compilation across all platforms:
Target Platform | Build From Linux | Build From macOS | Build From Windows |
---|---|---|---|
Linux (amd64) | ✅ | ✅ | ✅ |
Linux (arm64) | ✅ | ✅ | ✅ |
macOS (amd64) | ✅ | ✅ | ✅ |
macOS (arm64) | ✅ | ✅ | ✅ |
Windows (amd64) | ✅ | ✅ | ✅ |
Windows (arm64) | ✅ | ✅ | ✅ |
# Test cross-compilation for all platforms
make test-cross-compile
# Build for specific platform
GOOS=windows GOARCH=amd64 go build ./...
GOOS=linux GOARCH=arm64 go build ./...
GOOS=darwin GOARCH=arm64 go build ./...
# Run platform-specific tests
go test -v -run TestPlatformSpecific ./...
package main
import (
"fmt"
"log"
"github.com/dianlight/gollama.cpp"
)
func main() {
// Initialize the library
gollama.Backend_init()
defer gollama.Backend_free()
// Load model
params := gollama.Model_default_params()
model, err := gollama.Model_load_from_file("path/to/model.gguf", params)
if err != nil {
log.Fatal(err)
}
defer gollama.Model_free(model)
// Create context
ctxParams := gollama.Context_default_params()
ctx, err := gollama.Init_from_model(model, ctxParams)
if err != nil {
log.Fatal(err)
}
defer gollama.Free(ctx)
// Tokenize and generate
prompt := "The future of AI is"
tokens, err := gollama.Tokenize(model, prompt, true, false)
if err != nil {
log.Fatal(err)
}
// Create batch and decode
batch := gollama.Batch_init(len(tokens), 0, 1)
defer gollama.Batch_free(batch)
for i, token := range tokens {
gollama.Batch_add(batch, token, int32(i), []int32{0}, false)
}
if err := gollama.Decode(ctx, batch); err != nil {
log.Fatal(err)
}
// Sample next token
logits := gollama.Get_logits_ith(ctx, -1)
_ = logits // raw logits for the last position; not used directly below
candidates := gollama.Token_data_array_init(model)
sampler := gollama.Sampler_init_greedy()
defer gollama.Sampler_free(sampler)
newToken := gollama.Sampler_sample(sampler, ctx, candidates)
// Convert token to text
text := gollama.Token_to_piece(model, newToken, false)
fmt.Printf("Generated: %s\n", text)
}
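To generate more than one token, the sampling step can be repeated: sample, print the piece, then feed the sampled token back as a single-token batch and decode again. The sketch below is a rough illustration assembled only from calls already used above; it would replace the single sampling step inside main(). The fixed 32-token limit, the reuse of `candidates` across iterations, and the `true` logits flag on the fed-back token are simplifying assumptions, and a real loop would also stop on an end-of-generation token.

```go
	// Hypothetical continuation: generate up to 32 tokens, one decode per step.
	nPast := int32(len(tokens))
	for i := 0; i < 32; i++ {
		tok := gollama.Sampler_sample(sampler, ctx, candidates)
		fmt.Print(gollama.Token_to_piece(model, tok, false))

		// Feed the sampled token back as a single-token batch at the next position.
		step := gollama.Batch_init(1, 0, 1)
		gollama.Batch_add(step, tok, nPast, []int32{0}, true)
		if err := gollama.Decode(ctx, step); err != nil {
			log.Fatal(err)
		}
		gollama.Batch_free(step)
		nPast++
	}
```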
Gollama.cpp automatically downloads the appropriate pre-built binaries with GPU support and configures the optimal backend:
// Automatic GPU detection and configuration
params := gollama.Context_default_params()
params.n_gpu_layers = 32 // Offload layers to GPU (if available)
// Detect available GPU backend
backend := gollama.DetectGpuBackend()
fmt.Printf("Using GPU backend: %s\n", backend.String())
// Platform-specific optimizations:
// - macOS: Uses Metal when available
// - Linux: Supports CUDA, HIP, Vulkan, and SYCL
// - Windows: Supports CUDA, HIP, Vulkan, OpenCL, and SYCL
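// Split the model's layers across available GPUs (relevant on multi-GPU systems)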
params.split_mode = gollama.LLAMA_SPLIT_MODE_LAYER
Platform | GPU Type | Backend | Status |
---|---|---|---|
macOS | Apple Silicon | Metal | ✅ Supported |
macOS | Intel/AMD | CPU only | ✅ Supported |
Linux | NVIDIA | CUDA | ✅ Available in releases |
Linux | NVIDIA | Vulkan | ✅ Available in releases |
Linux | AMD | HIP/ROCm | ✅ Available in releases |
Linux | AMD | Vulkan | ✅ Available in releases |
Linux | Intel | SYCL | ✅ Available in releases |
Linux | Intel/Other | Vulkan | ✅ Available in releases |
Linux | Intel/Other | CPU | ✅ Fallback |
Windows | NVIDIA | CUDA | ✅ Available in releases |
Windows | NVIDIA | Vulkan | ✅ Available in releases |
Windows | AMD | HIP | ✅ Available in releases |
Windows | AMD | Vulkan | ✅ Available in releases |
Windows | Intel | SYCL | ✅ Available in releases |
Windows | Qualcomm Adreno | OpenCL | ✅ Available in releases |
Windows | Intel/Other | Vulkan | ✅ Available in releases |
Windows | Intel/Other | CPU | ✅ Fallback |
The library automatically downloads pre-built binaries from the official llama.cpp releases with the appropriate GPU support for your platform. The download happens automatically on first use!
params := gollama.Model_default_params()
params.n_ctx = 4096 // Context size
params.use_mmap = true // Memory mapping
params.use_mlock = true // Memory locking
params.vocab_only = false // Load full model
Gollama.cpp automatically downloads pre-built binaries from the official llama.cpp releases. You can also manage libraries manually:
// Load a specific llama.cpp build
if err := gollama.LoadLibraryWithVersion("b6099"); err != nil {
	log.Fatal(err)
}
// Clean the cache to force a re-download
if err := gollama.CleanLibraryCache(); err != nil {
	log.Fatal(err)
}
# Download libraries for current platform
make download-libs
# Download libraries for all platforms
make download-libs-all
# Test download functionality
make test-download
# Test GPU detection and functionality
make test-gpu
# Detect available GPU backends
make detect-gpu
# Clean library cache
make clean-libs
The downloader automatically selects the best variant for your platform:
- macOS: Metal-enabled binaries (arm64/x64)
- Linux: CPU-optimized binaries (CUDA/HIP/Vulkan/SYCL versions available)
- Windows: CPU-optimized binaries (CUDA/HIP/Vulkan/OpenCL/SYCL versions available)
Downloaded libraries are cached in:
- Linux/macOS: `~/.cache/gollama/libs/`
- Windows: `%LOCALAPPDATA%/gollama/libs/`
- Go 1.21 or later
- Make
# Clone and build
git clone https://github.com/dianlight/gollama.cpp
cd gollama.cpp
# Build for current platform
make build
# Run tests (downloads libraries automatically)
make test
# Build examples
make build-examples
# Generate release packages
make release
The Makefile implements intelligent GPU detection:
- CUDA Detection: Checks for the `nvcc` compiler and CUDA toolkit
- HIP Detection: Checks for `hipconfig` and a ROCm installation
- Priority Order: CUDA > HIP > CPU (on Linux/Windows)
- Metal: Always enabled on macOS when Xcode is available
No manual configuration or environment variables required!
This library tracks llama.cpp versions. The version number format is:
`vX.Y.Z-llamacpp.ABCD`
Where:
- `X.Y.Z` is the gollama.cpp semantic version
- `ABCD` is the corresponding llama.cpp build number
For example: `v0.2.0-llamacpp.b6099` uses llama.cpp build b6099.
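If you need to recover the pinned llama.cpp build programmatically, the version string can be split on the `-llamacpp.` marker. This is a small illustration of the format above, not an API provided by the library:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	v := "v0.2.0-llamacpp.b6099"

	// Split into the gollama.cpp semver and the llama.cpp build tag.
	parts := strings.SplitN(v, "-llamacpp.", 2)
	fmt.Println("gollama.cpp:", parts[0]) // v0.2.0
	fmt.Println("llama.cpp:  ", parts[1]) // b6099
}
```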
See the examples directory for complete examples:
- Simple Chat
- Chat with History
- Embedding Generation
- Model Quantization
- Batch Processing
- GPU Acceleration
Contributions are welcome! Please read our Contributing Guide for details.
If you find this project helpful and would like to support its development, you can:
- ⭐ Star this repository on GitHub
- 🐛 Report bugs and suggest improvements
- 📖 Improve documentation
Your support helps maintain and improve this project for the entire community!
This project is licensed under the MIT License - see the LICENSE file for details.
This license is compatible with llama.cpp's MIT license.
- llama.cpp - The underlying C++ library
- purego - Pure Go C interop library
- ggml - Machine learning tensor library
- Issues - Bug reports and feature requests
- Discussions - Questions and community support