A high-performance Go binding for llama.cpp using purego for cross-platform compatibility without CGO.
- Pure Go: No CGO required, uses purego for C interop
- Cross-Platform: Supports macOS (CPU/Metal), Linux (CPU/NVIDIA/AMD), Windows (CPU/NVIDIA/AMD)
- Performance: Direct bindings to llama.cpp shared libraries
- Compatibility: Version-synchronized with llama.cpp releases
- Easy Integration: Simple Go API for LLM inference
- GPU Acceleration: Supports Metal, CUDA, HIP, Vulkan, OpenCL, SYCL, and other backends
Gollama.cpp uses a platform-specific architecture with build tags to ensure optimal compatibility and performance across all operating systems.
- CPU: Intel x64, Apple Silicon (ARM64)
- GPU: Metal (Apple Silicon)
- Status: Full feature support with purego
- Build Tags: Uses the `!windows` build tag
- CPU: x86_64, ARM64
- GPU: NVIDIA (CUDA/Vulkan), AMD (HIP/ROCm/Vulkan), Intel (SYCL/Vulkan)
- Status: Full feature support with purego
- Build Tags: Uses the `!windows` build tag
- CPU: x86_64, ARM64
- GPU: NVIDIA (CUDA/Vulkan), AMD (HIP/Vulkan), Intel (SYCL/Vulkan), Qualcomm Adreno (OpenCL) - planned
- Status: Build compatibility implemented, runtime support in development
- Build Tags: Uses the `windows` build tag with syscall-based library loading
- Current State:
  - ✅ Compiles without errors on Windows
  - ✅ Cross-compilation from other platforms works
  - 🚧 Runtime functionality being implemented
  - 🚧 GPU acceleration being added
Our platform abstraction layer uses Go build tags (sketched below) to provide:
- Unix-like systems (`!windows`): Uses purego for dynamic library loading
- Windows (`windows`): Uses native Windows syscalls (`LoadLibraryW`, `FreeLibrary`, `GetProcAddress`)
- Cross-compilation: Supports building for any platform from any platform
- Automatic detection: Runtime platform capability detection
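The two snippets below sketch how such a build-tag split can look. They are simplified illustrations, not the library's actual source: the file names and the `loadLibrary`/`getSymbol` helpers are hypothetical, while `purego.Dlopen`/`purego.Dlsym` and `syscall.LoadLibrary`/`syscall.GetProcAddress` are the real underlying APIs.

```go
// loader_unix.go (hypothetical file name) — purego-based loading on Unix-like systems.
//go:build !windows

package gollama

import "github.com/ebitengine/purego"

// loadLibrary opens a llama.cpp shared library (libllama.so / libllama.dylib).
func loadLibrary(path string) (uintptr, error) {
	return purego.Dlopen(path, purego.RTLD_NOW|purego.RTLD_GLOBAL)
}

// getSymbol resolves an exported C symbol from the loaded library.
func getSymbol(handle uintptr, name string) (uintptr, error) {
	return purego.Dlsym(handle, name)
}
```

```go
// loader_windows.go (hypothetical file name) — syscall-based loading on Windows.
//go:build windows

package gollama

import "syscall"

// loadLibrary opens llama.dll via the native Windows loader (LoadLibraryW).
func loadLibrary(path string) (uintptr, error) {
	h, err := syscall.LoadLibrary(path)
	return uintptr(h), err
}

// getSymbol resolves an exported symbol via GetProcAddress.
func getSymbol(handle uintptr, name string) (uintptr, error) {
	return syscall.GetProcAddress(syscall.Handle(handle), name)
}
```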
go get github.com/dianlight/gollama.cpp
The Go module automatically downloads pre-built llama.cpp libraries from the official ggml-org/llama.cpp releases on first use. No manual compilation required!
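As a quick smoke test, the minimal program below only initializes and frees the backend; on its first run it should trigger the automatic library download described above. It uses only calls already shown in this README (`Backend_init`, `Backend_free`).

```go
package main

import (
	"fmt"

	"github.com/dianlight/gollama.cpp"
)

func main() {
	// The first call downloads and loads the pre-built llama.cpp library if needed.
	gollama.Backend_init()
	defer gollama.Backend_free()

	fmt.Println("gollama.cpp backend initialized")
}
```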
Our CI system tests compilation across all platforms:
Target Platform | Build From Linux | Build From macOS | Build From Windows |
---|---|---|---|
Linux (amd64) | ✅ | ✅ | ✅ |
Linux (arm64) | ✅ | ✅ | ✅ |
macOS (amd64) | ✅ | ✅ | ✅ |
macOS (arm64) | ✅ | ✅ | ✅ |
Windows (amd64) | ✅ | ✅ | ✅ |
Windows (arm64) | ✅ | ✅ | ✅ |
# Test cross-compilation for all platforms
make test-cross-compile
# Build for specific platform
GOOS=windows GOARCH=amd64 go build ./...
GOOS=linux GOARCH=arm64 go build ./...
GOOS=darwin GOARCH=arm64 go build ./...
# Run platform-specific tests
go test -v -run TestPlatformSpecific ./...
package main
import (
"fmt"
"log"
"github.com/dianlight/gollama.cpp"
)
func main() {
// Initialize the library
gollama.Backend_init()
defer gollama.Backend_free()
// Load model
params := gollama.Model_default_params()
model, err := gollama.Model_load_from_file("path/to/model.gguf", params)
if err != nil {
log.Fatal(err)
}
defer gollama.Model_free(model)
// Create context
ctxParams := gollama.Context_default_params()
ctx, err := gollama.Init_from_model(model, ctxParams)
if err != nil {
log.Fatal(err)
}
defer gollama.Free(ctx)
// Tokenize and generate
prompt := "The future of AI is"
tokens, err := gollama.Tokenize(model, prompt, true, false)
if err != nil {
log.Fatal(err)
}
// Create batch and decode
batch := gollama.Batch_init(len(tokens), 0, 1)
defer gollama.Batch_free(batch)
for i, token := range tokens {
gollama.Batch_add(batch, token, int32(i), []int32{0}, false)
}
if err := gollama.Decode(ctx, batch); err != nil {
log.Fatal(err)
}
// Sample next token
logits := gollama.Get_logits_ith(ctx, -1)
_ = logits // raw logits for the last position; not used directly below
candidates := gollama.Token_data_array_init(model)
sampler := gollama.Sampler_init_greedy()
defer gollama.Sampler_free(sampler)
newToken := gollama.Sampler_sample(sampler, ctx, candidates)
// Convert token to text
text := gollama.Token_to_piece(model, newToken, false)
fmt.Printf("Generated: %s\n", text)
}
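To generate more than one token, the sampling step can be repeated: sample, print the piece, then feed the sampled token back as a single-token batch and decode again. The sketch below is a rough illustration assembled only from calls already used above; it would replace the single sampling step inside main(). The fixed 32-token limit, the reuse of `candidates` across iterations, and the `true` logits flag on the fed-back token are simplifying assumptions, and a real loop would also stop on an end-of-generation token.

```go
	// Hypothetical continuation: generate up to 32 tokens, one decode per step.
	nPast := int32(len(tokens))
	for i := 0; i < 32; i++ {
		tok := gollama.Sampler_sample(sampler, ctx, candidates)
		fmt.Print(gollama.Token_to_piece(model, tok, false))

		// Feed the sampled token back as a single-token batch at the next position.
		step := gollama.Batch_init(1, 0, 1)
		gollama.Batch_add(step, tok, nPast, []int32{0}, true)
		if err := gollama.Decode(ctx, step); err != nil {
			log.Fatal(err)
		}
		gollama.Batch_free(step)
		nPast++
	}
```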
Gollama.cpp automatically downloads the appropriate pre-built binaries with GPU support and configures the optimal backend:
// Automatic GPU detection and configuration
params := gollama.Context_default_params()
params.n_gpu_layers = 32 // Offload layers to GPU (if available)
// Detect available GPU backend
backend := gollama.DetectGpuBackend()
fmt.Printf("Using GPU backend: %s\n", backend.String())
// Platform-specific optimizations:
// - macOS: Uses Metal when available
// - Linux: Supports CUDA, HIP, Vulkan, and SYCL
// - Windows: Supports CUDA, HIP, Vulkan, OpenCL, and SYCL
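// Split the model's layers across available GPUs (relevant on multi-GPU systems)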
params.split_mode = gollama.LLAMA_SPLIT_MODE_LAYER
Platform | GPU Type | Backend | Status |
---|---|---|---|
macOS | Apple Silicon | Metal | ✅ Supported |
macOS | Intel/AMD | CPU only | ✅ Supported |
Linux | NVIDIA | CUDA | ✅ Available in releases |
Linux | NVIDIA | Vulkan | ✅ Available in releases |
Linux | AMD | HIP/ROCm | ✅ Available in releases |
Linux | AMD | Vulkan | ✅ Available in releases |
Linux | Intel | SYCL | ✅ Available in releases |
Linux | Intel/Other | Vulkan | ✅ Available in releases |
Linux | Intel/Other | CPU | ✅ Fallback |
Windows | NVIDIA | CUDA | ✅ Available in releases |
Windows | NVIDIA | Vulkan | ✅ Available in releases |
Windows | AMD | HIP | ✅ Available in releases |
Windows | AMD | Vulkan | ✅ Available in releases |
Windows | Intel | SYCL | ✅ Available in releases |
Windows | Qualcomm Adreno | OpenCL | ✅ Available in releases |
Windows | Intel/Other | Vulkan | ✅ Available in releases |
Windows | Intel/Other | CPU | ✅ Fallback |
The library automatically downloads pre-built binaries from the official llama.cpp releases with the appropriate GPU support for your platform. The download happens automatically on first use!
params := gollama.Model_default_params()
params.n_ctx = 4096 // Context size
params.use_mmap = true // Memory mapping
params.use_mlock = true // Memory locking
params.vocab_only = false // Load full model
Gollama.cpp automatically downloads pre-built binaries from the official llama.cpp releases. You can also manage libraries manually:
// Load a specific llama.cpp build
if err := gollama.LoadLibraryWithVersion("b6099"); err != nil {
	log.Fatal(err)
}
// Clean the cache to force a re-download
if err := gollama.CleanLibraryCache(); err != nil {
	log.Fatal(err)
}
# Download libraries for current platform
make download-libs
# Download libraries for all platforms
make download-libs-all
# Test download functionality
make test-download
# Test GPU detection and functionality
make test-gpu
# Detect available GPU backends
make detect-gpu
# Clean library cache
make clean-libs
The downloader automatically selects the best variant for your platform:
- macOS: Metal-enabled binaries (arm64/x64)
- Linux: CPU-optimized binaries (CUDA/HIP/Vulkan/SYCL versions available)
- Windows: CPU-optimized binaries (CUDA/HIP/Vulkan/OpenCL/SYCL versions available)
Downloaded libraries are cached in:
- Linux/macOS: `~/.cache/gollama/libs/`
- Windows: `%LOCALAPPDATA%/gollama/libs/`
- Go 1.21 or later
- Make
# Clone and build
git clone https://github.com/dianlight/gollama.cpp
cd gollama.cpp
# Build for current platform
make build
# Run tests (downloads libraries automatically)
make test
# Build examples
make build-examples
# Generate release packages
make release
The Makefile implements intelligent GPU detection:
- CUDA Detection: Checks for the `nvcc` compiler and CUDA toolkit
- HIP Detection: Checks for `hipconfig` and a ROCm installation
- Priority Order: CUDA > HIP > CPU (on Linux/Windows)
- Metal: Always enabled on macOS when Xcode is available
No manual configuration or environment variables required!
This library tracks llama.cpp versions. The version number format is:
`vX.Y.Z-llamacpp.ABCD`
Where:
- `X.Y.Z` is the gollama.cpp semantic version
- `ABCD` is the corresponding llama.cpp build number
For example: `v0.2.0-llamacpp.b6099` uses llama.cpp build b6099.
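If you need to recover the pinned llama.cpp build programmatically, the version string can be split on the `-llamacpp.` marker. This is a small illustration of the format above, not an API provided by the library:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	v := "v0.2.0-llamacpp.b6099"

	// Split into the gollama.cpp semver and the llama.cpp build tag.
	parts := strings.SplitN(v, "-llamacpp.", 2)
	fmt.Println("gollama.cpp:", parts[0]) // v0.2.0
	fmt.Println("llama.cpp:  ", parts[1]) // b6099
}
```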
See the examples directory for complete examples:
- Simple Chat
- Chat with History
- Embedding Generation
- Model Quantization
- Batch Processing
- GPU Acceleration
Contributions are welcome! Please read our Contributing Guide for details.
If you find this project helpful and would like to support its development, you can:
- ⭐ Star this repository on GitHub
- 🐛 Report bugs and suggest improvements
- 📖 Improve documentation
Your support helps maintain and improve this project for the entire community!
This project is licensed under the MIT License - see the LICENSE file for details.
This license is compatible with llama.cpp's MIT license.
- llama.cpp - The underlying C++ library
- purego - Pure Go C interop library
- ggml - Machine learning tensor library
- Issues - Bug reports and feature requests
- Discussions - Questions and community support