Release v0.2.0-llamacpp.b6099

[v0.2.0-llamacpp.b6099] - 2025-08-06

🚀 Major GPU Backend Enhancements

This release significantly expands GPU support with three new GPU backends and a comprehensive detection system:

  • 🔥 Vulkan Support: Cross-platform GPU acceleration for NVIDIA, AMD, and Intel GPUs
  • ⚡ OpenCL Support: Broad compatibility including Qualcomm Adreno GPUs on ARM64 devices
  • 🧠 SYCL Support: Intel oneAPI unified parallel programming for CPUs and GPUs
  • 🔍 Smart Detection: Automatic GPU backend detection with optimal library selection
  • 📦 Intelligent Downloads: GPU-aware library downloader selects best variants automatically

Added

  • Enhanced GPU Backend Support with comprehensive detection and automatic selection
    • Vulkan support for cross-platform GPU acceleration (NVIDIA, AMD, Intel GPUs)
    • OpenCL support for diverse GPU vendors including Qualcomm Adreno on ARM64
    • SYCL support for Intel oneAPI toolkit and unified parallel programming
    • GPU backend detection with DetectGpuBackend() function and LlamaGpuBackend enum
    • Automatic GPU variant selection in library downloader based on available SDKs
    • GPU detection tools with new detect-gpu Makefile target
    • GPU testing suite with test-gpu target and comprehensive backend tests
  • Enhanced Documentation for GPU setup and configuration
    • Comprehensive GPU installation guides for Linux (Vulkan, OpenCL, SYCL)
    • Windows GPU setup instructions for all supported backends
    • Platform-specific GPU requirements and verification steps
    • Updated GPU Support Matrix with all available backends
  • Improved CI/CD Pipeline with GPU detection testing
    • Added GPU detection tools installation in CI workflows
    • New gpu-detection job for comprehensive GPU backend testing
    • GPU-aware library downloading validation
  • Download-based architecture using pre-built binaries from official llama.cpp releases
  • Automatic library download system with platform detection
  • Library cache management with clean-libs target
  • Cross-platform download testing (test-download-platforms)
  • Command-line download tool (cmd/gollama-download)
  • clone-llamacpp target for developers who need to cross-reference the llama.cpp source code
  • Platform-specific architecture with Go build tags for improved cross-platform support
  • Windows compilation compatibility using native syscalls (LoadLibraryW, FreeLibrary)
  • Cross-platform compilation testing in CI pipeline
  • Platform capability detection functions (isPlatformSupported, getPlatformError)
  • Integrated hf.sh script management for Hugging Face model downloads
  • update-hf-script target for updating hf.sh from llama.cpp repository
  • Enhanced model download system using hf.sh instead of direct curl
  • Comprehensive tools documentation (docs/TOOLS.md)
  • Dedicated platform-specific test suite (TestPlatformSpecific)
  • Enhanced Makefile with cross-compilation targets (test-cross-compile, test-compile-*)
  • Comprehensive platform migration documentation
  • Initial Go binding for llama.cpp using purego
  • Cross-platform support (macOS, Linux, Windows)
  • CPU and GPU acceleration support

Changed

  • Enhanced GPU Backend Priority: Updated detection order to CUDA → HIP → Vulkan → OpenCL → SYCL → CPU
  • Intelligent Library Selection: Downloader now automatically selects optimal GPU variant based on available tools
  • Expanded Platform Support: Added comprehensive GPU backend support across Linux and Windows
  • Updated GPU Configuration: Enhanced context parameters with automatic GPU backend detection
  • Dependencies: Updated llama.cpp from build b6076 to b6099 (automated via Renovate)
  • Breaking: Migrated from compilation-based to download-based architecture
  • Simplified build process: No longer requires CMake, compilers, or GPU SDKs
  • Library loading now uses automatic download instead of local compilation
  • Updated documentation to reflect new download-based workflow
  • Model download system: Now uses hf.sh script from llama.cpp instead of direct curl commands
  • Example projects: Updated to use local hf.sh script from scripts/ directory
  • Documentation: Updated to reflect hf.sh script integration and usage
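The "intelligent library selection" above boils down to mapping the detected backend and target OS to a downloadable variant, with a CPU fallback when no GPU build exists for that platform. The sketch below illustrates that mapping; the variant names are hypothetical, not the project's actual asset naming.

```go
// Illustrative sketch of GPU-aware variant selection for the library
// downloader. The variant names below are hypothetical; the real
// downloader matches actual llama.cpp release assets.
package main

import "fmt"

// pickVariant chooses a download variant from the target OS and the
// detected backend, falling back to a plain CPU build when no
// pre-built GPU asset exists for that OS.
func pickVariant(goos, backend string) string {
	supported := map[string]map[string]bool{
		"linux":   {"cuda": true, "hip": true, "vulkan": true, "opencl": true, "sycl": true},
		"windows": {"cuda": true, "hip": true, "vulkan": true, "opencl": true, "sycl": true},
		"darwin":  {"metal": true},
	}
	if supported[goos][backend] {
		return fmt.Sprintf("llamacpp-%s-%s", goos, backend) // hypothetical name
	}
	return fmt.Sprintf("llamacpp-%s-cpu", goos)
}

func main() {
	fmt.Println(pickVariant("linux", "vulkan"))
	fmt.Println(pickVariant("darwin", "cuda")) // no CUDA asset on macOS: CPU fallback
}
```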

GPU Backend Support Matrix

| Backend  | Platforms      | GPU Vendors                 | Detection Command | Status        |
|----------|----------------|-----------------------------|-------------------|---------------|
| Metal    | macOS          | Apple Silicon               | system_profiler   | ✅ Production |
| CUDA     | Linux, Windows | NVIDIA                      | nvcc              | ✅ Production |
| HIP/ROCm | Linux, Windows | AMD                         | hipconfig         | ✅ Production |
| Vulkan   | Linux, Windows | NVIDIA, AMD, Intel          | vulkaninfo        | ✅ Production |
| OpenCL   | Windows, Linux | Qualcomm Adreno, Intel, AMD | clinfo            | ✅ Production |
| SYCL     | Linux, Windows | Intel, NVIDIA               | sycl-ls           | ✅ Production |
| CPU      | All            | All                         | N/A               | ✅ Fallback   |

Dependencies

  • llama.cpp: Updated from b6076 to b6099 (managed by Renovate)

Removed

  • All build-llamacpp-* compilation targets (no longer needed)
  • CMake and compiler dependencies for regular builds
  • Complex GPU SDK detection at build time
  • build-libs-gpu and build-libs-cpu targets
Added (from the initial release)

  • Complete API coverage for llama.cpp functions
  • Pre-built llama.cpp libraries for all platforms
  • Comprehensive examples and documentation
  • GitHub Actions CI/CD pipeline
  • Automated release system

Changed

  • Breaking internal change: Migrated from direct purego imports to platform-specific abstraction layer
  • Separated platform-specific code into platform_unix.go and platform_windows.go with appropriate build tags
  • Updated CI to test cross-compilation for all platforms (Windows, Linux, macOS on both amd64 and arm64)
  • Enhanced documentation to reflect platform-specific implementation details
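The platform abstraction described above can be pictured as a small interface that both platform files satisfy. In the real code base the two implementations live in platform_unix.go (//go:build !windows, using purego.Dlopen/Dlclose) and platform_windows.go (//go:build windows, using LoadLibraryW/FreeLibrary syscalls); the single-file stub below only illustrates the shape of the contract, not the actual code.

```go
// Minimal single-file sketch of the platform abstraction layer.
// The stub loader stands in for the real platform-specific files
// (purego on Unix-like systems, native syscalls on Windows).
package main

import "fmt"

// libLoader is the contract both platform implementations satisfy.
type libLoader interface {
	Open(path string) (handle uintptr, err error)
	Close(handle uintptr) error
}

// stubLoader is a test double that tracks open handles in a map.
type stubLoader struct{ opened map[uintptr]string }

func (s *stubLoader) Open(path string) (uintptr, error) {
	h := uintptr(len(s.opened) + 1)
	s.opened[h] = path
	return h, nil
}

func (s *stubLoader) Close(h uintptr) error {
	if _, ok := s.opened[h]; !ok {
		return fmt.Errorf("invalid handle %d", h)
	}
	delete(s.opened, h)
	return nil
}

func main() {
	var l libLoader = &stubLoader{opened: map[uintptr]string{}}
	h, _ := l.Open("libllama.so")
	fmt.Println("opened handle:", h)
	fmt.Println("close error:", l.Close(h))
}
```

Because callers depend only on the interface, cross-compilation works from any host: the build tags select the right implementation at compile time, with no runtime dispatch cost.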

Fixed

  • Windows CI compilation errors: Fixed undefined purego.Dlopen, purego.RTLD_NOW, purego.RTLD_GLOBAL, and purego.Dlclose symbols
  • Cross-compilation now works from any platform to any platform
  • Platform detection properly handles unsupported/incomplete platforms

Features

  • Pure Go implementation (no CGO required)
  • Enhanced GPU Support:
    • Metal support for macOS (Apple Silicon and Intel)
    • CUDA support for NVIDIA GPUs
    • HIP support for AMD GPUs
    • NEW: Vulkan support for cross-platform GPU acceleration
    • NEW: OpenCL support for diverse GPU vendors (including Qualcomm Adreno)
    • NEW: SYCL support for Intel oneAPI and unified parallel programming
    • NEW: Automatic GPU backend detection and selection
    • NEW: GPU-aware library downloading with optimal variant selection
  • Memory mapping and locking options
  • Batch processing capabilities
  • Multiple sampling strategies
  • Model quantization support
  • Context state management
  • Token manipulation utilities

Platform Support

  • macOS: ✅ Intel x64, Apple Silicon (ARM64) with Metal - Fully supported
  • Linux: ✅ x86_64, ARM64 with CUDA/HIP/Vulkan/SYCL - Fully supported
  • Windows: 🚧 x86_64, ARM64 with CUDA/HIP/Vulkan/OpenCL/SYCL - Build compatibility implemented, runtime support in development

Technical Details

  • Unix-like platforms (Linux, macOS): Use purego for dynamic library loading
  • Windows platform: Use native Windows syscalls for library management
  • Build tags: !windows for Unix-like, windows for Windows-specific code
  • Zero runtime overhead: Platform abstraction has no performance impact
  • GPU Detection Priority: CUDA → HIP → Vulkan → OpenCL → SYCL → CPU (Linux/Windows)
  • Automatic Fallback: Graceful degradation to CPU when GPU backends unavailable
  • Command Detection: Uses exec.LookPath() for cross-platform command availability
  • Pattern Matching: Regex-based asset selection for optimal GPU variant downloads
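The regex-based asset selection mentioned above can be sketched as follows. The asset file names and the pattern are illustrative only; the real downloader matches the naming of actual llama.cpp release assets.

```go
// Sketch of regex-based asset selection among release file names.
// The file names here are made up for illustration; the real
// downloader matches actual llama.cpp release assets.
package main

import (
	"fmt"
	"regexp"
)

// selectAsset returns the first asset whose name contains the target
// OS followed by the backend (case-insensitive), or "" when nothing
// matches and the caller should fall back to a CPU variant.
func selectAsset(assets []string, goos, backend string) string {
	re := regexp.MustCompile(fmt.Sprintf(`(?i)%s.*%s`,
		regexp.QuoteMeta(goos), regexp.QuoteMeta(backend)))
	for _, a := range assets {
		if re.MatchString(a) {
			return a
		}
	}
	return ""
}

func main() {
	assets := []string{ // hypothetical release listing
		"llama-bin-linux-cpu-x64.zip",
		"llama-bin-linux-vulkan-x64.zip",
		"llama-bin-win-cuda-x64.zip",
	}
	fmt.Println(selectAsset(assets, "linux", "vulkan"))
}
```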