Release v0.2.0-llamacpp.b6099

[v0.2.0-llamacpp.b6099] - 2025-08-06

🚀 Major GPU Backend Enhancements

This release significantly expands GPU support with three new GPU backends and a comprehensive detection system:

  • 🔥 Vulkan Support: Cross-platform GPU acceleration for NVIDIA, AMD, and Intel GPUs
  • ⚡ OpenCL Support: Broad compatibility including Qualcomm Adreno GPUs on ARM64 devices
  • 🧠 SYCL Support: Intel oneAPI unified parallel programming for CPUs and GPUs
  • 🔍 Smart Detection: Automatic GPU backend detection with optimal library selection
  • 📦 Intelligent Downloads: GPU-aware library downloader selects best variants automatically

Added

  • Enhanced GPU Backend Support with comprehensive detection and automatic selection
    • Vulkan support for cross-platform GPU acceleration (NVIDIA, AMD, Intel GPUs)
    • OpenCL support for diverse GPU vendors including Qualcomm Adreno on ARM64
    • SYCL support for Intel oneAPI toolkit and unified parallel programming
    • GPU backend detection with DetectGpuBackend() function and LlamaGpuBackend enum
    • Automatic GPU variant selection in library downloader based on available SDKs
    • GPU detection tools with new detect-gpu Makefile target
    • GPU testing suite with test-gpu target and comprehensive backend tests
  • Enhanced Documentation for GPU setup and configuration
    • Comprehensive GPU installation guides for Linux (Vulkan, OpenCL, SYCL)
    • Windows GPU setup instructions for all supported backends
    • Platform-specific GPU requirements and verification steps
    • Updated GPU Support Matrix with all available backends
  • Improved CI/CD Pipeline with GPU detection testing
    • Added GPU detection tools installation in CI workflows
    • New gpu-detection job for comprehensive GPU backend testing
    • GPU-aware library downloading validation
  • Download-based architecture using pre-built binaries from official llama.cpp releases
  • Automatic library download system with platform detection
  • Library cache management with clean-libs target
  • Cross-platform download testing (test-download-platforms)
  • Command-line download tool (cmd/gollama-download)
  • clone-llamacpp target for developers who need to cross-reference the llama.cpp source code
  • Platform-specific architecture with Go build tags for improved cross-platform support
  • Windows compilation compatibility using native syscalls (LoadLibraryW, FreeLibrary)
  • Cross-platform compilation testing in CI pipeline
  • Platform capability detection functions (isPlatformSupported, getPlatformError)
  • Integrated hf.sh script management for Hugging Face model downloads
  • update-hf-script target for updating hf.sh from llama.cpp repository
  • Enhanced model download system using hf.sh instead of direct curl
  • Comprehensive tools documentation (docs/TOOLS.md)
  • Dedicated platform-specific test suite (TestPlatformSpecific)
  • Enhanced Makefile with cross-compilation targets (test-cross-compile, test-compile-*)
  • Comprehensive platform migration documentation
  • Initial Go binding for llama.cpp using purego
  • Cross-platform support (macOS, Linux, Windows)
  • CPU and GPU acceleration support

Changed

  • Enhanced GPU Backend Priority: Updated detection order to CUDA → HIP → Vulkan → OpenCL → SYCL → CPU
  • Intelligent Library Selection: Downloader now automatically selects optimal GPU variant based on available tools
  • Expanded Platform Support: Added comprehensive GPU backend support across Linux and Windows
  • Updated GPU Configuration: Enhanced context parameters with automatic GPU backend detection
  • Dependencies: Updated llama.cpp from build b6076 to b6099 (automated via Renovate)
  • Breaking: Migrated from compilation-based to download-based architecture
  • Simplified build process: No longer requires CMake, compilers, or GPU SDKs
  • Library loading now uses automatic download instead of local compilation
  • Updated documentation to reflect new download-based workflow
  • Model download system: Now uses hf.sh script from llama.cpp instead of direct curl commands
  • Example projects: Updated to use local hf.sh script from scripts/ directory
  • Documentation: Updated to reflect hf.sh script integration and usage
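The "intelligent library selection" above boils down to mapping the detected backend and target OS to a downloadable variant, with a CPU fallback when no GPU build exists for that platform. The sketch below illustrates that mapping; the variant names are hypothetical, not the project's actual asset naming.

```go
// Illustrative sketch of GPU-aware variant selection for the library
// downloader. The variant names below are hypothetical; the real
// downloader matches actual llama.cpp release assets.
package main

import "fmt"

// pickVariant chooses a download variant from the target OS and the
// detected backend, falling back to a plain CPU build when no
// pre-built GPU asset exists for that OS.
func pickVariant(goos, backend string) string {
	supported := map[string]map[string]bool{
		"linux":   {"cuda": true, "hip": true, "vulkan": true, "opencl": true, "sycl": true},
		"windows": {"cuda": true, "hip": true, "vulkan": true, "opencl": true, "sycl": true},
		"darwin":  {"metal": true},
	}
	if supported[goos][backend] {
		return fmt.Sprintf("llamacpp-%s-%s", goos, backend) // hypothetical name
	}
	return fmt.Sprintf("llamacpp-%s-cpu", goos)
}

func main() {
	fmt.Println(pickVariant("linux", "vulkan"))
	fmt.Println(pickVariant("darwin", "cuda")) // no CUDA asset on macOS: CPU fallback
}
```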

GPU Backend Support Matrix

| Backend  | Platforms      | GPU Vendors                 | Detection Command | Status        |
|----------|----------------|-----------------------------|-------------------|---------------|
| Metal    | macOS          | Apple Silicon               | system_profiler   | ✅ Production |
| CUDA     | Linux, Windows | NVIDIA                      | nvcc              | ✅ Production |
| HIP/ROCm | Linux, Windows | AMD                         | hipconfig         | ✅ Production |
| Vulkan   | Linux, Windows | NVIDIA, AMD, Intel          | vulkaninfo        | ✅ Production |
| OpenCL   | Windows, Linux | Qualcomm Adreno, Intel, AMD | clinfo            | ✅ Production |
| SYCL     | Linux, Windows | Intel, NVIDIA               | sycl-ls           | ✅ Production |
| CPU      | All            | All                         | N/A               | ✅ Fallback   |

Dependencies

  • llama.cpp: Updated from b6076 to b6099 (managed by Renovate)

Removed

  • All build-llamacpp-* compilation targets (no longer needed)
  • CMake and compiler dependencies for regular builds
  • Complex GPU SDK detection at build time
  • build-libs-gpu and build-libs-cpu targets
Added (from the initial release)

  • Complete API coverage for llama.cpp functions
  • Pre-built llama.cpp libraries for all platforms
  • Comprehensive examples and documentation
  • GitHub Actions CI/CD pipeline
  • Automated release system

Changed

  • Breaking internal change: Migrated from direct purego imports to platform-specific abstraction layer
  • Separated platform-specific code into platform_unix.go and platform_windows.go with appropriate build tags
  • Updated CI to test cross-compilation for all platforms (Windows, Linux, macOS on both amd64 and arm64)
  • Enhanced documentation to reflect platform-specific implementation details
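The platform abstraction described above can be pictured as a small interface that both platform files satisfy. In the real code base the two implementations live in platform_unix.go (//go:build !windows, using purego.Dlopen/Dlclose) and platform_windows.go (//go:build windows, using LoadLibraryW/FreeLibrary syscalls); the single-file stub below only illustrates the shape of the contract, not the actual code.

```go
// Minimal single-file sketch of the platform abstraction layer.
// The stub loader stands in for the real platform-specific files
// (purego on Unix-like systems, native syscalls on Windows).
package main

import "fmt"

// libLoader is the contract both platform implementations satisfy.
type libLoader interface {
	Open(path string) (handle uintptr, err error)
	Close(handle uintptr) error
}

// stubLoader is a test double that tracks open handles in a map.
type stubLoader struct{ opened map[uintptr]string }

func (s *stubLoader) Open(path string) (uintptr, error) {
	h := uintptr(len(s.opened) + 1)
	s.opened[h] = path
	return h, nil
}

func (s *stubLoader) Close(h uintptr) error {
	if _, ok := s.opened[h]; !ok {
		return fmt.Errorf("invalid handle %d", h)
	}
	delete(s.opened, h)
	return nil
}

func main() {
	var l libLoader = &stubLoader{opened: map[uintptr]string{}}
	h, _ := l.Open("libllama.so")
	fmt.Println("opened handle:", h)
	fmt.Println("close error:", l.Close(h))
}
```

Because callers depend only on the interface, cross-compilation works from any host: the build tags select the right implementation at compile time, with no runtime dispatch cost.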

Fixed

  • Windows CI compilation errors: Fixed undefined purego.Dlopen, purego.RTLD_NOW, purego.RTLD_GLOBAL, and purego.Dlclose symbols
  • Cross-compilation now works from any platform to any platform
  • Platform detection properly handles unsupported/incomplete platforms

Features

  • Pure Go implementation (no CGO required)
  • Enhanced GPU Support:
    • Metal support for macOS (Apple Silicon and Intel)
    • CUDA support for NVIDIA GPUs
    • HIP support for AMD GPUs
    • NEW: Vulkan support for cross-platform GPU acceleration
    • NEW: OpenCL support for diverse GPU vendors (including Qualcomm Adreno)
    • NEW: SYCL support for Intel oneAPI and unified parallel programming
    • NEW: Automatic GPU backend detection and selection
    • NEW: GPU-aware library downloading with optimal variant selection
  • Memory mapping and locking options
  • Batch processing capabilities
  • Multiple sampling strategies
  • Model quantization support
  • Context state management
  • Token manipulation utilities

Platform Support

  • macOS: ✅ Intel x64, Apple Silicon (ARM64) with Metal - Fully supported
  • Linux: ✅ x86_64, ARM64 with CUDA/HIP/Vulkan/SYCL - Fully supported
  • Windows: 🚧 x86_64, ARM64 with CUDA/HIP/Vulkan/OpenCL/SYCL - Build compatibility implemented, runtime support in development

Technical Details

  • Unix-like platforms (Linux, macOS): Use purego for dynamic library loading
  • Windows platform: Use native Windows syscalls for library management
  • Build tags: !windows for Unix-like, windows for Windows-specific code
  • Zero runtime overhead: Platform abstraction has no performance impact
  • GPU Detection Priority: CUDA → HIP → Vulkan → OpenCL → SYCL → CPU (Linux/Windows)
  • Automatic Fallback: Graceful degradation to CPU when GPU backends unavailable
  • Command Detection: Uses exec.LookPath() for cross-platform command availability
  • Pattern Matching: Regex-based asset selection for optimal GPU variant downloads
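The regex-based asset selection mentioned above can be sketched as follows. The asset file names and the pattern are illustrative only; the real downloader matches the naming of actual llama.cpp release assets.

```go
// Sketch of regex-based asset selection among release file names.
// The file names here are made up for illustration; the real
// downloader matches actual llama.cpp release assets.
package main

import (
	"fmt"
	"regexp"
)

// selectAsset returns the first asset whose name contains the target
// OS followed by the backend (case-insensitive), or "" when nothing
// matches and the caller should fall back to a CPU variant.
func selectAsset(assets []string, goos, backend string) string {
	re := regexp.MustCompile(fmt.Sprintf(`(?i)%s.*%s`,
		regexp.QuoteMeta(goos), regexp.QuoteMeta(backend)))
	for _, a := range assets {
		if re.MatchString(a) {
			return a
		}
	}
	return ""
}

func main() {
	assets := []string{ // hypothetical release listing
		"llama-bin-linux-cpu-x64.zip",
		"llama-bin-linux-vulkan-x64.zip",
		"llama-bin-win-cuda-x64.zip",
	}
	fmt.Println(selectAsset(assets, "linux", "vulkan"))
}
```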