This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Hoard is a high-performance, scalable memory allocator for multithreaded applications. It's a drop-in replacement for malloc that eliminates contention, false sharing, and memory blowup problems common in system allocators.
mkdir build && cd build
cmake ..
make
Output: build/libhoard.dylib (macOS) or build/libhoard.so (Linux)
Windows builds use Microsoft Detours for function interposition. Detours is automatically fetched and built by CMake:
mkdir build && cd build
cmake ..
cmake --build . --config Release
Output: build/Release/hoard.dll, build/Release/withdll.exe, build/Release/setdll.exe
Using a pre-installed Detours (optional):
If you prefer to use a system-installed Detours (via vcpkg or manual build):
# Install via vcpkg
vcpkg install detours:x64-windows # or arm64-windows, x86-windows
# Build with system Detours
cmake .. -DUSE_SYSTEM_DETOURS=ON -DCMAKE_TOOLCHAIN_FILE=C:/vcpkg/scripts/buildsystems/vcpkg.cmake
# Or if built from source
cmake .. -DUSE_SYSTEM_DETOURS=ON -DDETOURS_ROOT=C:/path/to/Detours
cd benchmarks
make
Individual benchmark:
cd benchmarks/threadtest
make
Linux:
LD_PRELOAD=/path/to/libhoard.so ./myprogram
macOS:
DYLD_INSERT_LIBRARIES=/path/to/libhoard.dylib ./myprogram
Windows (unmodified binaries):
Important: Programs must be compiled with /MD (dynamic C runtime) for Hoard to intercept allocations. Programs compiled with /MT (static C runtime) have allocation functions embedded directly in the executable, which Hoard cannot intercept.
Windows uses DLL injection via withdll.exe (built automatically with Hoard):
# From the build directory:
build\Release\withdll.exe /d:build\Release\hoard.dll myprogram.exe [args...]
The /d: flag specifies the DLL to inject. Multiple DLLs can be injected:
withdll.exe /d:hoard.dll /d:other.dll myprogram.exe
Alternative Windows methods:
- setdll.exe (permanent modification): Modifies the executable's import table to always load Hoard (also built automatically):
  # Add Hoard to executable (creates backup as .exe~)
  build\Release\setdll.exe /d:build\Release\hoard.dll myprogram.exe
  # Remove Hoard from executable
  build\Release\setdll.exe /r:hoard.dll myprogram.exe
- AppInit_DLLs (system-wide, requires admin): Registry-based injection for all processes:
  # Not recommended for production - affects all processes
  reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows" /v AppInit_DLLs /t REG_SZ /d "C:\path\to\hoard.dll"
  reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows" /v LoadAppInit_DLLs /t REG_DWORD /d 1
./threadtest <threads> <iterations> <objects> <work> <size>
# Example: ./threadtest 4 1000 10000 0 8
Windows benchmark example:
withdll.exe /d:hoard.dll threadtest.exe 4 1000 10000 0 8
Thread-Local Allocation Buffers (TLABs)
↓ overflow
Per-Thread Heaps (PerThreadHoardHeap)
↓ emptiness threshold crossed
Global Heap (TheGlobalHeap)
↓ OS allocation
MmapSource (AlignedMmap)
Superblocks: Memory is managed in aligned chunks (256KB on Unix, 64KB on Windows). Each superblock contains a header and object allocations. Superblock address found via bitmask: ptr & ~(SUPERBLOCK_SIZE-1).
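The bitmask lookup can be illustrated with a small sketch. It assumes the 256KB Unix superblock size stated above; the helper names `superblock_base`, `superblock_offset`, and `superblock_of` are hypothetical and are not taken from the Hoard sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption for illustration: 256 KB, the Unix superblock size from the text.
constexpr std::size_t SUPERBLOCK_SIZE = 256 * 1024;

// The mask trick only works when the size is a power of two
// and superblocks are aligned to their own size.
static_assert((SUPERBLOCK_SIZE & (SUPERBLOCK_SIZE - 1)) == 0,
              "SUPERBLOCK_SIZE must be a power of two");

// Hypothetical helper: round an address down to the start of its superblock.
constexpr std::uintptr_t superblock_base(std::uintptr_t addr) {
  return addr & ~(static_cast<std::uintptr_t>(SUPERBLOCK_SIZE) - 1);
}

// Hypothetical helper: byte offset of an address inside its superblock.
constexpr std::size_t superblock_offset(std::uintptr_t addr) {
  return static_cast<std::size_t>(addr & (SUPERBLOCK_SIZE - 1));
}

// Hypothetical pointer-based wrapper around superblock_base.
inline void* superblock_of(void* ptr) {
  assert(ptr != nullptr);
  return reinterpret_cast<void*>(
      superblock_base(reinterpret_cast<std::uintptr_t>(ptr)));
}
```

Because the header sits at the start of each aligned superblock, a free can locate the owning superblock from the object pointer alone, without a lookup table.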
Emptiness Classes: Superblocks are categorized by fullness (8 classes). This enables efficient memory reclamation - when a per-thread heap crosses the emptiness threshold, superblocks move to the global heap.
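A rough sketch of how fullness could map to one of the 8 classes, and how a threshold test might look. Both functions are illustrative assumptions: the binning formula and the 7/8 threshold are not copied from `hoardmanager.h`, and the real check may include further terms.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kEmptinessClasses = 8;  // from the text: 8 classes

// Hypothetical binning: class 0 holds empty superblocks,
// class 7 holds the fullest ones.
inline std::size_t emptiness_class(std::size_t in_use, std::size_t total) {
  assert(total > 0 && in_use <= total);
  if (in_use == total) return kEmptinessClasses - 1;  // clamp the full case
  return (in_use * kEmptinessClasses) / total;
}

// Hypothetical threshold test: a heap counts as "too empty" when fewer
// than 7/8 of the objects it holds are in use. Integer arithmetic avoids
// floating point: in_use / allocated < 7/8.
inline bool below_emptiness_threshold(std::size_t in_use,
                                      std::size_t allocated) {
  return kEmptinessClasses * in_use < (kEmptinessClasses - 1) * allocated;
}
```

Under these assumptions, a heap with 6 of 8 objects in use is below the threshold and would be a candidate for handing a mostly empty superblock to the global heap.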
TLABs: Per-thread caches for small objects (up to 1024 bytes). Max 16MB per TLAB. Reduces contention between threads.
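The caching idea behind a TLAB can be sketched as a toy class. `ToyTlab` is hypothetical and much simpler than `tlab.h`: it handles a single object size, whereas a real TLAB would keep one list per size class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-size-class cache with a byte cap.
class ToyTlab {
public:
  ToyTlab(std::size_t object_size, std::size_t max_bytes)
      : object_size_(object_size), max_bytes_(max_bytes) {}

  // Returns a cached object, or nullptr when the cache is empty;
  // the caller would then fall through to the per-thread heap.
  void* try_malloc() {
    if (cache_.empty()) return nullptr;
    void* p = cache_.back();
    cache_.pop_back();
    return p;
  }

  // Caches a freed object. Returns false when the byte cap would be
  // exceeded; the caller would then hand the object to the parent heap.
  bool try_free(void* p) {
    assert(p != nullptr);
    if ((cache_.size() + 1) * object_size_ > max_bytes_) return false;
    cache_.push_back(p);
    return true;
  }

  std::size_t cached_bytes() const { return cache_.size() * object_size_; }

private:
  std::size_t object_size_;
  std::size_t max_bytes_;
  std::vector<void*> cache_;  // most recently freed object is reused first
};
```

Declaring one such object as `thread_local` gives each thread a private cache, so the common malloc/free path takes no lock.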
Size Separation: Small objects go through SmallHeap (thread-local with superblock management). Large objects go through BigHeap (threshold-based segment heap with geometric size classes).
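The routing decision can be sketched as follows. The 1024-byte cutoff comes from the TLAB description above; `Route`, `route_for`, and `geometric_class` are hypothetical names, and using powers of two for the geometric size classes is an assumption, since the text does not state the growth factor.

```cpp
#include <cassert>
#include <cstddef>

// Cutoff taken from the text: small objects are up to 1024 bytes.
constexpr std::size_t kLargestSmallObject = 1024;

enum class Route { SmallHeap, BigHeap };

// Hypothetical router: choose a heap by request size.
constexpr Route route_for(std::size_t sz) {
  return sz <= kLargestSmallObject ? Route::SmallHeap : Route::BigHeap;
}

// Hypothetical geometric size class: round up to the next power of two.
// Assumes sz is small enough that doubling does not overflow.
inline std::size_t geometric_class(std::size_t sz) {
  assert(sz > 0);
  std::size_t cls = 1;
  while (cls < sz) cls <<= 1;
  return cls;
}
```

Geometric classes keep the number of size classes logarithmic in the largest request, at the cost of some internal fragmentation for large objects.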
src/
├── include/
│   ├── hoard/                      # Core allocator components
│   │   ├── hoardheap.h             # Main heap composition (HoardHeap template)
│   │   ├── hoardmanager.h          # Superblock manager by emptiness classes
│   │   ├── globalheap.h            # Single global heap for redistribution
│   │   ├── hoardsuperblock.h       # Superblock structure
│   │   └── hoardconstants.h        # Configuration constants
│   ├── superblocks/                # Superblock/TLAB management
│   │   ├── tlab.h                  # Thread-local allocation buffer
│   │   └── alignedsuperblockheap.h
│   └── util/                       # Generic utilities
│       ├── alignedmmap.h           # Aligned OS allocation
│       └── thresholdsegheap.h      # Threshold-based segment heap
├── source/
│   ├── libhoard.cpp                # malloc/free/realloc entry points
│   ├── mactls.cpp                  # macOS thread-local storage
│   ├── unixtls.cpp                 # Unix TLS & pthread interception
│   ├── wintls.cpp                  # Windows TLS & DllMain
│   └── winwrapper-detours.cpp      # Windows Detours-based interposition
└── cmake/
    └── FindDetours.cmake           # CMake module to find Detours library
Heap-Layers Dependency: Fetched via CMake FetchContent from https://github.com/emeryberger/Heap-Layers. Provides the layered heap framework, locks, and utility wrappers.
- MAX_MEMORY_PER_TLAB: 16MB
- MaxThreads: 2048
- NumHeaps: 128
- LargestSmallObject: 1024 bytes
- macOS: Uses MacLockType, macwrapper.cpp, mactls.cpp
- Linux: Uses SpinLockType, unixtls.cpp
- Windows: Uses WinLockType, winwrapper-detours.cpp, wintls.cpp
  - Supports x86, x64, ARM, and ARM64 architectures
  - Uses Microsoft Detours for function interposition
  - Intercepts CRT, Windows Heap API, and RTL Heap API functions
The allocator is built through template composition. The main heap type HoardHeap<N, NH> composes:
- ANSIWrapper - Standard malloc interface
- IgnoreInvalidFree - Graceful handling of bad frees
- HybridHeap - Routes by size to SmallHeap or BigHeap
- ThreadPoolHeap - Per-thread heap pool
- RedirectFree - Routes frees to correct heap via superblock header
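The composition style can be shown with a toy stack of layers. This is a minimal sketch of the mixin pattern, in which each layer inherits from its template parameter; `MallocSource`, `CountingHeap`, `IgnoreNullFree`, and `ToyHeap` are hypothetical and are not the Hoard or Heap-Layers classes listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Bottom layer (hypothetical): obtains memory from the system allocator.
class MallocSource {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* p) { std::free(p); }
};

// Layer that counts live objects, then delegates to the layer below.
template <class SuperHeap>
class CountingHeap : public SuperHeap {
public:
  void* malloc(std::size_t sz) {
    void* p = SuperHeap::malloc(sz);
    if (p != nullptr) ++live_;
    return p;
  }
  void free(void* p) {
    assert(live_ > 0);
    --live_;
    SuperHeap::free(p);
  }
  std::size_t live() const { return live_; }

private:
  std::size_t live_ = 0;
};

// Layer that drops free(nullptr) before it reaches the layers below.
template <class SuperHeap>
class IgnoreNullFree : public SuperHeap {
public:
  void free(void* p) {
    if (p != nullptr) SuperHeap::free(p);
  }
};

// Layers compose by nesting template arguments, outermost first.
using ToyHeap = IgnoreNullFree<CountingHeap<MallocSource>>;
```

Since every layer is resolved at compile time, calls through the stack can be inlined, which is the usual reason for choosing templates over virtual dispatch in this pattern.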
Located in benchmarks/:
- threadtest - Per-thread throughput (allocation/deallocation cycles)
- cache-scratch, cache-thrash - False sharing tests
- larson - Server workload simulation
- linux-scalability - University of Michigan scalability test