This project demonstrates custom memory management with RAPIDS Memory Manager (RMM) and cuDF, featuring high-performance spilling of GPU data to host memory for GPU data processing workflows.
## Project Structure

```
memory/
├── src/                              # Core source files
│   ├── fixed_size_host_memory_resource.cpp
│   └── cudf_table_converter.cpp
├── include/                          # Header files
│   ├── fixed_size_host_memory_resource.hpp
│   └── cudf_table_converter.hpp
├── tests/                            # Test files (Catch2 framework)
│   ├── main.cpp                      # Basic RMM test
│   ├── simple_test.cpp               # Simple cuDF to host test
│   ├── cudf_to_host_test.cpp         # cuDF to host functionality tests
│   ├── host_to_cudf_test.cpp         # Host to cuDF functionality tests
│   ├── custom_memory_resource_test.cpp
│   └── multiple_blocks_test.cpp
├── benchmarks/                       # Performance benchmarks
│   ├── benchmark_converter.cpp       # Concurrent benchmark (2 streams)
│   ├── performance_comparison.cpp    # Single vs concurrent comparison
│   ├── benchmark_output.hpp          # Benchmark output utilities (header)
│   └── benchmark_output.cpp          # Benchmark output utilities (implementation)
├── build/                            # Build directory
├── CMakeLists.txt                    # Build configuration
└── README.md                         # This file
```
## Features

### Fixed-Size Host Memory Resource

- Fixed-size host memory resource with dynamic pool expansion
- Thread-safe allocation and deallocation
- RAII wrapper for multi-block allocations
- Configurable block size (default: 4MB) and pool size (default: 256 blocks); see the usage sketch below
- `spilling` namespace for clear identification of memory spilling functionality
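A minimal usage sketch, assuming a `(block_size, pool_size)` constructor and the standard `allocate`/`deallocate` interface of an RMM host memory resource; check `fixed_size_host_memory_resource.hpp` for the actual signatures:

```cpp
#include "fixed_size_host_memory_resource.hpp"

#include <cstddef>

int main() {
  // Assumed constructor arguments: block size and block count,
  // matching the documented defaults (4MB x 256 blocks = 1GB).
  spilling::fixed_size_host_memory_resource host_mr{4 * 1024 * 1024, 256};

  // Take one block of host memory, then return it to the pool.
  void* block = host_mr.allocate(4 * 1024 * 1024);
  host_mr.deallocate(block, 4 * 1024 * 1024);
  return 0;
}
```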
### cuDF Table Converter

- Table conversion from GPU to host memory
- Metadata preservation for table recreation
- Stream-based asynchronous operations
- Multi-block allocation for large datasets (round-trip sketch below)
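The intended round trip looks roughly like the sketch below. The `spilling::to_host`/`spilling::from_host` names and signatures are hypothetical, inferred from the feature list rather than taken from `cudf_table_converter.hpp`; `cudf::table_view` and `rmm::cuda_stream` are the real cuDF/RMM types:

```cpp
#include "cudf_table_converter.hpp"
#include "fixed_size_host_memory_resource.hpp"

#include <cudf/table/table_view.hpp>
#include <rmm/cuda_stream.hpp>

// Hypothetical API; consult cudf_table_converter.hpp for real signatures.
void spill_round_trip(cudf::table_view const& gpu_table,
                      spilling::fixed_size_host_memory_resource& host_mr) {
  rmm::cuda_stream stream;  // stream for asynchronous copies

  // GPU -> host: copy the table's buffers into fixed-size host blocks,
  // keeping the metadata needed to rebuild the table later.
  auto spilled = spilling::to_host(gpu_table, host_mr, stream.view());

  // Host -> GPU: recreate a cudf table from the spilled blocks + metadata.
  auto restored = spilling::from_host(spilled, stream.view());

  stream.synchronize();  // ensure copies finished before using `restored`
}
```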
### Benchmarks

- Concurrent processing with 2 streams and separate data
- Performance comparison between single-threaded and concurrent approaches
- Large dataset testing (500MB+ per stream)
## Installation

Install the required RAPIDS and CUDA dependencies using conda:

```bash
conda install -c rapidsai -c conda-forge -c nvidia rapidsai::libcudf=25.04 cmake
```

Ensure your CUDA compiler is available in one of these ways:

- Default location: the CUDA compiler is at `/usr/local/cuda/bin/nvcc`
- Environment variable: set `CMAKE_CUDA_COMPILER` to point to your nvcc installation:

```bash
export CMAKE_CUDA_COMPILER=/path/to/your/cuda/bin/nvcc
```
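Equivalently, the compiler path can be set at configure time via the standard CMake cache variable (generic CMake behavior, nothing project-specific):

```bash
cmake -DCMAKE_CUDA_COMPILER=/path/to/your/cuda/bin/nvcc ..
```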
## Building

```bash
# Create build directory
mkdir build && cd build

# Configure and build
cmake ..
make -j$(nproc)
```

## Running Tests

```bash
# Run all tests
./tests

# Run specific test categories
./tests "[cudf]"            # All cuDF tests
./tests "[to_host]"         # cuDF to host tests
./tests "[host_to_cudf]"    # Host to cuDF tests
./tests "[memory]"          # Memory resource tests
./tests "~[rmm] ~[memory]"  # Exclude problematic tests
```

The test suite covers the following categories (an illustrative tagged test case follows the list):

- cuDF to Host Tests: Convert cuDF tables to host memory
- Host to cuDF Tests: Convert host memory back to cuDF tables
- Memory Resource Tests: Test custom memory allocation
- RMM Tests: Basic RMM functionality
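The filter strings above match Catch2 tags on the test cases. For illustration only (not copied from the repo, and assuming Catch2 v3 headers), a tagged test case looks like this:

```cpp
#include <catch2/catch_test_macros.hpp>

// Selected by `./tests "[cudf]"` and `./tests "[to_host]"`.
TEST_CASE("table survives a round trip through host memory",
          "[cudf][to_host]") {
  // ... build a cudf::table, spill it to host, restore it, compare ...
  REQUIRE(true);  // placeholder assertion
}
```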
## Running Benchmarks

```bash
# Concurrent benchmark (2 streams, 4MB blocks)
./benchmark_converter

# Performance comparison (single vs concurrent)
./performance_comparison
```

Typical throughput:

- Convert to host: ~8,000-8,800 MB/s
- Recreate from host: ~9,200-9,900 MB/s
- Aggregate throughput: ~8,000-9,000 MB/s
## Dependencies

The following dependencies are required and can be installed via conda:

- RMM: RAPIDS Memory Manager (included in libcudf)
- cuDF: CUDA DataFrames (libcudf=25.04)
- CUDA: NVIDIA CUDA Toolkit (included in libcudf)
- CMake: Build system
- C++20: Required C++ language standard

```bash
conda install -c rapidsai -c conda-forge -c nvidia rapidsai::libcudf=25.04 cmake
```

## Configuration

The project uses 4MB blocks by default with 256 blocks total (1GB capacity). This can be modified in the benchmark files:
```cpp
constexpr std::size_t block_size = 4 * 1024 * 1024;  // 4MB blocks
constexpr std::size_t pool_size  = 256;              // 256 blocks (1GB total)
```
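Total pool capacity is `block_size * pool_size`, so scaling either constant scales the host pool. For example (illustrative values):

```cpp
// Doubling the block size at the same block count doubles total capacity:
constexpr std::size_t block_size = 8 * 1024 * 1024;  // 8MB blocks
constexpr std::size_t pool_size  = 256;              // 256 blocks (2GB total)
```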