SIVA - A Self-Improving Vulnerability Detection Agent

Author: Valentin Walischewski

Overview

SIVA is an advanced LLM agent that uses REVOLVE [2] to dynamically optimize its prompts through memory-guided meta-learning, improving its vulnerability detection capabilities. SIVA combines sophisticated learning techniques ([2], [3], [4]) with real-world vulnerability data to achieve state-of-the-art performance in security analysis.

Key Features

  • Self-Improvement: Dynamically adapts and improves performance through iterative, memory-guided learning.
  • Meta-Learning Architecture: SIVA learns how to learn better [5].
  • Real Vulnerability Data: Evaluated on SecVulEval [6], containing thousands of real vulnerabilities from major open-source projects.
  • Smart Caching System: Instant analysis of previously seen functions for compute efficiency.
  • Zero Code Execution: Safe static analysis without running potentially malicious code.

Architecture

Core

  1. Base Agent (SIVA.py)

    • REVOLVE learning framework
    • Smart memory system with instant cache
    • Pattern recognition for CWE types
    • Simple failure-count-based strategy selection (see the selection sketch after the strategy list below)
  2. Meta-Learning (MetaSIVA.py)

    • Dynamic prompt library that evolves
    • Strategy weight optimization
    • Failure analysis and adaptation
    • Learning-to-learn capabilities

Learning Strategies (implemented through prompt templates)

  • Instant Cache: Immediate results for exact function matches
  • Focused Learning: Reuse proven solutions
  • Template Transfer: Apply similar CWE patterns
  • Multi-Shot Learning: Learn from diverse examples
  • CWE-specific: Dynamically evolved templates for specific CWE families
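
The snippet below is a minimal sketch of how failure-count-based selection over these strategies could look; the function, thresholds, and cache key scheme are illustrative assumptions, not the exact logic in SIVA.py.

import hashlib

# The five strategies listed above, in the order they are tried here.
STRATEGIES = [
    "instant_cache",
    "focused_learning",
    "template_transfer",
    "multi_shot_learning",
    "cwe_specific",
]

def select_strategy(func_code: str, cache: dict, failure_count: int) -> str:
    """Pick a prompt-template strategy for the next analysis attempt (illustrative)."""
    key = hashlib.sha256(func_code.encode()).hexdigest()
    if key in cache:                   # exact function seen before
        return "instant_cache"
    if failure_count == 0:
        return "focused_learning"      # reuse proven solutions first
    if failure_count == 1:
        return "template_transfer"     # fall back to similar CWE patterns
    if failure_count == 2:
        return "multi_shot_learning"   # show diverse examples
    return "cwe_specific"              # evolved templates for the CWE family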

LLM

We used Gemma3 (27B) in 4-bit quantization for all our experiments [7]. The gemma_server_api.py script downloads the model from Hugging Face and runs it on a GPU.

Alternative LLM Options

  1. Use Mock Mode - returns simulated LLM responses for testing
  2. Use an LLM API - edit the SecurityLLMClient in SIVA.py for compatibility with your LLM of choice (a minimal client sketch follows below)
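
As a starting point, here is a minimal sketch of what such a client could look like against the local Gemma3 server; the /generate endpoint, request fields, and defaults are assumptions and should be adapted to your provider's API.

import httpx

class SecurityLLMClient:
    """Minimal HTTP client sketch (endpoint and payload fields are assumed)."""

    def __init__(self, base_url: str = "http://localhost:8000", timeout: float = 120.0):
        self.base_url = base_url
        self.timeout = timeout

    def generate(self, prompt: str, max_new_tokens: int = 512) -> str:
        # Send the prompt to the LLM server and return the generated text.
        resp = httpx.post(
            f"{self.base_url}/generate",
            json={"prompt": prompt, "max_new_tokens": max_new_tokens},
            timeout=self.timeout,
        )
        resp.raise_for_status()
        return resp.json()["response"]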

Dataset

This implementation uses the SecVulEval dataset [6], consisting of 25,440 labeled, filtered, and context-enriched C/C++ functions from real-world projects, including critical infrastructure software such as the Linux kernel, OpenSSL, and the Apache HTTP Server. The dataset includes vulnerable samples spanning 5,867 unique CVEs across 145 different CWE types.
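
Loading the data with the datasets library looks roughly like the sketch below; the Hugging Face dataset identifier, split name, and column names are assumptions here, so check the SecVulEval release for the published identifiers.

from datasets import load_dataset

# Hypothetical dataset id and split name; replace with the identifiers
# published by the SecVulEval authors.
ds = load_dataset("SecVulEval/SecVulEval", split="train")
print(ds)             # feature schema and number of rows
sample = ds[0]
print(sample.keys())  # e.g. function source, vulnerability label, CWE/CVE ids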

Installation

Requirements

  1. Python 3.8+
  2. 4GB+ RAM recommended (memory usage grows linearly and creates substantial overhead)
  3. 20-24GB GPU VRAM for the Gemma3 27B model plus inference
  4. Internet connection

Set Up

# Clone the repository
git clone https://github.com/yourusername/siva.git
cd siva

# Install dependencies
pip install -r requirements.txt

Dependencies

# Core SICA-VULN dependencies
pandas>=1.3.0
numpy>=1.21.0
datasets>=2.0.0
transformers>=4.20.0
httpx>=0.24.0

# Gemma3 Server dependencies
fastapi>=0.100.0
uvicorn>=0.23.0
torch>=2.0.0
transformers>=4.35.0
accelerate>=0.24.0
bitsandbytes>=0.41.0
pydantic>=2.0.0

LLM Server Setup (Gemma3 27B)

Quick Start

  1. Get HuggingFace Token

    # Sign up at https://huggingface.co
    # Get token from https://huggingface.co/settings/tokens
    
    export HF_TOKEN="your_token_here"
    
  2. Configure GPU (Optional)

    # Edit gemma_server_api.py line 50
    
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # Change to your GPU ID
  3. Start the Server

    python gemma_server_api.py

    The server will:

    • Download Gemma3 27B (~54GB first time only)
    • Load with 4-bit quantization (~14GB VRAM)
    • Start API server on http://localhost:8000
  4. Verify Installation

    # Check health
    curl http://localhost:8000/health
    
    # Test generation
    curl -X POST http://localhost:8000/test
    

Server Features

  • Model: Gemma3 27B with 128K context window
  • Memory: ~14GB (4-bit)
  • Enhanced: Function calling, multimodal ready
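
For orientation, the following is a minimal sketch of the shape of such a server (FastAPI app, 4-bit loading, /health and /generate endpoints). The model id, endpoint names, and request fields are assumptions for illustration and may not match gemma_server_api.py exactly.

import os
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "google/gemma-3-27b-it"  # assumed checkpoint name

# 4-bit quantization so the 27B model fits in roughly 14GB of VRAM.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, token=os.environ.get("HF_TOKEN"))
# Note: multimodal Gemma3 checkpoints may require a different model class.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",
    token=os.environ.get("HF_TOKEN"),
)

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 512

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/generate")
def generate(req: GenerateRequest):
    inputs = tokenizer(req.prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return {"response": tokenizer.decode(new_tokens, skip_special_tokens=True)}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)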

Usage

Quick Start

# Run the main interface
python Sica_Vuln.py

Available Options

  1. Test Single Vulnerability - Quick verification (2 minutes)
  2. Quick Benchmark - 10 samples, 2 iterations (5-10 minutes)
  3. Full Benchmark - 50 samples, 3 iterations (20-30 minutes)
  4. Balanced CWE Benchmark - Test across vulnerability types
  5. Show Dataset Statistics - Explore SecVulEval data
  6. Debug Mode - Verbose logging for development

With Meta-Learning

# Run with meta-learning enhancements
python Meta_Sica.py

Project Structure

sica-vuln/
├── Sica_Vuln.py              # Main SICA-VULN agent
├── Meta_Sica.py              # Meta-learning enhancements
├── gemma_server_api.py       # Gemma3 27B LLM server
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── sica_vuln_workspace/      # Auto-created workspace
│   ├── cache/                # Dataset cache
│   ├── sica_vuln_memory/     # Learning memory
│   └── meta_prompt_library/  # Evolved prompts
└── hf_cache/                 # Gemma3 model cache (auto-created)

Methodology

1. Data Loading

  • Downloads SecVulEval dataset from HuggingFace
  • Processes real vulnerabilities from open-source projects

2. Analysis Pipeline

Input Code → Pattern Recognition → Strategy Selection → 
LLM Analysis → Evaluation → Learning → Memory Update

3. Learning Process

  • Iteration 1: Baseline analysis
  • Iteration 2: Apply learned patterns
  • Iteration 3: Advanced techniques
  • Meta-Learning: Continuous improvement throughout iterations (a minimal weight-update sketch follows this list)
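
A minimal sketch of the strategy-weight bookkeeping behind this improvement loop is shown below; the update rule, learning rate, and floor value are illustrative assumptions rather than MetaSIVA's exact scheme.

class StrategyWeights:
    """Illustrative success/failure-driven weighting of learning strategies."""

    def __init__(self, strategies, lr: float = 0.1):
        self.weights = {s: 1.0 for s in strategies}
        self.lr = lr

    def update(self, strategy: str, success: bool) -> None:
        # Reward strategies that produced a correct verdict, penalize failures,
        # and keep a small floor so no strategy is ruled out permanently.
        delta = self.lr if success else -self.lr
        self.weights[strategy] = max(0.05, self.weights[strategy] + delta)

    def best(self) -> str:
        return max(self.weights, key=self.weights.get)

Across iterations, weights of this kind drift toward whichever strategies keep producing correct verdicts for the current mix of CWE types.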

4. Memory System

  • Stores successful analyses
  • Builds vulnerability pattern database
  • Enables instant-cache lookups for previously seen functions (sketched below)
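
The following is a minimal sketch of the instant-cache portion of this memory; the normalization, hashing scheme, and on-disk file name are assumptions and may differ from the actual workspace layout.

import hashlib
import json
from pathlib import Path

class AnalysisMemory:
    """Illustrative exact-match cache keyed by a hash of the normalized function."""

    def __init__(self, path: str = "sica_vuln_workspace/sica_vuln_memory/cache.json"):
        self.path = Path(path)
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def _key(func_code: str) -> str:
        normalized = " ".join(func_code.split())  # collapse whitespace differences
        return hashlib.sha256(normalized.encode()).hexdigest()

    def lookup(self, func_code: str):
        # Return the cached verdict for an exact (normalized) match, or None.
        return self.entries.get(self._key(func_code))

    def store(self, func_code: str, verdict: dict) -> None:
        self.entries[self._key(func_code)] = verdict
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.entries, indent=2))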

Statement on Generative AI Usage

Given the implementation-heavy and ambitious nature of this project, I made use of modern software tools in its production. In particular, I made extensive use of Claude-4 Sonnet [1] to aid with the following:

  • Debugging and error fixing
  • Method prototyping
  • Improved interpretability (used it to add detailed logging and debug output)
  • Improved robustness (used it to add edge case protections)
  • Prompt template optimization
  • Synthetic data and mock response generation
  • Gemma3 Server implementation (given the well-documented nature of this task, I used the LLM to help create the Gemma3 server script)

References

[1] Anthropic (2025), Claude-4 Sonnet

[2] Zhang et al. (2025), REVOLVE: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

[3] Hu et al. (2025), Automated Design of Agentic Systems

[4] Robeyns et al. (2025), A Self-Improving Coding Agent

[5] T. Liu and M. van der Schaar (2025), Truly Self-Improving Agents Require Intrinsic Metacognitive Learning

[6] Ahmed et al. (2025), SecVulEval: Benchmarking LLMs for Real-World C/C++ Vulnerability Detection

[7] Google DeepMind (2024), Gemma3
