When using Ollama as your LLM provider (instead of OpenAI), you need to configure system-wide environment variables before starting the Ollama service. These settings optimize performance, enable parallel processing, and help manage resource constraints.
Configure these environment variables on your system (not in the Flexible GraphRAG .env file):
OLLAMA_CONTEXT_LENGTH=8192

Configuration Options:
- 4096: Minimum for limited resources
- 8192: Recommended default
- 16384: For improved speed and extraction quality
Important Notes:
- The full 128K context window supported by llama3.2:3b requires 16.4GB of RAM for the key-value (KV) cache alone, plus ~3GB for model weights
- The 128K token context window allows processing ~96,240 words of text in a single interaction
- By default, inference engines (llama.cpp, transformers, Ollama) store both model weights and KV cache in GPU VRAM when available (fastest)
- If GPU VRAM is insufficient, the KV cache falls back to system RAM with potential speed penalty
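The memory figures above can be sanity-checked with a back-of-the-envelope estimate. The sketch below uses the standard KV-cache sizing formula; the layer/head counts in the example are assumptions chosen for illustration (small Llama-family models commonly use 28 layers, 8 KV heads, and a head dimension of 128), and the exact footprint in Ollama also depends on cache quantization and runtime overhead.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size: 2 tensors (keys + values), per layer,
    per KV head, per head dimension, per token, at fp16 (2 bytes)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Illustrative parameters (assumed, not read from Ollama) for a small
# Llama-family model at a full 128K-token (131072) context:
est = kv_cache_bytes(n_layers=28, n_kv_heads=8, head_dim=128, context_len=131072)
print(f"~{est / 1e9:.1f} GB for the KV cache alone")  # roughly 15 GB
```

Halving the context length halves the KV cache, which is why dropping from 16384 to 4096 is the first lever to pull when VRAM is tight.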
OLLAMA_DEBUG=1

Values:
- 1: Enable debug logging
- 0: Disable debug logging
Log Locations:
- Windows: C:\Users\<username>\AppData\Local\Ollama\server.log
- Linux/macOS: Check the Ollama documentation for your platform
Use Cases:
- Checking GPU memory availability
- Identifying CPU fallback behavior
- Troubleshooting performance issues
OLLAMA_KEEP_ALIVE=30m

Keeps models loaded in memory for faster subsequent requests. Adjust the time based on your usage patterns and available memory.
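The value uses a duration string ("30m", "1h", "45s") or a bare number of seconds. A minimal sketch of how such values map to seconds, assuming single-unit durations only (Ollama itself also accepts 0 to unload immediately and a negative value to keep a model loaded indefinitely):

```python
def keep_alive_seconds(value: str) -> float:
    """Convert an OLLAMA_KEEP_ALIVE-style duration ('30m', '1h', '45s',
    or a bare number of seconds) to seconds. Illustrative sketch:
    handles single-unit values only."""
    units = {"s": 1, "m": 60, "h": 3600}
    if value and value[-1] in units:
        return float(value[:-1]) * units[value[-1]]
    return float(value)

print(keep_alive_seconds("30m"))  # 1800.0
```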
OLLAMA_MAX_LOADED_MODELS=4

Values:
- 0: No limit (loads as many as needed)
- 4: Recommended for most systems
- Adjust based on your available memory
# Windows example
OLLAMA_MODELS=C:\Users\<username>\.ollama\models
# Linux/macOS example
OLLAMA_MODELS=/home/<username>/.ollama/models

Usually set automatically by Ollama, but can be customized for specific storage locations.
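Tooling that needs to locate the models directory can apply the same precedence Ollama does — honor the override if set, otherwise fall back to the default. A small sketch (the fallback path assumes the usual `~/.ollama/models` default shown above):

```python
import os
from pathlib import Path

def default_models_dir() -> Path:
    """Resolve the models directory: honor OLLAMA_MODELS if set,
    otherwise fall back to ~/.ollama/models (the usual default)."""
    override = os.environ.get("OLLAMA_MODELS")
    return Path(override) if override else Path.home() / ".ollama" / "models"
```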
OLLAMA_NUM_PARALLEL=4

- Required for Flexible GraphRAG parallel file processing
- Prevents processing errors during parallel document ingestion
- Allows Ollama to handle multiple concurrent requests
- Must match or exceed the number of worker threads used by the system
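The "must match or exceed" point can be sketched from the client side: if the ingestion code fans out over a thread pool, its worker count should stay at or below `OLLAMA_NUM_PARALLEL` so requests are served concurrently instead of queuing or erroring. The `process_document` function below is a placeholder, not Flexible GraphRAG's actual API — a real worker would POST to the local Ollama server.

```python
from concurrent.futures import ThreadPoolExecutor

OLLAMA_NUM_PARALLEL = 4  # server-side concurrency limit

def process_document(doc: str) -> str:
    # Placeholder for an extraction request to Ollama; a real client
    # would call the local Ollama server here.
    return f"processed {doc}"

docs = [f"doc{i}" for i in range(8)]
# Keep client-side concurrency <= OLLAMA_NUM_PARALLEL so each in-flight
# request maps to a slot the server can actually serve in parallel.
with ThreadPoolExecutor(max_workers=OLLAMA_NUM_PARALLEL) as pool:
    results = list(pool.map(process_document, docs))
```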
- Open System Properties → Advanced → Environment Variables
- Under System variables (not User variables), click New
- Add each variable name and value
- Click OK to save
- Restart the Ollama service:
net stop Ollama
net start Ollama
- Add to your shell profile (~/.bashrc, ~/.zshrc, etc.):
export OLLAMA_CONTEXT_LENGTH=8192
export OLLAMA_DEBUG=1
export OLLAMA_KEEP_ALIVE=30m
export OLLAMA_MAX_LOADED_MODELS=4
export OLLAMA_NUM_PARALLEL=4
- Reload your shell configuration:
source ~/.bashrc  # or ~/.zshrc
- Restart the Ollama service:
systemctl restart ollama   # Linux with systemd
brew services restart ollama   # macOS with Homebrew
After configuration, verify the settings are active:
- Check Ollama is running:
ollama list
- Test with a simple request:
ollama run llama3.2:3b "Hello"
- Check debug logs (if OLLAMA_DEBUG=1):
  - Windows: C:\Users\<username>\AppData\Local\Ollama\server.log
  - Look for configuration values and GPU/CPU usage information
Symptom: Errors when processing multiple documents simultaneously
Solution: Ensure OLLAMA_NUM_PARALLEL=4 is set system-wide and Ollama service has been restarted
Symptoms:
- Document processing takes much longer than expected
- High CPU usage but low GPU usage
Possible Causes:
- GPU VRAM exhausted: Context window too large for available VRAM
- CPU fallback: Model running on CPU instead of GPU
Solutions:
- Reduce OLLAMA_CONTEXT_LENGTH to 4096
- Check debug logs for GPU memory issues
- Close other GPU-intensive applications
- Consider using a smaller model (e.g., llama3.2:3b instead of gpt-oss:20b)
Solution:
- Reduce OLLAMA_CONTEXT_LENGTH
- Reduce OLLAMA_MAX_LOADED_MODELS
- Ensure adequate system RAM (16GB+ recommended)
- llama3.2:3b: Lightweight, fast, good for testing
- llama3.1:8b: Balanced performance and quality
- gpt-oss:20b: Higher quality, requires more resources
| Component | Minimum | Recommended | Optimal |
|---|---|---|---|
| System RAM | 8GB | 16GB | 32GB+ |
| GPU VRAM | 4GB | 8GB | 12GB+ |
| Context Length | 4096 | 8192 | 16384 |
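The table's tiers can be folded into a small helper for scripts that pick a context length from detected VRAM. This is a hypothetical heuristic mirroring the thresholds above, not an official Ollama recommendation:

```python
def suggested_context_length(vram_gb: float) -> int:
    """Map available GPU VRAM to the context-length tiers in the table
    (illustrative heuristic only)."""
    if vram_gb >= 12:
        return 16384
    if vram_gb >= 8:
        return 8192
    return 4096

print(suggested_context_length(8))  # 8192
```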
- OLLAMA_NUM_PARALLEL=4 enables 4 concurrent requests
- Higher values require more memory but improve throughput
- Match this value to your available resources
Key Points:
- ✓ Set environment variables system-wide (not in the Flexible GraphRAG .env)
- ✓ OLLAMA_NUM_PARALLEL=4 is critical for parallel processing
- ✓ Always restart the Ollama service after changing environment variables
- ✓ Use OLLAMA_DEBUG=1 to troubleshoot performance issues
- ✓ Adjust OLLAMA_CONTEXT_LENGTH based on available resources
These settings ensure optimal Ollama performance with Flexible GraphRAG's parallel document processing capabilities.