A Retrieval-Augmented Generation (RAG) pipeline using LangChain 1.0 to answer questions about D&D rules from markdown documents processed via olmOCR.
- Document Indexing: Load markdown files, split them into chunks, and store embeddings in Qdrant (see the sketch after this list).
- RAG Agent: Flexible multi-step queries using `create_agent` with a retrieval tool.
- RAG Chain: Fast single-call Q&A using middleware for context injection.
- LangSmith Tracing: Built-in observability for debugging and monitoring.
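
The indexing step loads, splits, and embeds the converted markdown. Here is a minimal sketch, assuming the `langchain-community`, `langchain-text-splitters`, `langchain-openai`, and `langchain-qdrant` packages; the chunk sizes, embedding model, and collection name are illustrative rather than the notebook's exact values:

```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every markdown file produced by olmOCR.
loader = DirectoryLoader("PDFs/", glob="**/*.md", loader_cls=TextLoader)
docs = loader.load()

# Split into overlapping chunks sized for retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Embed the chunks and store them in an in-memory Qdrant collection.
vector_store = QdrantVectorStore.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    location=":memory:",
    collection_name="dnd_rules",  # hypothetical collection name
)
```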
You'll need uv installed before proceeding.

```bash
# Install dependencies
uv sync
```

You will also need an OpenAI API key (`OPENAI_API_KEY`) and, optionally, a LangSmith API key (`LANGSMITH_API_KEY`) for tracing.
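
Inside the notebook, the keys are typically set as environment variables before anything else runs. A minimal sketch; the LangSmith variable names follow current LangSmith conventions and are an assumption about what the notebook expects:

```python
import getpass
import os

# Required for embeddings and chat completions.
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")

# Optional: enable LangSmith tracing for observability.
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("LangSmith API key: ")
```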
Before running the RAG pipeline, convert your PDF rulebooks to markdown using olmOCR:
```bash
# Pull the Docker image (large, includes the model, ~30GB)
docker pull alleninstituteforai/olmocr:latest-with-model

# Convert PDFs to markdown
docker run --gpus all \
  -v "$(pwd)":/workspace \
  alleninstituteforai/olmocr:latest-with-model \
  -c "python -m olmocr.pipeline /workspace/output --markdown --pdfs /workspace/PDFs/*.pdf"
```

Place the generated markdown files in the `PDFs/` directory.
- Launch the Notebook:

  ```bash
  uv run jupyter notebook RAG_Pipeline.ipynb
  ```

- Run the Cells:
  - Set up your environment and API keys.
  - Index your D&D rulebook markdown files into Qdrant.
  - Test the RAG Agent for flexible, multi-step queries (sketched below).
  - Test the RAG Chain for fast, single-call Q&A (a middleware sketch follows the component table).
  - Use the interactive demo to ask your own questions!
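
For the agent path, a retrieval tool wraps the vector-store search and `create_agent` wires it to a chat model. A minimal sketch, assuming the `vector_store` built during indexing; the model id, tool name, and prompt are illustrative:

```python
from langchain.agents import create_agent
from langchain.tools import tool

@tool
def search_rules(query: str) -> str:
    """Search the D&D rulebooks for passages relevant to the query."""
    results = vector_store.similarity_search(query, k=4)
    return "\n\n".join(doc.page_content for doc in results)

agent = create_agent(
    model="openai:gpt-4o-mini",  # hypothetical model id
    tools=[search_rules],
    system_prompt="Answer D&D rules questions using the search_rules tool.",
)

response = agent.invoke(
    {"messages": [{"role": "user", "content": "How does grappling work?"}]}
)
print(response["messages"][-1].content)
```

Because the agent decides when to call the tool, it can search more than once before answering, which is what enables the multi-step queries mentioned above.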
- `RAG_Pipeline.ipynb`: The main interactive RAG pipeline notebook.
- `PDFs/`: Directory containing markdown files (converted from PDFs via olmOCR).
- `pyproject.toml`: Dependency management.
| Component | Purpose |
|---|---|
| `DirectoryLoader` | Load markdown files from disk |
| `RecursiveCharacterTextSplitter` | Split documents into retrievable chunks |
| `QdrantVectorStore` | Store and search embeddings |
| `@tool` decorator | Create retrieval tool for agent |
| `create_agent` | Build LangChain 1.0 agent |
| `AgentMiddleware` | Inject context for RAG chain |
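
The last row is what distinguishes the chain from the agent: middleware retrieves context up front so a single model call can answer. A rough sketch, assuming LangChain 1.0's `AgentMiddleware` with a `before_model` hook; the hook signature, state handling, and retrieval details are assumptions rather than the notebook's exact code:

```python
from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware

class RetrievalMiddleware(AgentMiddleware):
    """Inject retrieved rulebook context before the model runs."""

    def before_model(self, state, runtime):  # assumed hook signature
        query = state["messages"][-1].content
        docs = vector_store.similarity_search(query, k=4)
        context = "\n\n".join(doc.page_content for doc in docs)
        # Append the context as an extra message so one model call suffices.
        return {
            "messages": [
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
            ]
        }

chain = create_agent(
    model="openai:gpt-4o-mini",  # hypothetical model id
    tools=[],  # no tool loop: retrieval happens in the middleware
    middleware=[RetrievalMiddleware()],
)
```

Skipping the tool loop is the design trade-off: the chain always retrieves exactly once, trading the agent's flexibility for lower latency.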
