PromptWeaver: RAG Edition helps design effective prompts for Traditional, Hybrid, and Agentic RAG systems. It offers templates, system prompts, and best practices to improve accuracy, context use, and LLM reasoning.
Its purpose is to improve accuracy, relevance, and explainability in RAG and Agentic RAG responses through structured, optimized prompt construction.
Traditional RAG prompt template:
Context:
"""
{{ retrieved_passages }}
"""
Question:
{{ user_query }}
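As an illustration, here is a minimal Python sketch of rendering this Traditional RAG template. The `retrieve` callable and the template string are assumptions for the example, not part of any particular library:

```python
# Minimal sketch of Traditional RAG prompt assembly (illustrative only).
# `retrieve` stands in for your vector-store lookup (FAISS, Qdrant, etc.).

TRADITIONAL_TEMPLATE = '''Context:
"""
{retrieved_passages}
"""
Question:
{user_query}'''

def build_traditional_prompt(user_query: str, retrieve, k: int = 3) -> str:
    passages = retrieve(user_query, k=k)      # top-k passages as strings
    context = "\n\n".join(passages)           # blank line between passages
    return TRADITIONAL_TEMPLATE.format(
        retrieved_passages=context, user_query=user_query
    )
```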
Hybrid RAG prompt template:
[Heuristic-Summary]: {{ context_summary }}
Context:
"""
{{ top_retrieved_docs }}
"""
User Query:
{{ query }}
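A sketch of the heuristic step that distinguishes Hybrid RAG: filter the retrieved documents and produce the `{{ context_summary }}` value before the prompt is assembled. The keyword-overlap scoring and first-sentence summary below are deliberately simple placeholders:

```python
# Illustrative heuristic filter + summary for Hybrid RAG.
# Real systems might use recency rules, metadata filters, or a rerank model.

def heuristic_filter(query: str, docs: list[str], max_docs: int = 3) -> list[str]:
    query_terms = set(query.lower().split())
    # Keep the documents that share the most terms with the query.
    ranked = sorted(
        docs,
        key=lambda d: len(query_terms & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:max_docs]

def context_summary(docs: list[str], max_chars: int = 200) -> str:
    # Placeholder summary: first sentence of each kept document, truncated.
    first_sentences = [d.split(".")[0].strip() for d in docs]
    return "; ".join(first_sentences)[:max_chars]
```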
Agentic RAG prompt template:
[Agent Memory]: {{ memory_state }}
[Task Plan]: {{ agent_plan }}
Fetched Context:
"""
{{ selected_documents }}
"""
User Query:
{{ user_query }}
System Prompt:
{{ system_guidance }}
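A sketch of how the Agentic template might be filled once the agent has executed its plan. The memory string, plan list, and document list are hypothetical inputs from your agent framework, not a specific API:

```python
# Illustrative Agentic RAG prompt assembly (field names mirror the template above).

AGENTIC_TEMPLATE = '''[Agent Memory]: {memory_state}
[Task Plan]: {agent_plan}
Fetched Context:
"""
{selected_documents}
"""
User Query:
{user_query}
System Prompt:
{system_guidance}'''

def build_agentic_prompt(user_query: str, memory_state: str, agent_plan: list[str],
                         selected_documents: list[str], system_guidance: str) -> str:
    return AGENTIC_TEMPLATE.format(
        memory_state=memory_state,
        agent_plan=" -> ".join(agent_plan),              # e.g. ["fetch billing", "check promo"]
        selected_documents="\n\n".join(selected_documents),
        user_query=user_query,
        system_guidance=system_guidance,
    )
```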
Prompt construction best practices:
- ✅ Keep context concise (avoid overwhelming LLM input limits)
- ✅ Use delimiters (like """ or brackets) for clarity
- ✅ Separate user intent from supporting facts
- ✅ Limit redundancy in retrieved documents (see the dedup sketch after this list)
- ✅ Include reasoning expectations in the system prompt
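The redundancy point above can be enforced mechanically. A minimal sketch, assuming a token-set Jaccard threshold (0.8 here is arbitrary and should be tuned per corpus):

```python
# Drop near-duplicate retrieved chunks before they reach the prompt.
# Token-set Jaccard similarity is a crude but cheap duplicate signal.

def deduplicate_chunks(chunks: list[str], threshold: float = 0.8) -> list[str]:
    kept: list[str] = []
    for chunk in chunks:
        tokens = set(chunk.lower().split())
        duplicate = any(
            len(tokens & set(k.lower().split()))
            / max(len(tokens | set(k.lower().split())), 1) > threshold
            for k in kept
        )
        if not duplicate:
            kept.append(chunk)
    return kept
```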
Traditional/Hybrid RAG system prompt:
Answer the question using only the provided context. If unsure, say "Not enough information."
Agentic RAG system prompt:
You are an AI assistant with access to tools, memory, and planning capability. Break down the query, fetch what’s needed, and explain your process.
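One way to pair these system prompts with the assembled RAG prompt is a standard chat-completion call. The sketch below uses the OpenAI Python SDK and the `build_traditional_prompt` helper from the earlier sketch; the model name is an arbitrary placeholder:

```python
# Sketch: send system guidance + assembled RAG prompt to an LLM.
# Assumes the OpenAI Python SDK (>= 1.0) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    'Answer the question using only the provided context. '
    'If unsure, say "Not enough information."'
)

def answer(user_query: str, retrieve) -> str:
    prompt = build_traditional_prompt(user_query, retrieve)
    response = client.chat.completions.create(
        model="gpt-4o-mini",                     # placeholder model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content
```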
Evaluation and optimization tips:
- A/B test different retrieval depths (top-3 vs top-5; see the sketch after this list)
- Use confidence scoring with LLM responses
- Log failures and study response hallucinations
- Tune memory injection strategies
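A sketch of the first tip, A/B testing retrieval depth. `retrieve`, `answer_with_context`, and `score_answer` are hypothetical hooks into your own pipeline and evaluation harness:

```python
# Compare answer quality at top-3 vs top-5 retrieval on a fixed question set.

def compare_retrieval_depths(questions, retrieve, answer_with_context, score_answer):
    results = {}
    for k in (3, 5):
        scores = []
        for q in questions:
            context = retrieve(q, k=k)
            answer = answer_with_context(q, context)
            scores.append(score_answer(q, answer))   # e.g. exact match or LLM-as-judge
        results[f"top-{k}"] = sum(scores) / len(scores)
    return results
```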
Resources:
- OpenAI Cookbook: Prompt Engineering Examples
- DeepLearning.AI: Prompting for LLMs Course
- LangChain Docs on prompt templates
A large telecom company deploys a customer support chatbot powered by RAG to help users troubleshoot internet issues, explain bills, and update plans using internal documentation.
- Query: “Why is my bill higher this month?”
- Context: Retrieved from billing FAQ and promo policy.
Assembled Traditional RAG prompt:
Context:
"""
Billing for promo plans changes after 6 months. Extra charges apply for over-usage.
"""
Question:
Why is my bill higher this month?
- LLM Output: “Your bill may be higher due to promo expiry or extra data usage.”
- Hybrid RAG enriches the context with a heuristic summary: “Promo expired Jan 2024.”
- Agent Plan:
  - Access billing API
  - Fetch promo status
  - Check over-usage
Assembled Agentic RAG prompt:
[Agent Memory]: Previous overcharge discussion
[Task Plan]: Fetch user billing for Jan, check promo status
Fetched Context:
"""
User’s promo expired Dec 31. Data overage of 5GB was billed.
"""
Question:
Why is my bill higher this month?
- Final Response: “Your promo ended in Dec, and 5GB of extra data in Jan led to additional charges.”
Building RAG systems—especially Agentic ones—raises key ethical concerns:
- Bias Propagation: LLMs may amplify bias present in retrieved documents.
- Data Privacy: Long-term memory and context logs may expose user data.
- Tool Misuse: Autonomous agents may make unintended API calls.
- Hallucinations: Confidently wrong answers can mislead users.
Mitigations:
- Apply content filters and bias testing
- Anonymize or redact user inputs (see the redaction sketch after this list)
- Monitor and log agent behavior
- Include disclaimers for uncertain output
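As a starting point for the redaction item, a minimal sketch using regular expressions. The patterns are deliberately simple illustrations; production systems should rely on a vetted PII-detection tool:

```python
# Redact obvious PII (emails, phone numbers) before logging or prompt injection.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text
```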
This project is licensed under the MIT License. You are free to use, modify, and distribute the code with proper attribution. See the LICENSE file for details.
RAG frameworks come in three flavors: Traditional, Hybrid, and Agentic. Here's how they differ architecturally:
- Vector Indexer: Converts docs to embeddings and stores in a vector DB (e.g., FAISS, Qdrant)
- Retriever: Fetches relevant documents using semantic similarity
- Prompt Augmenter: Merges context with the user query
- Agent Layer (Agentic only): Plans tool usage, manages memory, and orchestrates steps
- LLM Interface: Generates responses based on the final prompt
- Traditional RAG: User Query → Vector Search → Augmented Prompt → LLM → Response
- Hybrid RAG: User Query → Vector Search → Heuristic Filter → Augmented Prompt → LLM → Response
- Agentic RAG: User Query → Agent → Tool Selection & Retrieval → Prompt Assembly → LLM → Response
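All three flows share the vector-search stage. A minimal sketch using sentence-transformers and FAISS (both are assumptions; Qdrant or another store works the same way conceptually):

```python
# Build an index over the corpus, then fetch the top-k passages for a query.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder embedding model

def build_index(docs: list[str]) -> faiss.IndexFlatIP:
    embeddings = model.encode(docs, convert_to_numpy=True)
    faiss.normalize_L2(embeddings)                # cosine similarity via inner product
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)
    return index

def vector_search(index, docs: list[str], query: str, k: int = 3) -> list[str]:
    query_vec = model.encode([query], convert_to_numpy=True)
    faiss.normalize_L2(query_vec)
    _, ids = index.search(query_vec, k)
    return [docs[i] for i in ids[0]]
```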
rag-architecture/
├── /src
│   ├── traditional/   # Basic RAG logic
│   ├── hybrid/        # Rule-enhanced retrieval
│   └── agentic/       # Agent, planner, memory
├── /data              # Corpus, vector store
├── /docs              # Design, prompts, ethics
└── /tests             # Unit tests, benchmarks
- LangChain / LlamaIndex for RAG orchestration
- FAISS / Qdrant for vector search
- OpenAI / Claude / Gemini as LLMs
- Docker / GitHub Actions for deployment and CI/CD
- Python / TypeScript as implementation languages
- Prompt Engineering for optimized LLM input
Create a board with columns and sample issues:
- Backlog: Define agent schema, Create prompt libraries, Setup retrieval eval framework
- To Do: Add support for hybrid heuristics, Configure Qdrant vector store
- In Progress: Agent planner logic, Context chunk size tuning
- Review: Prompt output logging, Agent retry logic
- Done: Traditional RAG baseline working, Basic UI for prompt testing
Estimating infrastructure and tooling costs helps plan and scale a RAG system responsibly. Here’s a high-level breakdown:
| Resource | Estimated Monthly Cost (USD) | Notes |
|---|---|---|
| OpenAI API (GPT-4) | $100–$300 | Based on token usage for inference |
| Vector DB (Qdrant/FAISS on cloud) | $20–$80 | For storing embeddings |
| Compute (Docker, Agents, API) | $50–$150 | On cloud (e.g., AWS EC2, Azure VM) |
| Storage (object/docs) | $10–$30 | S3, Azure Blob, or equivalent |
| Monitoring & Logging | $0–$50 | Optional tools like Prometheus, Grafana |
| CI/CD (GitHub Actions) | Free–$30 | Based on usage |
| DevOps & Maintenance | $0–$100 | Time/labor if outsourced |
Total Estimated Monthly Cost: $180 – $740
🔎 Tip: Use open-source LLMs (e.g., Mistral, LLaMA) or local vector stores to reduce cost.
- Automate prompt logging and quality scoring
- Create a library of reusable prompts for standard tasks
- Evaluate across domains (FAQ bots, tech support, education)