Quick PoC for indexing/embedding your codebase into PostgreSQL and querying it with LangGraph.
You'll need to add your own summary.txt file so the query decomposer can identify what the codebase is about.
Test with:
python -Wignore embed.py --git_url "https://github.com/sooperset/mcp-atlassian.git" --output_dir ./output --to_embedding --to_postgres --load_on_startup --batch_size 5
python retrieve.py --query "Does the Jira Tool have a tool to get project issues within a board" --codebase_name "sooperset/mcp-atlassian" --codebase_summary_path "./summary.txt"
NOTE: Because we plan to use query decomposition, a single hop will likely grab almost 30 chunks. Using MMR (maximal marginal relevance), we truncate this down to 5. Please keep the chunk size low! Smaller chunks help ensure diversity in the questions in the first place. I've set it to 2048 tokens per chunk, purely because of the I/O slowdown when committing too many chunks into Postgres.
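
To make the MMR truncation step in the note concrete, here is a minimal pure-Python sketch of maximal marginal relevance over embedding vectors. It is an illustration only, not the code used in retrieve.py: the function names (cosine, mmr_select), the lambda_mult parameter, and the default of 0.5 are assumptions made for this example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 5,
    lambda_mult: float = 0.5,  # hypothetical default: 1.0 = pure relevance, 0.0 = pure diversity
) -> List[int]:
    """Return indices of up to k chunks, chosen greedily by MMR score."""
    relevance = [cosine(query_vec, v) for v in chunk_vecs]
    remaining = list(range(len(chunk_vecs)))
    selected: List[int] = []

    while remaining and len(selected) < k:
        best_idx = remaining[0]
        best_score = -math.inf
        for i in remaining:
            # Redundancy is the similarity to the closest already-selected chunk.
            redundancy = max(
                (cosine(chunk_vecs[i], chunk_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)

    return selected
```

With roughly 30 candidate chunks pooled from the decomposed sub-queries, calling mmr_select(query_vec, chunk_vecs, k=5) would return the indices of the 5 chunks to keep. Near-duplicate chunks are penalized by the redundancy term, so the kept set stays diverse.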