This repository contains a lecture video search and question-answering system based on retrieval-augmented generation (RAG). It is the artifact of my master's thesis and of the paper "Enhancing Question Answering in Lecture Videos with a Multimodal Retrieval-Augmented Generation Framework", which was accepted at AIMSA 2024 and published by Springer: https://link.springer.com/chapter/10.1007/978-3-031-81542-3_15
`src` contains the RAG logic. `dev` contains a Jupyter notebook that uses the logic in `src` and demonstrates a simple use case. It also contains the `qa-dataset.py` file, which served as the question-answering dataset for the research project, as well as the hyperparameters we found to be optimal for our dataset.
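The actual retrieval and answering logic lives in `src`; as a self-contained illustration of the retrieve-then-prompt pattern that RAG systems follow, here is a hypothetical sketch (all names and the bag-of-words "embedding" are stand-ins for the real encoder and index, not this repository's API):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real encoder model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank transcript chunks by similarity to the query — the 'R' in RAG."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Hypothetical lecture-transcript chunks; a real pipeline would produce these
# from ASR/OCR output of the video.
chunks = [
    "The lecturer explains gradient descent and learning rates.",
    "Administrative notes about the exam schedule.",
    "Backpropagation computes gradients layer by layer.",
]
context = retrieve("how does gradient descent work", chunks)
# The retrieved chunks are then injected into the LLM prompt as grounding context.
prompt = "Answer using only this context:\n" + "\n".join(context)
```

In the real system the toy similarity search is replaced by a proper embedding model and vector index, and the prompt is passed to a local LLM via Llama-CPP-Python.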
- Ensure the package manager pdm is installed: https://pdm-project.org/en/latest/ (on macOS: `brew install pdm`)
- Optional: if you want to use tesseract as the OCR library instead of the default easyocr, install the necessary tesseract system dependencies (see below)
- Ensure Python 3.11 is installed (with pdm: `pdm python install 3.11`)
- Follow the Llama-CPP-Python instructions (see below)
- Run `pdm install -d` to install all dependencies
- Rename the `example.env` file in the `dev` folder to `.env` and fill in the necessary environment variables
- Run the code in `dev/rag_example.ipynb` to see how to use the RAG logic
- Tesseract: https://github.com/tesseract-ocr/tesseract?tab=readme-ov-file#installing-tesseract
- ffmpeg: https://ffmpeg.org/download.html
- In case your system has a CUDA-compatible GPU, set `CUDACXX=/usr/local/cuda-12/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all-major" FORCE_CMAKE=1` before running `pdm install` (https://medium.com/@ryan.stewart113/a-simple-guide-to-enabling-cuda-gpu-support-for-llama-cpp-python-on-your-os-or-in-containers-8b5ec1f912a4)
- In case you have a macOS M-family GPU (supporting MPS), set `CMAKE_ARGS="-DLLAMA_METAL=on"` before running `pdm install`
- Watch CUDA GPU processes: `watch -n 1 nvidia-smi`
- Watch CPU processes: `htop`
