- Set up virtual environment
- Initialize Git repository
- Add `.gitignore` file
- Prepare dataset in JSONL format
- Write a function to convert Excel files to JSONL
- Write a function that returns the sanitized-name equivalent of a file name
- Check how many of the QC data entries match the extracted-text examples we have
- Find and download the missing files (currently 43)
- Scrape text from the fine-tuning examples
- Formalize the text-extraction tool
- Fine-tune Llama 3.2 model
- Save fine-tuned model in `models/` directory
- Write inference script
- Test model predictions with sample data
- Push model to Hugging Face Hub
- Deploy model using Hugging Face Inference Endpoint
- General AI where one person can ask questions about any of the documents
- General AI where you can ask questions about a single document
- How to build a general AI over the structured data that can answer questions about any of the wells
  - This has likely been done before
  - How to prevent it from answering questions about other people's wells
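On the last point, a common pattern (an assumption here, not something from these notes) is to filter the document store by owner before the model ever sees a candidate, rather than relying on the model to refuse. A minimal in-memory sketch with placeholder names:

```python
from dataclasses import dataclass


@dataclass
class Doc:
    owner_id: str  # who is allowed to query this document
    well_id: str
    text: str


def retrieve(docs, owner_id, query):
    """Return matching documents, restricted to the requesting owner.

    Because filtering happens before search, documents from other
    owners' wells can never reach the model's context.
    """
    visible = [d for d in docs if d.owner_id == owner_id]
    return [d for d in visible if query.lower() in d.text.lower()]
```

The same idea carries over to a real vector store, where the owner check becomes a metadata filter on the retrieval query.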