Best way to add knowledge to an LLM: r/LocalLLaMA
DESCRIPTION: Studies like this one show GPT-4 gets 75% accuracy with prompting alone. With GPT-4 + RAG you get 80% accuracy, GPT-4 + finetuning 81%, and GPT-4 + RAG + finetuning 86%. Other studies like this one say that for pure knowledge retrieval from huge datasets, RAG alone is enough.
Kaggle's LLM Science Exam competition had participants answer hard science questions. The winning solution showed Llama-2 70b with prompting gets 80%; with finetuning via SFT you get 86%; but with finetuning + RAG you get 93%. All solutions had to undergo finetuning, since the output was MMLU's classification type, i.e. output A, B, C, D, etc. (so a classification problem).
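To make the classification framing concrete, here is a minimal sketch (my own illustration, not the winning solution's code) of how an MMLU-style question gets formatted so the model's answer is a single letter:

```python
# Illustrative sketch of MMLU-style multiple-choice framing.
# The helper name and prompt layout are assumptions for illustration.

def format_mc_prompt(question, choices):
    """Build a multiple-choice prompt whose expected answer is one letter."""
    letters = "ABCDE"
    lines = [f"Question: {question}"]
    for letter, choice in zip(letters, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")  # the model completes with "A", "B", ...
    return "\n".join(lines)

prompt = format_mc_prompt(
    "Which particle carries the electromagnetic force?",
    ["Gluon", "Photon", "W boson", "Graviton"],
)
print(prompt)
```

Finetuning then teaches the model to emit exactly one of the option letters after "Answer:", which is what turns the task into classification.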
I would try RAG first to see if it works. The issues then are which embedding model, which vector database, what chunk size, whether to rerank, etc.
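The moving parts mentioned above (chunking, embedding, retrieval) can be sketched in a toy form. This is purely illustrative: a real pipeline would swap the bag-of-words "embedding" for a proper embedding model and the linear scan for a vector database, and the helper names are mine:

```python
# Toy RAG retrieval sketch: chunk -> embed -> rank by similarity.
# Bag-of-words vectors stand in for real embeddings (an assumption
# made so this stays self-contained); reranking would refine the top-k.
import math
from collections import Counter

def chunk(text, size=50):
    """Split text into fixed-size word chunks (chunk size is a tunable)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Stand-in 'embedding': a word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunks are then pasted into the prompt as context, which is where the 75% → 80% lift in the cited study comes from.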
If you find RAG too annoying to set up, another approach is to feed your dataset directly into finetuning. The result will be a text-completion model, so you might need, say, GPT-4 to create some instructions from the dataset to "prime" your model.
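The "priming" step above can be sketched as follows. The prompt wording and the training-example schema are my assumptions (the post doesn't prescribe either), and the GPT-4 call itself is left out so the sketch stays self-contained:

```python
# Sketch of turning raw dataset passages into instruction/response pairs.
# A stronger model (e.g. GPT-4) would answer build_question_prompt();
# here we only build the prompt and assemble the resulting pair.

def build_question_prompt(passage):
    """Prompt you'd send a stronger model to invent a question for a passage."""
    return (
        "Write one question that the following passage answers.\n\n"
        f"Passage: {passage}\nQuestion:"
    )

def to_training_example(passage, generated_question):
    """Pair the generated question with the passage as the target answer."""
    return {"instruction": generated_question, "output": passage}
```

Finetuning on such pairs teaches the base completion model to follow instructions about the new knowledge, rather than just continuing arbitrary text.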
So RAG definitely works, pushing accuracy from 75% to 80%. But with finetuning on top you get 86%. There are some bad theories spreading that finetuning does not inject new knowledge, but these studies and the Kaggle competition prove otherwise.
Likewise, see Open Hermes and any other finetuned model: finetuning is just continued pretraining. The weights of the model are definitely being edited to account for new information.
I'm also the dev of Unsloth :) If you're going to do finetuning, I have a free Colab notebook to finetune Mistral 7b 2x faster with 70% less VRAM. Colab Notebook
All in all, I would try prompt engineering first, then RAG, then finetuning, then RAG + finetuning as the final step.
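That escalation order can be written down as a simple checklist, annotated with the accuracy numbers cited earlier (the helper and list names are mine):

```python
# The suggested try-cheapest-first escalation order, as a checklist.
STRATEGIES = [
    "prompt engineering",  # cheapest: no training, no retrieval infra
    "RAG",                 # ~75% -> 80% in the cited GPT-4 study
    "finetuning",          # ~81% alone in the same study
    "RAG + finetuning",    # ~86%, the combined best
]

def next_strategy(current):
    """Return what to try after `current`, or None if you've tried it all."""
    i = STRATEGIES.index(current)
    return STRATEGIES[i + 1] if i + 1 < len(STRATEGIES) else None
```

The point of the ordering is cost: each step only gets attempted if the cheaper one before it wasn't accurate enough.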
URL: r/LocalLLaMA