notkisk/llama-2: A from-scratch PyTorch implementation of Llama 2, featuring Rotary Positional Embeddings (RoPE) and KV caching

llama 2

so i decided to build llama 2 from scratch just to see how it works. turns out it's mostly just matmul and stacking layers.
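to make the "mostly just matmul" point concrete, here's a minimal single-head causal attention sketch: softmax(QK^T / sqrt(d)) V, with a mask so each token only sees itself and earlier tokens. it uses plain python lists instead of pytorch so it runs anywhere, and the helper names (`matmul`, `causal_attention`, etc.) are made up for illustration. they are not taken from model.py.

```python
import math


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), using plain nested lists
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    # subtract the max for numerical stability; exp(-inf) becomes 0.0
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    # q, k, v: (seq_len x head_dim) lists; returns (seq_len x head_dim)
    d = len(q[0])
    scores = matmul(q, transpose(k))  # matmul #1: (seq_len x seq_len)
    weights = []
    for i, row in enumerate(scores):
        # causal mask: position i may only attend to positions j <= i
        masked = [s / math.sqrt(d) if j <= i else float("-inf") for j, s in enumerate(row)]
        weights.append(softmax(masked))
    return matmul(weights, v)  # matmul #2: weighted sum of values
```

outside the two matmuls, the only extra pieces are the 1/sqrt(d) scale, the causal mask, and the softmax. a full model runs this for every head on batched tensors.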

learned a ton about:

how attention actually functions under the hood
rotary positional embeddings (math is heavy but cool)
kv caching for inference speed
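here's a rough sketch of the two inference-side tricks from that list, again in plain python and for a single head. `rope` rotates each adjacent pair of dims by an angle of pos * 10000^(-2i/d). as far as i can tell, that pairing and base match meta's reference llama code, which does the same rotation with complex numbers, but the repo's model.py may do it differently. `KVCache` and `decode_step` are hypothetical names. they show how a cache lets each new token compute only its own key and value while still attending over everything stored so far.

```python
import math


def rope(x, pos, base=10000.0):
    # rotate each adjacent pair (x[i], x[i+1]) by angle pos * base^(-i/d), i even
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out


class KVCache:
    # stores the (already rotated) keys and the values of every token seen so far
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step(cache, q, k, v, pos):
    # q, k, v: vectors for the single newest token at position `pos`
    q, k = rope(q, pos), rope(k, pos)
    cache.append(k, v)  # only the new token's k/v are computed this step
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d) for key in cache.keys]
    mx = max(scores)
    weights = [math.exp(s - mx) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # no causal mask needed: the cache only ever holds past and current tokens
    return [sum(w * val[j] for w, val in zip(weights, cache.values)) for j in range(len(v))]


if __name__ == "__main__":
    cache = KVCache()
    for pos in range(3):
        vec = [1.0, 0.5, -0.5, 2.0]  # dummy q/k/v, just for illustration
        out = decode_step(cache, vec, vec, vec, pos)
    print(len(cache.keys))  # 3 tokens cached
```

the nice property rope buys you is that the dot product between a query at position m and a key at position n depends only on m - n, so attention sees relative offsets. the cache trades memory for compute: it grows by one key and one value per token (per head and per layer in a full model).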

also did some yapping in the comments about general intelligence and the cia. building this makes me realize agi is far away and i should probably just get a job.

anyway, use prepare.sh to download the weights if you have the download url from meta. run inference.py to chat with it.

code is what it is. enjoy.

files

README.md
inference.py
model.py
prepare.sh
