llama 2

so i decided to build llama 2 from scratch just to see how it works. turns out it's mostly just matmul and stacking layers.

learned a ton about:

  • how attention actually functions under the hood
  • rotary positional embeddings (math is heavy but cool; rough sketch after this list)
  • kv caching for inference speed (also sketched below)
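
if you just want the gist of the rope part without digging through the model code, here's a rough pytorch sketch of the usual llama-2-style approach: precompute unit-length complex rotations per position, then multiply them into q and k. names like `precompute_freqs_cis` and `apply_rope` are just for this sketch, not necessarily what this repo's code calls them.

```python
# rough sketch of llama-2-style rotary embeddings, not this repo's exact code
import torch

def precompute_freqs_cis(head_dim: int, max_seq_len: int, theta: float = 10000.0):
    # one rotation frequency per pair of dims: theta^(-2i/head_dim)
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(max_seq_len).float()
    angles = torch.outer(t, freqs)                       # (max_seq_len, head_dim // 2)
    return torch.polar(torch.ones_like(angles), angles)  # unit-length complex numbers

def apply_rope(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, n_heads, head_dim); rotate consecutive dim pairs
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    rotated = x_complex * freqs_cis[None, :, None, :]    # broadcast over batch and heads
    return torch.view_as_real(rotated).flatten(-2).type_as(x)
```

you'd apply it to q and k right after their projections, slicing `freqs_cis` to the positions of the current tokens.

and the kv cache idea, equally sketchy: keep the key/value tensors you've already computed so each decoding step only runs the projections for the newest token and attends over the cached prefix, instead of re-running the whole sequence. the class below is hypothetical, not lifted from inference.py.

```python
# rough sketch of a per-layer kv cache; the real code in this repo may differ
import torch

class KVCache:
    def __init__(self, max_batch: int, max_seq_len: int, n_heads: int, head_dim: int):
        self.cache_k = torch.zeros(max_batch, max_seq_len, n_heads, head_dim)
        self.cache_v = torch.zeros(max_batch, max_seq_len, n_heads, head_dim)

    def update(self, start_pos: int, xk: torch.Tensor, xv: torch.Tensor):
        # write the new keys/values at their positions, then hand back the full prefix
        bsz, seqlen = xk.shape[0], xk.shape[1]
        self.cache_k[:bsz, start_pos:start_pos + seqlen] = xk
        self.cache_v[:bsz, start_pos:start_pos + seqlen] = xv
        keys = self.cache_k[:bsz, :start_pos + seqlen]
        values = self.cache_v[:bsz, :start_pos + seqlen]
        return keys, values
```

the payoff: after the prompt is processed once, each new token only costs a forward pass over a single position plus attention against the cached keys, which is what makes autoregressive generation tolerable.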

also did some yapping in the comments about general intelligence and the cia. building this makes me realize agi is far away and i should probably just get a job.

anyway, use prepare.sh to grab the weights if you have the download url from meta. run inference.py to chat with the model.

code is what it is. enjoy.

About

A from-scratch PyTorch implementation of Llama 2, featuring Rotary Positional Embeddings (RoPE) and KV caching
