ANLY-5800: Natural Language Processing

Updated 09/08/25

Course Information

| Item | Details |
|---|---|
| Course | ANLY-5800 |
| Semester | Fall 2025 |
| Instructor | Chris Larson |
| Credits | 3 |
| Prerequisites | None |
| Location | Car Barn 309 |
| Time | Tue 3:30-6:00 pm ET |
| Office Hours | Virtual |

Course Overview

Natural language processing (NLP) lies at the heart of modern information systems. Over the last 30 years it has repeatedly transformed how humans acquire knowledge, interact with computers, and communicate with one another. This course presents these advancements through the lens of the machine learning methods that have enabled them. We explore how language understanding is framed as a tractable inference problem through language modeling, and trace the evolution of NLP from classical methods to the latest neural architectures, reasoning systems, and AI agents.
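
As a concrete illustration of that framing (the standard formulation, not anything specific to this course), a language model makes inference over text tractable by factoring the joint probability of a token sequence into next-token conditionals:

$$P(w_1, \dots, w_T) = \prod_{t=1}^{T} P\left(w_t \mid w_1, \dots, w_{t-1}\right)$$

The models covered in the schedule below, from n-grams through recurrent networks to Transformers, differ mainly in how they parameterize and estimate these conditionals.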

What's new for Fall Semester 2025?

  • Expanded focus on LLM search and retrieval.
  • Expanded focus on the practical and formal aspects of LLM reasoning and AI Agents.
  • Expanded coverage of the latest NN architectures, including non-attention based models.
  • Removed labs; some of that content has been rolled into the assignments.

Prerequisites

While this course has no course prerequisites, it is designed for students with the mathematical maturity typically gained through coursework in linear algebra, probability theory, first-order optimization methods, and basic programming. The archetypal profile is a graduate or advanced undergraduate student in CS, math, engineering, or the information sciences, but there have been many exceptions; above all other indicators, students who show a genuine interest in the material tend to excel in the course. To help fill any gaps in these technical areas, I devote the entire first lecture to the mathematical concepts and tools used throughout the class.


Reference Texts

Many of the topics covered in this course have not yet been fully treated in textbooks, so we refer directly to papers from the literature. That said, the three reference texts below cover a good portion of the topics in Lectures 1-7.

  1. Jacob Eisenstein. Natural Language Processing
  2. Dan Jurafsky, James H. Martin. Speech and Language Processing
  3. Ian Goodfellow, Yoshua Bengio, & Aaron Courville. Deep Learning

Communication

Course content will be published to this GitHub repository, while all deliverables will be submitted through Canvas. We also have a dedicated Discord server, which is the preferred forum for all course communications; please join it at your earliest convenience. So that the teaching staff can associate your GU, GitHub, and Discord profiles, please enter your information into this table to gain access to course materials and communications.


JetStream2 Access

As part of this course, you will have access to Jupyter notebooks backed by A100 GPUs (40 GB) hosted on the JetStream2 cluster. This is a shared resource and will be made available ahead of the first assignment.
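
Once access is granted, a quick sanity check from a notebook cell confirms the GPU is visible. The sketch below assumes PyTorch is installed in the hosted environment; any framework's equivalent check works just as well.

```python
# Minimal GPU sanity check (assumes PyTorch is available in the JetStream2 image).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # Expect a name like "NVIDIA A100-SXM4-40GB" and roughly 40 GB of memory.
    print(f"GPU: {props.name}, memory: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA device visible; check your JetStream2 allocation.")
```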


Schedule

| Date | Lecture | Topics | Key Readings |
|---|---|---|---|
| Sep 02 | Lecture 1: Mathematical Foundations | Theorems in linear algebra<br>Probability and information theory<br>Statistical parameter estimation<br>Zipf's law | Linear Algebra Done Right (Axler, 2015)<br>Elements of Information Theory (Cover & Thomas, 2006)<br>Mathematics for Machine Learning<br>Zipf's Law in Natural Language (Piantadosi, 2014) |
| Sep 09 | Lecture 2: Decision Boundary Learning | The Perceptron<br>Support Vector Machines<br>Kernel methods<br>Regularization and generalization theory | Pattern Recognition and Machine Learning (Bishop, 2006)<br>Support Vector Machines (Cortes & Vapnik, 1995)<br>Statistical Learning Theory (Vapnik, 1998) |
| Sep 16 | Lecture 3: Parameter Estimation Methods | Maximum likelihood estimation<br>Discriminative modeling & softmax regression<br>Generative modeling & Naive Bayes<br>Maximum a posteriori estimation | Machine Learning: A Probabilistic Perspective (Murphy, 2012)<br>Pattern Recognition and Machine Learning (Bishop, 2006)<br>Bayesian Data Analysis (Gelman et al., 2013) |
| Sep 23 | Lecture 4: Distributional Semantics | TF-IDF and PMI<br>Latent Semantic Analysis<br>Latent Dirichlet Allocation<br>Word2Vec | Probabilistic Topic Models (Blei, 2012)<br>Dynamic Topic Models (Blei & Lafferty, 2006)<br>Efficient Estimation of Word Representations (Mikolov et al., 2013) |
| Sep 30 | Lecture 5: Neural Networks | Artificial neural networks<br>Backpropagation algorithm<br>Gradient descent<br>Regularization methods<br>Bias-variance tradeoff<br>Learning rate scheduling and annealing | Deep Learning (Goodfellow et al., 2016)<br>Learning representations by back-propagating errors (Rumelhart et al., 1986)<br>Adam: A Method for Stochastic Optimization (Kingma & Ba, 2014) |
| Oct 07 | Lecture 6: Language Modeling | n-gram models<br>HMMs<br>Convolutional filtering<br>EBMs and Hopfield Networks<br>Recurrent networks<br>Autoregression | A Neural Probabilistic Language Model (Bengio et al., 2003)<br>A Guide to the Rabiner HMM Tutorial (Rabiner, 1989)<br>An Empirical Study of Smoothing Techniques (Chen & Goodman, 1999)<br>Energy-Based Models (LeCun et al., 2006)<br>Neural networks and physical systems with emergent collective computational abilities (Hopfield, 1982) |
| Oct 14 | Lecture 7: Sequence Models | Seq2Seq models<br>Bahdanau attention<br>Information bottleneck<br>Neural Turing Machines<br>Pointer networks<br>Memory networks | Sequence to Sequence Learning (Sutskever et al., 2014)<br>Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2014)<br>The Information Bottleneck Method (Tishby et al., 1999)<br>Neural Turing Machines (Graves et al., 2014)<br>Pointer Networks (Vinyals et al., 2015)<br>Memory Networks (Weston et al., 2014) |
| Oct 21 | Lecture 8: Transformers | Self-attention<br>Scaled dot-product attention<br>Multi-head attention<br>Tokenization schemes<br>Non-causal language models<br>Causal language models | Attention Is All You Need (Vaswani et al., 2017)<br>Neural Machine Translation of Rare Words with Subword Units (Sennrich et al., 2015)<br>SentencePiece: A simple and language independent subword tokenizer (Kudo, 2018)<br>The Annotated Transformer (Rush, 2018) |
| Oct 28 | Lecture 9: LLM Finetuning, Alignment, Scaling, and Emergence | Supervised finetuning<br>Instruction tuning<br>Reinforcement learning fundamentals<br>RLHF and DPO<br>Scaling laws and compute-optimal training<br>Phase transitions and grokking | Training language models to follow instructions with human feedback (Ouyang et al., 2022)<br>Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017)<br>Direct Preference Optimization (Rafailov et al., 2023)<br>Chinchilla Scaling Laws (Hoffmann et al., 2022)<br>Double Descent (Belkin et al., 2019)<br>Grokking: Generalization Beyond Overfitting (Power et al., 2022) |
| Nov 04 | Lecture 10: Modern Network Architectures | Mixture of Experts<br>Sparsely activated models<br>State space models<br>Joint Energy-Embedding models<br>Contrastive learning | Mamba: Linear-Time Sequence Modeling (Gu & Dao, 2023)<br>S4: State Space Layers for Sequence Modeling (Gu et al., 2021)<br>Switch Transformers (Fedus et al., 2021)<br>JEM: Joint Energy-based Models (Grathwohl et al., 2019)<br>SimCSE: Simple Contrastive Learning of Sentence Embeddings (Gao et al., 2021) |
| Nov 11 | Lecture 11: Cross-Domain Applications | Vision models<br>ASR and speech models<br>Multi-modal models<br>World models | CLIP: Learning Transferable Visual Models (Radford et al., 2021)<br>Robust Speech Recognition via Large-Scale Weak Supervision (Radford et al., 2022)<br>DALL·E 2 (Ramesh et al., 2022)<br>World Models (Ha & Schmidhuber, 2018) |
| Nov 18 | Lecture 12: Reasoning | Chain-of-Thought reasoning<br>Tree-of-Thought search<br>Causal inference and counterfactuals<br>Program synthesis and self-debugging<br>Inference-time scaling | Chain-of-Thought Prompting (Wei et al., 2022)<br>Tree of Thoughts (Yao et al., 2023)<br>The Book of Why (Pearl & Mackenzie, 2018)<br>Reflexion: Language Agents with Verbal RL (Shinn et al., 2023) |
| Nov 25 | Lecture 13: Search & Retrieval | Hierarchical retrieval<br>Graph retrieval<br>Multi-hop reasoning<br>Adaptive retrieval strategies<br>Contradiction detection & resolution | Retrieval-Augmented Generation (Lewis et al., 2020)<br>Dense Passage Retrieval (Karpukhin et al., 2020)<br>HotpotQA: A Dataset for Diverse, Explainable Multi-hop QA (Yang et al., 2018)<br>Graph RAG (Hu et al., 2024) |
| Dec 02 | Lecture 14: Agents | Tools and function calling<br>Model Context Protocol<br>ReAct framework<br>Experience replay and meta-learning<br>Adversarial robustness<br>Prompt injection attacks | ReAct: Synergizing Reasoning and Acting (Yao et al., 2022)<br>Toolformer (Schick et al., 2023)<br>Model Context Protocol (MCP)<br>MAML: Model-Agnostic Meta-Learning (Finn et al., 2017)<br>Prompt Injection Attacks (Greshake et al., 2023) |
| Dec 09 | Lecture 15: Training and Inference Computation | Quantization<br>Model compression<br>Advanced attention techniques<br>KV-cache optimization<br>Parallelism (data/tensor/pipeline)<br>Inference-time scaling | GPTQ: Accurate Post-Training Quantization (Frantar et al., 2022)<br>QLoRA: Efficient Finetuning (Dettmers et al., 2023)<br>FlashAttention 2 (Dao, 2023)<br>PagedAttention (Kwon et al., 2023)<br>Speculative Decoding (Chen et al., 2023) |
| Dec 16 | Lecture 16: Final Project Presentations | | |

Note: Lecture slides/notes are typically published the day of the lecture.


Deliverables

| Deliverable | Weight | Group Size | Due |
|---|---|---|---|
| Assignment 1 | 10% | individual | - |
| Assignment 2 | 10% | individual | - |
| Assignment 3 | 10% | individual | - |
| Assignment 4 | 5% | individual | - |
| Exam 1 | 15% | individual | - |
| Exam 2 | 15% | individual | - |
| Exam 3 | 15% | individual | - |
| Final Project | 20% | groups ≤ 4 | - |

Grading Scale

| Grade | Cutoff ($\ge$) |
|---|---|
| A | 92.5 |
| A- | 90 |
| B+ | 87 |
| B | 83 |
| B- | 80 |
| C+ | 77 |
| C | 73 |
| C- | 70 |
| F | 0 |

Course Policies

Attendance

Class attendance is required.

Tardy Submissions

  • Submitted within 24 hours of the deadline: 10% penalty
  • Submitted between 24 and 48 hours after the deadline: 35% penalty
  • Submitted more than 48 hours after the deadline: not accepted without prior approval

Use of AI

  • You are encouraged to use language models as an aid in your assignments and final project. However, any submission containing verbatim LLM-generated text will be rejected; you must submit your own work.
  • Exams are open-note but closed-book and closed-internet.

Academic Integrity

All submissions in this class must be your original work. Plagiarism or academic dishonesty will result in course failure and potential disciplinary action.


FAQs

I'm not sure if I should take this class. How should I decide?

If you are still deciding whether ANLY-5800 is right for you, feedback from former students may help. Over the past four years, ~200 students have taken the course, and I've received enough feedback to give you the TL;DR:

  • The course has been characterized as challenging, primarily due to the breadth and depth of concepts and tools covered, many of which are new to students.

  • The course has been characterized as rewarding, with students feeling a sense of accomplishment after completing it. There have been a few common themes:

    • Students attributed improved performance in technical job interviews to this class.
    • Students mentioned new direction and insight into their own graduate research.
    • Students reported an improved ability to craft compelling research statements in graduate school applications.
  • A minority of students have provided critical feedback. There have been a few common themes:

    • Students mentioned that course material and/or instruction was overly theoretical and not aimed at the practitioner.
    • Many students have mentioned that the course was too time consuming.
    • A small number of students have opted to drop the course.

I am a law student. Can I enroll in ANLY-5800?

If you are interested in LLMs and AI, you are more than welcome to attend lectures. However, to enroll in ANLY-5800 you will need at least some background in the areas listed in the Prerequisites section. Please contact me if you have a non-traditional background but feel you meet these requirements.
