ANLY-5800: Natural Language Processing

Updated 09/08/25

Course Information

| Item | Details |
|---|---|
| Course | ANLY-5800 |
| Semester | Fall 2025 |
| Instructor | Chris Larson |
| Credits | 3 |
| Prerequisites | None |
| Location | Car Barn 309 |
| Time | Tue 3:30-6:00 pm ET |
| Office Hours | Virtual |

Course Overview

Natural language processing (NLP) lies at the heart of modern information systems. Over the last 30 years it has repeatedly transformed how humans acquire knowledge, interact with computers, and communicate with one another. This course presents these advancements through the lens of the machine learning methods that have enabled them. We explore how language understanding is framed as a tractable inference problem through language modeling, and trace the evolution of NLP from classical methods to the latest neural architectures, reasoning systems, and AI agents.
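
As a concrete illustration of that framing (the standard formulation, not anything specific to this course), a language model makes inference over text tractable by factoring the joint probability of a token sequence into next-token conditionals:

$$P(w_1, \dots, w_T) = \prod_{t=1}^{T} P\left(w_t \mid w_1, \dots, w_{t-1}\right)$$

The models covered in the schedule below, from n-grams through recurrent networks to Transformers, differ mainly in how they parameterize and estimate these conditionals.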

What's new for Fall Semester 2025?

  • Expanded focus on LLM search and retrieval.
  • Expanded focus on the practical and formal aspects of LLM reasoning and AI Agents.
  • Expanded coverage of the latest NN architectures, including non-attention based models.
  • Removed labs; some of that content has been rolled into the assignments.

Prerequisites

While this course has no course prerequisites, it is designed for students with the mathematical maturity typically gained through coursework in linear algebra, probability theory, first-order optimization methods, and basic programming. The archetypal profile is a graduate or advanced undergraduate student in CS, math, engineering, or the information sciences, but there have been many exceptions; above all other indicators, students who show a genuine interest in the material tend to excel in the course. To help fill any gaps in these technical areas, I devote the entire first lecture to the mathematical concepts and tools used throughout the class.


Reference Texts

Many of the topics covered in this course have not yet been fully treated in textbooks, so we refer directly to papers from the literature. That said, the three reference texts below cover a good portion of the topics in Lectures 1-7.

  1. Jacob Eisenstein. Natural Language Processing
  2. Dan Jurafsky, James H. Martin. Speech and Language Processing
  3. Ian Goodfellow, Yoshua Bengio, & Aaron Courville. Deep Learning

Communication

Course content will be published to this GitHub repository, while all deliverables will be submitted through Canvas. We also have a dedicated Discord server, which is the preferred forum for all course communications; please join it at your earliest convenience. So that the teaching staff can associate your GU, GitHub, and Discord profiles, please enter your information into this table to gain access to course materials and communications.


JetStream2 Access

As part of this course, you will have access to Jupyter notebooks backed by A100 GPUs (40 GB) hosted on the JetStream2 cluster. This is a shared resource and will be made available ahead of the first assignment.
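
Once access is granted, a quick sanity check from a notebook cell confirms the GPU is visible. The sketch below assumes PyTorch is installed in the hosted environment; any framework's equivalent check works just as well.

```python
# Minimal GPU sanity check (assumes PyTorch is available in the JetStream2 image).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # Expect a name like "NVIDIA A100-SXM4-40GB" and roughly 40 GB of memory.
    print(f"GPU: {props.name}, memory: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA device visible; check your JetStream2 allocation.")
```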


Schedule

| Date | Lecture | Topics | Key Readings |
|---|---|---|---|
| Sep 02 | Lecture 1: Mathematical Foundations | Theorems in linear algebra<br>Probability and information theory<br>Statistical parameter estimation<br>Zipf's law | Linear Algebra Done Right (Axler, 2015)<br>Elements of Information Theory (Cover & Thomas, 2006)<br>Mathematics for Machine Learning<br>Zipf's Law in Natural Language (Piantadosi, 2014) |
| Sep 09 | Lecture 2: Decision Boundary Learning | The Perceptron<br>Support Vector Machines<br>Kernel methods<br>Regularization and generalization theory | Pattern Recognition and Machine Learning (Bishop, 2006)<br>Support Vector Machines (Cortes & Vapnik, 1995)<br>Statistical Learning Theory (Vapnik, 1998) |
| Sep 16 | Lecture 3: Parameter Estimation Methods | Maximum likelihood estimation<br>Discriminative modeling & softmax regression<br>Generative modeling & Naive Bayes<br>Maximum a posteriori estimation | Machine Learning: A Probabilistic Perspective (Murphy, 2012)<br>Pattern Recognition and Machine Learning (Bishop, 2006)<br>Bayesian Data Analysis (Gelman et al., 2013) |
| Sep 23 | Lecture 4: Distributional Semantics | TF-IDF and PMI<br>Latent Semantic Analysis<br>Latent Dirichlet Allocation<br>Word2Vec | Probabilistic Topic Models (Blei, 2012)<br>Dynamic Topic Models (Blei & Lafferty, 2006)<br>Efficient Estimation of Word Representations (Mikolov et al., 2013) |
| Sep 30 | Lecture 5: Neural Networks | Artificial neural networks<br>Backpropagation algorithm<br>Gradient descent<br>Regularization methods<br>Bias-variance tradeoff<br>Learning rate scheduling and annealing | Deep Learning (Goodfellow et al., 2016)<br>Learning representations by back-propagating errors (Rumelhart et al., 1986)<br>Adam: A Method for Stochastic Optimization (Kingma & Ba, 2014) |
| Oct 07 | Lecture 6: Language Modeling | n-gram models<br>HMMs<br>Convolutional filtering<br>EBMs and Hopfield Networks<br>Recurrent networks<br>Autoregression | A Neural Probabilistic Language Model (Bengio et al., 2003)<br>A Guide to the Rabiner HMM Tutorial (Rabiner, 1989)<br>An Empirical Study of Smoothing Techniques (Chen & Goodman, 1999)<br>Energy-Based Models (LeCun et al., 2006)<br>Neural networks and physical systems with emergent collective computational abilities (Hopfield, 1982) |
| Oct 14 | Lecture 7: Sequence Models | Seq2Seq models<br>Bahdanau attention<br>Information bottleneck<br>Neural Turing Machines<br>Pointer networks<br>Memory networks | Sequence to Sequence Learning (Sutskever et al., 2014)<br>Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2014)<br>The Information Bottleneck Method (Tishby et al., 1999)<br>Neural Turing Machines (Graves et al., 2014)<br>Pointer Networks (Vinyals et al., 2015)<br>Memory Networks (Weston et al., 2014) |
| Oct 21 | Lecture 8: Transformers | Self-attention<br>Scaled dot-product attention<br>Multi-head attention<br>Tokenization schemes<br>Non-causal language models<br>Causal language models | Attention Is All You Need (Vaswani et al., 2017)<br>Neural Machine Translation of Rare Words with Subword Units (Sennrich et al., 2015)<br>SentencePiece: A simple and language independent subword tokenizer (Kudo, 2018)<br>The Annotated Transformer (Rush, 2018) |
| Oct 28 | Lecture 9: LLM Finetuning, Alignment, Scaling, and Emergence | Supervised finetuning<br>Instruction tuning<br>Reinforcement learning fundamentals<br>RLHF and DPO<br>Scaling laws and compute-optimal training<br>Phase transitions and grokking | Training language models to follow instructions with human feedback (Ouyang et al., 2022)<br>Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017)<br>Direct Preference Optimization (Rafailov et al., 2023)<br>Chinchilla Scaling Laws (Hoffmann et al., 2022)<br>Double Descent (Belkin et al., 2019)<br>Grokking: Generalization Beyond Overfitting (Power et al., 2022) |
| Nov 04 | Lecture 10: Modern Network Architectures | Mixture of Experts<br>Sparsely activated models<br>State space models<br>Joint Energy-Embedding models<br>Contrastive learning | Mamba: Linear-Time Sequence Modeling (Gu & Dao, 2023)<br>S4: State Space Layers for Sequence Modeling (Gu et al., 2021)<br>Switch Transformers (Fedus et al., 2021)<br>JEM: Joint Energy-based Models (Grathwohl et al., 2019)<br>SimCSE: Simple Contrastive Learning of Sentence Embeddings (Gao et al., 2021) |
| Nov 11 | Lecture 11: Cross-Domain Applications | Vision models<br>ASR and speech models<br>Multi-modal models<br>World models | CLIP: Learning Transferable Visual Models (Radford et al., 2021)<br>Robust Speech Recognition via Large-Scale Weak Supervision (Radford et al., 2022)<br>DALL·E 2 (Ramesh et al., 2022)<br>World Models (Ha & Schmidhuber, 2018) |
| Nov 18 | Lecture 12: Reasoning | Chain-of-Thought reasoning<br>Tree-of-Thought search<br>Causal inference and counterfactuals<br>Program synthesis and self-debugging<br>Inference-time scaling | Chain-of-Thought Prompting (Wei et al., 2022)<br>Tree of Thoughts (Yao et al., 2023)<br>The Book of Why (Pearl & Mackenzie, 2018)<br>Reflexion: Language Agents with Verbal RL (Shinn et al., 2023) |
| Nov 25 | Lecture 13: Search & Retrieval | Hierarchical retrieval<br>Graph retrieval<br>Multi-hop reasoning<br>Adaptive retrieval strategies<br>Contradiction detection & resolution | Retrieval-Augmented Generation (Lewis et al., 2020)<br>Dense Passage Retrieval (Karpukhin et al., 2020)<br>HotpotQA: A Dataset for Diverse, Explainable Multi-hop QA (Yang et al., 2018)<br>Graph RAG (Hu et al., 2024) |
| Dec 02 | Lecture 14: Agents | Tools and function calling<br>Model Context Protocol<br>ReAct framework<br>Experience replay and meta-learning<br>Adversarial robustness<br>Prompt injection attacks | ReAct: Synergizing Reasoning and Acting (Yao et al., 2022)<br>Toolformer (Schick et al., 2023)<br>Model Context Protocol (MCP)<br>MAML: Model-Agnostic Meta-Learning (Finn et al., 2017)<br>Prompt Injection Attacks (Greshake et al., 2023) |
| Dec 09 | Lecture 15: Training and Inference Computation | Quantization<br>Model compression<br>Advanced attention techniques<br>KV-cache optimization<br>Parallelism (data/tensor/pipeline)<br>Inference-time scaling | GPTQ: Accurate Post-Training Quantization (Frantar et al., 2022)<br>QLoRA: Efficient Finetuning (Dettmers et al., 2023)<br>FlashAttention 2 (Dao, 2023)<br>PagedAttention (Kwon et al., 2023)<br>Speculative Decoding (Chen et al., 2023) |
| Dec 16 | Lecture 16: Final Project Presentations | | |

Note: Lecture slides/notes are typically published the day of the lecture.


Deliverables

| Deliverable | Weight | Group Size | Due |
|---|---|---|---|
| Assignment 1 | 10% | individual | - |
| Assignment 2 | 10% | individual | - |
| Assignment 3 | 10% | individual | - |
| Assignment 4 | 5% | individual | - |
| Exam 1 | 15% | individual | - |
| Exam 2 | 15% | individual | - |
| Exam 3 | 15% | individual | - |
| Final Project | 20% | groups ≤ 4 | - |

Grading Scale

| Grade | Cutoff ($\ge$) |
|---|---|
| A | 92.5 |
| A- | 90 |
| B+ | 87 |
| B | 83 |
| B- | 80 |
| C+ | 77 |
| C | 73 |
| C- | 70 |
| F | 0 |

Course Policies

Attendance

Class attendance is required.

Tardy Submissions

  • Submitted within 24 hours of the deadline: 10% penalty
  • Submitted between 24 and 48 hours after the deadline: 35% penalty
  • Submitted more than 48 hours after the deadline: not accepted without prior approval

Use of AI

  • You are encouraged to use language models as an aid in your assignments and final project. However, any submission containing verbatim LLM-generated text will be rejected; you must submit your own work.
  • Exams are open-note but closed-book and closed-internet.

Academic Integrity

All submissions in this class must be your original work. Plagiarism or academic dishonesty will result in course failure and potential disciplinary action.


FAQs

I'm not sure if I should take this class. How should I decide?

If you are still deciding whether ANLY-5800 is right for you, feedback from former students may help. Over the past four years, ~200 students have taken the course, and I've received enough feedback to give you the TL;DR:

  • The course has been characterized as challenging, primarily due to the breadth and depth of concepts and tools covered, many of which are new to students.

  • The course has been characterized as rewarding, with students feeling a sense of accomplishment after completing it. There have been a few common themes:

    • Students attributed improved performance in technical job interviews to this class.
    • Students mentioned new direction and insight into their own graduate research.
    • Students reported an improved ability to craft compelling research statements in graduate school applications.
  • A minority of students have provided critical feedback. There have been a few common themes:

    • Students mentioned that course material and/or instruction was overly theoretical and not aimed at the practitioner.
    • Many students have mentioned that the course was too time consuming.
    • A small number of students have opted to drop the course.

I am a law student. Can I enroll in ANLY-5800?

If you are interested in LLMs and AI, you are more than welcome to attend lectures. However, to enroll in ANLY-5800 you will need at least some background in the areas listed in the Prerequisites section. Please contact me if you have a non-traditional background but feel you meet these requirements.
