CS 2824: Foundations of Reinforcement Learning
Modern Artificial Intelligence (AI) systems often need the ability to make sequential decisions in an unknown,
uncertain, possibly hostile environment, actively interacting with that environment to collect relevant data.
Reinforcement Learning (RL) is a general framework that captures this interactive learning setting and
has been used to design intelligent agents that achieve superhuman performance on
challenging tasks such as Go, computer games, and robotic manipulation.
This graduate-level course focuses on the theoretical and algorithmic foundations of Reinforcement Learning. The three main themes of the course are
(1) fundamentals (MDPs, computation, statistics,
generalization), (2) provably efficient exploration (and
high-dimensional RL), and (3) direct policy optimization
(e.g., policy gradient methods).
After taking this course, students will be able to
understand both classic and state-of-the-art provably correct
RL algorithms and their analyses, and will be prepared to
conduct research on RL-related topics.
Staff
Instructors: Kianté Brantley and Sham Kakade
TFs: Lukas Fesser, Jaeyeon Kim, and Alex Meterez.
Lecture time: Tuesday/Thursday, 12:45-2:00 pm
Office hours: By Appointment
Location: SEC LL2.224
Contact:
Please communicate with the instructors and TFs only
through Ed. Course-related emails sent outside this channel
will not be answered in a timely manner.
Announcements:
Course announcements will be made via Canvas
and Edstem. It is the students' responsibility to follow both.
Prerequisites
This is an advanced, theory-heavy course: there are no programming assignments, and students
are required to complete a theory-focused course project. Students need a strong grasp of Machine Learning, Probability and Statistics, Optimization, and Linear Algebra. Undergraduate and master's students may enroll only with
instructor permission, obtained through a course petition.
Grading Policies
Homework assignments 60%, Project 30%, Reading 10% (+ Participation bonus 5%)
All homework will be mathematical in nature, focusing on the theory of RL and bandits;
there will not be a programming component.
Each HW must be submitted as a single typed PDF document (not handwritten).
HW0 is MANDATORY and must be passed at a satisfactory level;
it checks your knowledge of the prerequisites in probability, statistics, and linear algebra.
Homework Rules:
Homework must be done individually: each student must understand, write, and hand in their own answers. It is
acceptable for students to discuss problems with each other;
it is not acceptable to share answers or to look at another student's written answers.
You must also indicate on each homework with whom you
collaborated and what online resources you used. You
must attempt and submit every HW (even if it is for 0 credit)
in order to pass the class.
Late days: Homeworks and reading
assignments must be submitted by the posted due date.
You are allowed up to 6 total LATE DAYS for the
homeworks and reading assignments over the entire semester; these are deducted automatically when an assignment is late.
For example, if an assignment is late by up to 24 hours,
one late day is used. After your late days are used up,
late penalties apply: a late assignment loses 33% of its score per late day,
so an assignment up to 24 hours late incurs a 33% penalty,
one up to 48 hours late incurs a 66% penalty,
and anything later receives no credit. We track all your late days, and any deductions are applied when computing final grades.
If you are unable to turn in HWs on time, aside from the permitted late days, do not enroll in the course.
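To make the arithmetic concrete, here is a minimal Python sketch of how the deductions above combine. This is purely illustrative, not official course tooling; the function name and signature are our own, and it assumes free late days are consumed before any percentage penalty is applied, per the policy above.

    import math

    def homework_score(raw_score, hours_late, free_late_days_left):
        # Hypothetical helper for illustration only; not official course code.
        # Each started 24-hour period past the deadline counts as one late day.
        days_late = math.ceil(hours_late / 24) if hours_late > 0 else 0
        # Free late days (6 total per semester) are deducted automatically first.
        used_free = min(days_late, free_late_days_left)
        unexcused = days_late - used_free
        left = free_late_days_left - used_free
        if unexcused == 0:
            return raw_score, left  # fully covered by free late days
        if unexcused <= 2:
            # 33% penalty per unexcused late day (33% up to 24h, 66% up to 48h)
            return raw_score * (1 - 0.33 * unexcused), left
        return 0.0, left  # more than 48 unexcused hours late: no credit

    # Example: 30 hours late with 1 free late day left. The free day covers
    # the first 24 hours; the remaining 6 hours count as a second late day,
    # which is unexcused and costs 33%.
    print(homework_score(100, 30, 1))  # (67.0, 0)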
Regrading: If we made a grading mistake,
you must let us know (in writing, via Ed) within one week
of when the HW was returned.
Participation/extra-effort
bonus: We encourage participation, including
asking/answering questions in lecture and on the Ed
discussion, and extra effort on reading the book
chapters (e.g., proofreading additional chapters and
sending back comments/feedback).
Reading Assignment
Reading assignments are meant to be completed actively
and carefully. Students are responsible for reading
the assigned chapters of "Reinforcement Learning: Theory and
Algorithms" (ABJKS pdf link here) and
engaging with the text to support their learning. Note that
the LATE DAY POLICY also applies to the reading assignments.
The readings are intended to help you develop a strong,
working mastery of the material.
Students are encouraged to use ChatGPT (or another LLM tool) for all reading assignments; Harvard-enrolled students should have access to a university-sponsored account.
Course Project
For project ideas, please see the course projects from an
older version of the course page.
Students will give project presentations during the last
three lectures of the course.
It is a course requirement that you attend
all student presentations; see the dates
below. Only the dates of 04/23/26 and 04/28/26 may be
excused for ICLR, with instructor permission.
Diversity in STEM
While many academic disciplines have historically been dominated by one cross section of society,
the study of and participation in STEM disciplines is a joy that the instructors hope everyone can pursue,
regardless of their socio-economic background, race, gender, etc.
The instructors encourage students to be mindful of these issues and,
in good faith, to take steps to fix them. You are the next generation here.
Course Notes: RL Theory and Algorithms
The course will be largely based on the working draft of
the book "Reinforcement Learning: Theory and
Algorithms".
We will be updating the ABJKS notes
throughout the term. If you find typos or errors, please let us
know; we would appreciate it!
Tentative Dates (see Ed for announcements)
HW0: Due 01/30
HW1: Out 02/05, Due 02/16
HW2: Out 02/24, Due 03/13
HW3: Out 03/24, Due 04/08
Schedule (tentative)
| Date | Lecture | Reading | Slides/HW |
| 01/27/26 | Fundamentals: Markov Decision Processes | Ch. 1 | Slides, Annotated slides |
| 01/29/26 | Fundamentals: Value Iteration | Ch. 1 | Slides, Annotated slides |
| 02/03/26 | Fundamentals: Policy Iteration and LP Formulation | Ch. 1 | Slides |
| 02/05/26 | Fundamentals: Tabular MDP with a Generative Model | Ch. 2 | |
| 02/10/26 | Fundamentals: Linear Functions w/ a Generative Model | Ch. 3 | |
| 02/12/26 | Fundamentals: Linear Bellman Completeness | Ch. 3 | |
| 02/17/26 | Exploration: Multi-armed Bandits | Ch. 5 | |
| 02/19/26 | Exploration: Efficient Exploration in Tabular MDPs | Ch. 6 | |
| 02/24/26 | Exploration: Linear Bandits | Ch. 5 | |
| 02/26/26 | Exploration: Efficient Exploration in Linear MDPs | Ch. 7 | |
| 03/03/26 | Exploration: Information-Theoretic Lower Bounds | Ch. 10 | |
| 03/05/26 | Exploration: RL w/ Function Approximation | Ch. 8 | |
| 03/10/26 | Exploration: RL w/ Function Approximation (continued) | | |
| 03/12/26 | TBD | | |
| 03/17/26 | Spring Recess | | |
| 03/19/26 | Spring Recess | | |
| 03/24/26 | Policy Optimization: Policy Gradient | Ch. 11 & 12 | |
| 03/26/26 | Policy Optimization: Natural Policy Gradient and TRPO | Ch. 12 & 13 | |
| 03/31/26 | Policy Optimization: Global Optimality of PG and NPG | Ch. 13 | |
| 04/02/26 | Policy Optimization: Conservative Policy Iteration and Function Approximation | Ch. 14 | |
| 04/07/26 | Policy Optimization: NPG and Proximal Policy Optimization | Ch. 14 | |
| 04/09/26 | RLHF (TBD): Contextual Bandits, the BT Model, DPO, and REBEL | Paper 1, Paper 2 | |
| 04/14/26 | Guest Lecture (Wen Sun): RLVR (TBD) | | |
| 04/16/26 | Guest Lecture (Gabriel Poesia Reis e Silva): TBD | | |
| 04/21/26 | Student Project Presentations | | |
| 04/23/26 | Student Project Presentations | | |
| 04/28/26 | Student Project Presentations | | |