Project Guidelines and Important Dates
Projects should be driven by an open-ended question (theory or
implementation), with a clearly stated objective and a well-scoped plan.
Projects must be completed in groups of three. Students should consult the
project page for suggested project ideas and describe the intended
deliverables in the proposal as clearly as possible. Both the project
proposal and the final report must be uploaded to Gradescope, and
presentations will be given in person in the lecture hall.
Dates
- 3/25: Project proposal due (upload to Gradescope)
- 4/17: Midterm report due (upload to Gradescope)
- Student Project Presentation slots: See course dates
- 5/8: Final report due (upload to Gradescope)
Note that presenting the project in person is mandatory. Students who are
attending ICLR should plan to present in the 4/21 slot.
Grading
Midterm report: 5%
Project presentations: 10%
Final report: 15%
Reports and Presentations
Presentations: Details forthcoming.
Report Format: All reports must use the NeurIPS LaTeX format.
Midterm Report: Your report should be at most 2 pages (not including references). It
should include a title, the team members, an abstract, related work, the problem formulation,
and goals.
Final Report: Your report should be at most 9 pages (not including references). It will be
evaluated on the following criteria:
- Merit: Is the question well motivated, and is your approach soundly reasoned? If you take a
simple approach, is that simplicity justified? If you choose a more complicated method, is the
added complexity justified?
- Technical depth: How technically challenging was your work? Did you use an existing package or
write your own code? Using a package is fine, but then other aspects of your project must be
more ambitious.
- Presentation: How well did you explain what you did, present your results, and interpret the
outcomes? Did you use effective graphs and visualizations? How clear was the writing? Did you
justify your approach?
Project Ideas
We provide a few project ideas below. Studying existing RL theory papers and reproducing their
proofs is also a good option for the course project, and experiments that verify conclusions or
test conjectures are welcome.
Refined Analysis in Tabular MDPs: Survey a family of tabular MDP papers with tight regret
bounds, e.g., Azar et al., Jin et al., and Wang et al.
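For reference, the quantity these papers bound is the cumulative regret over $K$ episodes. In
one common formulation (notation assumed here: $\pi_k$ is the policy played in episode $k$,
$s_1^k$ is that episode's initial state, and $V_1^\star$, $V_1^{\pi_k}$ are the optimal and
realized values at the first step):

$$\mathrm{Regret}(K) = \sum_{k=1}^{K} \left( V_1^\star(s_1^k) - V_1^{\pi_k}(s_1^k) \right)$$

Papers in this family differ mainly in how tightly they bound this sum in terms of the number of
states, actions, horizon, and episodes.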
Comparison Between Variants of Linear MDP Models: Survey papers that assume some form of
linear structure, e.g., Yang and Wang, and Jin et al.
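As a reference point for the comparison, the linear MDP assumption as it is commonly stated
(e.g., in Jin et al.) requires a known feature map $\phi(s,a) \in \mathbb{R}^d$ such that, at
every step $h$,

$$P_h(s' \mid s, a) = \langle \phi(s,a), \mu_h(s') \rangle, \qquad r_h(s,a) = \langle \phi(s,a), \theta_h \rangle,$$

where $\mu_h$ is a vector of $d$ unknown (signed) measures over states and $\theta_h \in \mathbb{R}^d$
is unknown. Other linear models differ mainly in where this linear structure is placed.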
Thompson Sampling in RL: Survey Thompson sampling techniques used in RL. This is a good starting point.
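To make the basic idea concrete before surveying its RL variants, here is a minimal sketch of
Thompson sampling in the simplest setting: a Bernoulli multi-armed bandit with independent
Beta(1, 1) priors. The function name `thompson_sampling` and the simulated arm means are
illustrative assumptions, not taken from any particular paper. RL versions (e.g., posterior
sampling over MDP models) replace the per-arm Beta posteriors with a posterior over models.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated K-armed bandit.

    Each arm starts with a uniform Beta(1, 1) prior on its success probability.
    At every step: draw one sample from each arm's posterior, pull the arm with
    the largest sample, and update that arm's posterior with the 0/1 reward.
    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Posterior sample of each arm's mean: Beta(1 + successes, 1 + failures).
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulated Bernoulli reward from the (hypothetical) true arm mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    pulls = [successes[i] + failures[i] for i in range(k)]
    return pulls, total_reward


pulls, total = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
print("pulls per arm:", pulls, "total reward:", total)
```

Sampling from the posterior, rather than acting on the posterior mean, is what drives
exploration: arms with few observations have wide posteriors and are occasionally sampled high.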
Gittins Index: Understand and survey the Gittins index method, a framework for Bayes-optimal
learning in multi-armed bandits. Think about open questions and why extensions are difficult.
This is a good starting point.
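One standard way to compute a Gittins index numerically is calibration: the arm competes with a
retirement option that pays a constant reward lam per step forever, and the index is the lam at
which pulling and retiring are equally good. The sketch below does this for a single Bernoulli
arm with a Beta(a, b) posterior, using bisection on lam and a lookahead truncated at a fixed
depth. The discount factor, truncation depth, and the name `gittins_index_beta` are assumptions
chosen for illustration, and the truncation makes the result approximate.

```python
from functools import lru_cache


def gittins_index_beta(a, b, gamma=0.9, depth=60, tol=1e-6):
    """Approximate Gittins index of a Bernoulli arm with a Beta(a, b) posterior.

    Calibration: the arm competes with a retirement option paying a constant
    reward lam per step forever (total lam / (1 - gamma)). The index is the lam
    at which continuing and retiring are equally good in the root state.
    The lookahead is truncated after `depth` posterior updates (approximation).
    """

    def continue_minus_retire(lam):
        retire = lam / (1.0 - gamma)

        @lru_cache(maxsize=None)
        def value(s, f, d):
            # Optimal value with the option to retire; s, f are posterior counts,
            # d is the remaining lookahead depth.
            if d == 0:
                # Truncation: freeze the posterior mean from here on.
                return max(retire, s / (s + f) / (1.0 - gamma))
            return max(retire, q_continue(s, f, d))

        def q_continue(s, f, d):
            # Pull once, observe a Bernoulli outcome, then act optimally.
            p = s / (s + f)
            return p + gamma * (p * value(s + 1, f, d - 1) + (1 - p) * value(s, f + 1, d - 1))

        return q_continue(a, b, depth) - retire

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continue_minus_retire(mid) > 0:
            lo = mid  # pulling still beats retiring, so the index is above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for a, b in [(1, 1), (2, 1), (1, 2)]:
    print(f"Beta({a}, {b}): mean {a / (a + b):.3f}, index {gittins_index_beta(a, b):.4f}")
```

The index exceeds the posterior mean (e.g., it is above 0.5 for a Beta(1, 1) arm); the gap can
be read as the exploration bonus that the Bayes-optimal policy assigns to an uncertain arm.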
RL with Constraints: RL with convex and knapsack constraints is studied here for tabular
settings. Can you extend these results to non-tabular settings such as linear MDPs?
RL with Adversarial Corruption: Exploration in RL with corruption is studied here. Can you
formulate different attack models and study attacks and defenses in other RL frameworks, such as
policy gradient or batch RL?
Policy Gradient: Starting from the analysis of PG/NPG, can you find ways to reuse data in policy
optimization to improve its sample complexity?
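As a baseline for thinking about data reuse, here is a minimal sketch of exact natural policy
gradient (NPG) with the softmax parameterization on a small tabular MDP. In that setting the NPG
step reduces to a multiplicative update, $\pi_{t+1}(a \mid s) \propto \pi_t(a \mid s)\exp(\eta A^{\pi_t}(s,a)/(1-\gamma))$.
The sketch assumes the model is known, so $Q^\pi$ is computed by iterative policy evaluation
rather than estimated from samples; the random MDP, the step size `eta`, and the helper names are
hypothetical. A sample-based version would replace `evaluate` with estimates, which is where
data reuse would enter.

```python
import math
import random


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical random tabular MDP: P[s][a] is a next-state distribution, R[s][a] in [0, 1)."""
    rng = random.Random(seed)
    P = []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            z = sum(w)
            rows.append([x / z for x in w])
        P.append(rows)
    R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    return P, R


def evaluate(P, R, pi, gamma, iters=500):
    """Iterative policy evaluation with a known model; returns (Q^pi, V^pi)."""
    n_s, n_a = len(P), len(P[0])
    V = [0.0] * n_s
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [sum(pi[s][a] * Q[s][a] for a in range(n_a)) for s in range(n_s)]
    return Q, V


def npg_step(pi, Q, V, gamma, eta):
    """Softmax NPG update: pi'(a|s) proportional to pi(a|s) * exp(eta * A(s, a) / (1 - gamma))."""
    new_pi = []
    for s in range(len(pi)):
        w = [pi[s][a] * math.exp(eta * (Q[s][a] - V[s]) / (1 - gamma))
             for a in range(len(pi[s]))]
        z = sum(w)
        new_pi.append([x / z for x in w])
    return new_pi


P, R = random_mdp(n_states=4, n_actions=2)
gamma, eta = 0.9, 0.1
pi = [[0.5, 0.5] for _ in range(4)]  # start from the uniform policy
values = []
for _ in range(30):
    Q, V = evaluate(P, R, pi, gamma)
    values.append(sum(V) / len(V))  # value under a uniform start distribution
    pi = npg_step(pi, Q, V, gamma, eta)
print(f"average value: {values[0]:.4f} -> {values[-1]:.4f}")
```

With exact evaluation, the recorded values should not decrease from one iteration to the next,
consistent with the monotone-improvement property of softmax NPG.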
Policy Gradient with Exploration: Starting from PC-PG, can you think about ways to improve
its sample complexity?
Policy Gradient: Starting from this paper, can you extend its algorithm to other linear MDP
models?
Reward-Free Exploration: Survey MDP methods that explore without using a reward signal. See
Max-Ent exploration as a starting point.
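Max-Ent exploration replaces the reward with the entropy of the policy's state-visitation
distribution. The sketch below only evaluates that objective: it computes the discounted state
occupancy $d^\pi(s) = (1-\gamma)\sum_t \gamma^t \Pr(s_t = s)$ as a fixed point and then its
entropy, for two fixed policies on a toy chain. The chain, the discount, and the helper names are
assumptions for illustration; the Max-Ent method itself optimizes this objective over (mixtures
of) policies, which is not shown here.

```python
import math


def chain(n):
    """Hypothetical n-state chain: action 0 moves left, action 1 moves right; walls at both ends."""
    P = []
    for s in range(n):
        left = [0.0] * n
        right = [0.0] * n
        left[max(s - 1, 0)] = 1.0
        right[min(s + 1, n - 1)] = 1.0
        P.append([left, right])
    return P


def occupancy(P, pi, mu0, gamma, iters=1000):
    """Discounted state occupancy d(s) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s).

    Computed by iterating the fixed-point equation d = (1 - gamma) * mu0 + gamma * P_pi^T d.
    """
    n = len(P)
    d = list(mu0)
    for _ in range(iters):
        nxt = [(1 - gamma) * mu0[t] for t in range(n)]
        for s in range(n):
            for a, p_a in enumerate(pi[s]):
                for t in range(n):
                    nxt[t] += gamma * d[s] * p_a * P[s][a][t]
        d = nxt
    return d


def entropy(d):
    """Shannon entropy (in nats) of a distribution over states."""
    return -sum(p * math.log(p) for p in d if p > 0)


n, gamma = 5, 0.9
P = chain(n)
mu0 = [1.0] + [0.0] * (n - 1)  # every episode starts in the leftmost state
policies = {
    "always right": [[0.0, 1.0] for _ in range(n)],
    "uniform": [[0.5, 0.5] for _ in range(n)],
}
for name, pi in policies.items():
    d = occupancy(P, pi, mu0, gamma)
    print(name, [round(x, 3) for x in d], "entropy:", round(entropy(d), 3))
```

Because no reward is involved, the same occupancy computation applies to any policy; a Max-Ent
method would search for the policy (or policy mixture) whose occupancy has the largest entropy.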
Imitation Learning from Many Experts: This paper studies learning from multiple experts in the
interactive setting. Can we learn from multiple experts in non-interactive settings?
Online MDPs with Expert Advice: RL is sometimes carried out in adversarial settings. Survey
online MDP methods for adversarial settings, using Online MDPs as a starting point, and comment
on the connections to the NPG analysis.
Statistical Limits of Offline RL: Offline RL seeks
to learn a near-optimal policy from a fixed dataset. Recent work such
as Wang et al., Wang et al., and Zanette explores the
fundamental information-theoretic limits and instabilities in this
setting. Can you survey offline RL methods and the estimation
techniques used to handle distribution shift? Under what conditions
(e.g., low noise or specific coverage) can we circumvent existing
lower bounds?
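One common way to make "coverage" precise (an assumption here; the cited works use several
related conditions) is a concentrability coefficient that compares the state-action occupancy
$d^\pi$ of a target policy $\pi$ with the data distribution $\mu$:

$$C^\pi = \max_{s,a} \frac{d^\pi(s,a)}{\mu(s,a)}$$

A small $C^\pi$ means the dataset covers the state-action pairs that $\pi$ visits; lower bounds
in this area show what can go wrong when coverage is weaker.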
Structural Assumptions and Learnability: What structural properties of an MDP make RL tractable
with function approximation? Building on the concept of Bellman rank, recent research has
unified these ideas in the broader framework of Bilinear Classes. Survey the different
structural assumptions (such as Bellman rank and Bilinear rank) and discuss how they enable
provably efficient learning in large-scale MDPs.
Hardness of Linear Realizability: If the optimal Q-function is linear in a given
feature map, is RL always efficient? Lower bounds from Weisz et
al., Weisz et al., and Wang et al. suggest otherwise, even with a constant
suboptimality gap. An interesting open question is: if $Q^\pi$ is linear for all policies, is an online
lower bound still possible, or does this make the problem tractable? Additionally, explore the
near-deterministic case and its impact on learnability.
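The two realizability conditions in this question can be stated as follows, for a given feature
map $\phi(s,a) \in \mathbb{R}^d$ (notation assumed here):

$$Q^\star(s,a) = \phi(s,a)^\top \theta^\star \quad \text{for some } \theta^\star \in \mathbb{R}^d,$$

$$Q^\pi(s,a) = \phi(s,a)^\top \theta^\pi \quad \text{for every policy } \pi \text{ and some } \theta^\pi \in \mathbb{R}^d.$$

The cited lower bounds concern linearity of the optimal Q-function; the open question asks
whether the second condition, which is stronger because it includes $\pi = \pi^\star$, makes
online learning tractable.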