2025 Summer School @ Peking University – Foundations of Reinforcement Learning
A rigorous introduction to sequential decision-making, from Markov decision processes to deep reinforcement learning.
What is Reinforcement Learning?
Reinforcement learning (RL) is a computational framework for decision-making under uncertainty. An RL agent learns to interact with its environment by trial and error, aiming to maximize long-term reward.
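To make the trial-and-error idea concrete, here is a minimal sketch of an agent interacting with a toy environment (the guessing game, the ε-greedy rule, and all numbers below are illustrative assumptions, not course material): the agent tries actions, observes rewards, and gradually prefers the actions that have paid off.

```python
import random

random.seed(0)

# A toy "environment": the agent guesses a hidden number in {0, 1, 2};
# reward is 1 for a correct guess, 0 otherwise.
HIDDEN = 1

def step(action):
    """Return (reward, done) for the chosen action."""
    return (1.0 if action == HIDDEN else 0.0), True

def estimate(a, totals, counts):
    """Average reward observed for action a so far (0 if untried)."""
    return totals[a] / counts[a] if counts[a] else 0.0

# Trial and error: keep running reward estimates per action and
# mostly exploit the best one, while occasionally exploring.
counts = [0, 0, 0]
totals = [0.0, 0.0, 0.0]
for episode in range(200):
    if random.random() < 0.1:                      # explore
        action = random.randrange(3)
    else:                                          # exploit current estimates
        action = max(range(3), key=lambda a: estimate(a, totals, counts))
    reward, _ = step(action)
    counts[action] += 1
    totals[action] += reward

best = max(range(3), key=lambda a: estimate(a, totals, counts))
```

After a few hundred interactions the agent's greedy choice settles on the rewarding action, which is the core learning-from-experience loop the course formalizes.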
From early successes in games to modern applications in robotics, healthcare, and education, RL provides a foundation for building intelligent systems that learn from experience.
A Principled Approach
At the heart of reinforcement learning lies the Markov decision process (MDP): a formal model describing how agents, states, actions, and rewards evolve over time.
This course builds from first principles, covering:
- Transition dynamics and reward models
- Policies and value functions
- The Bellman equations
- Monte Carlo and temporal-difference methods
- Policy optimization techniques
We emphasize clarity, rigor, and the connections between theory and practice.
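As a small taste of how these pieces fit together, the following sketch evaluates a fixed policy on a made-up two-state MDP by iterating the Bellman expectation equation V(s) = Σ_a π(a|s) Σ_s' P(s'|s,a) [R(s,a) + γ V(s')] to a fixed point (the transition probabilities, rewards, and discount factor are illustrative assumptions):

```python
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]

# Hypothetical dynamics: P[s][a] = list of (next_state, probability);
# R[s][a] = expected immediate reward for taking a in s.
P = {0: {0: [(0, 0.7), (1, 0.3)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)],           1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 0.0, 1: 2.0}}

def evaluate(pi, tol=1e-8):
    """Iterative policy evaluation: sweep the Bellman update until convergence."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            v = sum(pi[s][a] * (R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a]))
                    for a in ACTIONS)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

# Evaluate the uniformly random policy pi(a|s) = 0.5.
uniform = {s: {a: 0.5 for a in ACTIONS} for s in STATES}
V = evaluate(uniform)
```

The returned values satisfy the Bellman expectation equation, which is exactly the fixed-point property the lectures derive.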
Topics Covered
- What is an MDP?
- Policies and Interaction Protocols
- Value Functions and the Bellman Equations
- Monte Carlo and TD Learning
- Function Approximation
- Policy Gradient and Actor-Critic Methods
- Exploration, Generalization, and Safety
- Real-World Challenges and Applications
Lecture Series Overview
10 Lectures – 2 Hours Each
Week 1: Theoretical Foundations
Week 2: Algorithms and Applications
Syllabus
An overview of the topics can be found here: Overview
| Lectures | Files | Topics |
| --- | --- | --- |
| Lecture 1 | Notes, Slides | What is RL? · MDP components · Agent-environment interaction · Markov property · Policies |
| Lecture 2 | Notes | Discounted return · Task types · RL objective · Occupancy measures · Adequacy of Markov policies |
| Lecture 3 | Notes | Value functions · Bellman equations · Policy evaluation · Policy improvement |
| Lecture 4 | Notes | Value and policy iteration · Multi-armed bandits · Exploration vs. exploitation · Regret, ε-greedy, UCB |
| Lecture 5 | | Monte Carlo methods · First-visit and every-visit estimation · Monte Carlo control |
| Lecture 6 | | Temporal-difference learning · TD(0) · SARSA · Q-learning |
| Lecture 7 | | Function approximation · Linear methods · Semi-gradient TD |
| Lecture 8 | | Policy gradient methods · REINFORCE · Variance reduction |
| Lecture 9 | | Actor-critic methods · Deep RL: instability, tricks, replay buffers |
| Lecture 10 | | Advanced topics: safe RL, offline RL, AlphaZero · Applications and open challenges |
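To preview the algorithmic material of the second week, here is a minimal tabular Q-learning sketch of the kind developed in Lecture 6. The chain environment and all hyperparameters are hypothetical choices for illustration, not course code:

```python
import random

random.seed(0)

# Toy 5-state chain: start in state 0; action 1 moves right, action 0
# moves left; reaching state 4 (the goal) pays reward +1 and ends the episode.
N, GOAL = 5, 4
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def step(s, a):
    """One environment transition: (next_state, reward, done)."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

Q = [[0.0, 0.0] for _ in range(N)]
for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy behavior policy
        a = random.randrange(2) if random.random() < EPS else (0 if Q[s][0] > Q[s][1] else 1)
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy value of the next state
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

greedy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N)]
```

After training, the greedy policy moves right in every non-goal state, i.e. the agent has learned the shortest path to the reward from sampled transitions alone.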