2025 Summer School @ Peking University – Foundations of Reinforcement Learning
A rigorous introduction to sequential decision-making, from Markov decision processes to deep reinforcement learning.
What is Reinforcement Learning?
Reinforcement learning (RL) is a computational framework for decision-making under uncertainty. An RL agent learns to interact with its environment by trial and error, aiming to maximize long-term reward.
From early successes in games to modern applications in robotics, healthcare, and education, RL provides a foundation for building intelligent systems that learn from experience.
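The trial-and-error interaction loop described above can be sketched in a few lines of Python. The `Corridor` environment below is a hypothetical toy example invented for illustration (not part of the course materials): the agent starts in the leftmost of five cells and receives reward 1 only upon reaching the rightmost, terminal cell.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts at
# cell 0 and receives reward 1 only upon reaching cell 4 (terminal).
class Corridor:
    def __init__(self):
        self.state = 0

    def step(self, action):  # action: -1 (left) or +1 (right)
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

# The agent-environment loop: observe the state, act, receive a reward.
random.seed(0)
env = Corridor()
state, total_reward, done = 0, 0.0, False
while not done:
    action = random.choice([-1, 1])        # trial and error: act at random
    state, reward, done = env.step(action)
    total_reward += reward

print(total_reward)
```

Even this purely random agent eventually reaches the goal; learning methods covered later in the course improve on it by using the collected rewards to act better over time.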
A Principled Approach
At the heart of reinforcement learning lies the Markov decision process (MDP): a formal model describing how agents, states, actions, and rewards evolve over time.
This course builds from first principles, covering:
- Transition dynamics and reward models
- Policies and value functions
- The Bellman equations
- Monte Carlo and temporal-difference methods
- Policy optimization techniques
We emphasize clarity, rigor, and the connections between theory and practice.
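As a small taste of the Bellman machinery listed above, here is a value-iteration sketch on a hypothetical three-state chain MDP (invented for illustration, not from the course materials): states 0 → 1 → 2, where state 2 is absorbing and entering it yields reward 1.

```python
# Value iteration on a tiny deterministic MDP (hypothetical 3-state chain):
# states 0 -> 1 -> 2 (terminal), actions "stay" or "move",
# reward 1 for entering the terminal state, discount gamma = 0.9.
gamma = 0.9
states = [0, 1, 2]
actions = ["stay", "move"]

def transition(s, a):
    """Return (next_state, reward) under deterministic dynamics."""
    if s == 2:                 # terminal: absorbing, no further reward
        return 2, 0.0
    if a == "move":
        s_next = s + 1
        return s_next, (1.0 if s_next == 2 else 0.0)
    return s, 0.0              # "stay" keeps the state, no reward

# Bellman optimality update: V(s) <- max_a [ r(s, a) + gamma * V(s') ]
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {s: max(transition(s, a)[1] + gamma * V[transition(s, a)[0]]
                for a in actions)
         for s in states}

print(V)
```

The fixed point reflects the discounted structure of the problem: state 1 is one step from the reward (value 1.0), while state 0 is one step further and its value is discounted once (0.9).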
Topics Covered
- What is an MDP?
- Policies and Interaction Protocols
- Value Functions and the Bellman Equations
- Monte Carlo and TD Learning
- Function Approximation
- Policy Gradient and Actor-Critic Methods
- Exploration, Generalization, and Safety
- Real-World Challenges and Applications
Lecture Series Overview
10 Lectures – 2 Hours Each
Week 1: Theoretical Foundations
Week 2: Algorithms and Applications
Syllabus
An overview of the topics can be found here: Overview
Lectures | Files | Topics |
Lecture 1 | Notes, Slides | What is RL? · MDP components · Agent-environment interaction · Markov property · Policies |
Lecture 2 | Notes | Returns and task types · RL objective · Adequacy of Markov policies · Value functions · Bellman equations |
Lecture 3 | | Dynamic programming · Policy evaluation and improvement · Value and policy iteration |
Lecture 4 | | Multi-armed bandits · Exploration vs. exploitation · Regret, ε-greedy, UCB |
Lecture 5 | | Monte Carlo methods · First-visit and every-visit estimation · Monte Carlo control |
Lecture 6 | | Temporal-difference learning · TD(0), SARSA, Q-learning |
Lecture 7 | | Function approximation · Linear methods · Semi-gradient TD |
Lecture 8 | | Policy gradient methods · REINFORCE · Variance reduction |
Lecture 9 | | Actor-Critic methods · Deep RL: instability, tricks · Replay buffers |
Lecture 10 | | Advanced topics: safe RL, offline RL, AlphaZero · Applications and open challenges |
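To give a flavor of the tabular methods in Lectures 5–6, here is a minimal Q-learning sketch on a hypothetical five-cell corridor (an illustrative toy problem, not a course assignment). It combines ε-greedy action selection and optimistic initialization (Lecture 4) with the Q-learning update (Lecture 6).

```python
import random

# Tabular Q-learning on a hypothetical 5-cell corridor: actions move left (-1)
# or right (+1); reward 1 for reaching the last cell, which is terminal.
random.seed(0)
n_states, actions = 5, [-1, +1]
alpha, gamma, eps = 0.5, 0.9, 0.1

# Optimistic initialization (cf. Lecture 4) encourages early exploration.
Q = {(s, a): 1.0 for s in range(n_states) for a in actions}

def step(s, a):
    """Deterministic corridor dynamics: returns (next_state, reward, done)."""
    s2 = max(0, min(n_states - 1, s + a))
    return s2, (1.0 if s2 == n_states - 1 else 0.0), s2 == n_states - 1

for _ in range(500):                       # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy behavior policy
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy next action
        target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# The learned greedy policy moves right in every non-terminal cell.
policy = [max(actions, key=lambda b: Q[(s, b)]) for s in range(n_states - 1)]
print(policy)
```

Note the off-policy character of the update: actions are chosen ε-greedily, but the bootstrap target always uses the greedy next action, which is what distinguishes Q-learning from SARSA.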