Talks and Seminars
This is a list of recent talks and seminars.
2024
- 2024-11-19: Nonparametric Analysis of Dynamical Systems: From Recurrent Sets to Generalized Lyapunov and Barrier Conditions, ESE Fall Colloquium, University of Pennsylvania.
[BibTeX] [Abstract] [Download PDF]
Integrating Reinforcement Learning (RL) in safety-critical applications, such as autonomous vehicles, healthcare, and industrial automation, necessitates an increased focus on safety and reliability. In this talk, we consider two complementary mechanisms to augment RL’s suitability for safety-critical systems. Firstly, we consider a constrained reinforcement learning (C-RL) setting, wherein agents aim to maximize rewards while adhering to required constraints on secondary specifications. Several algorithms rooted in sample-based primal-dual methods have recently been proposed to solve this problem in policy space. However, such methods exhibit a discrepancy between the behavioral and optimal policies due to their reliance on stochastic gradient descent-ascent algorithms. We propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories almost surely converge to the optimal policy. Secondly, we study the problem of incorporating safety-critical constraints into RL that allow an agent to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. a safety critic, the associated operator lacks the desired contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties.
While the resulting operator is still non-contractive, we fully characterize its fixed points, which, except for a spurious solution, represent maximal persistently safe regions of the state space that can always avoid failure. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.
@talk{upenn24, abstract = {Integrating Reinforcement Learning (RL) in safety-critical applications, such as autonomous vehicles, healthcare, and industrial automation, necessitates an increased focus on safety and reliability. In this talk, we consider two complementary mechanisms to augment RL's suitability for safety-critical systems. Firstly, we consider a constrained reinforcement learning (C-RL) setting, wherein agents aim to maximize rewards while adhering to required constraints on secondary specifications. Several algorithms rooted in sample-based primal-dual methods have recently been proposed to solve this problem in policy space. However, such methods exhibit a discrepancy between the behavioral and optimal policies due to their reliance on stochastic gradient descent-ascent algorithms. We propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories almost surely converge to the optimal policy. Secondly, we study the problem of incorporating safety-critical constraints into RL that allow an agent to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. a safety critic, the associated operator lacks the desired contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties.
While the resulting operator is still non-contractive, we fully characterize its fixed points, which, except for a spurious solution, represent maximal persistently safe regions of the state space that can always avoid failure. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.}, date = {11/19/2024}, day = {19}, event = {ESE Fall Colloquium}, host = {Rene Vidal}, month = {11}, role = {Speaker}, title = {Nonparametric Analysis of Dynamical Systems: From Recurrent Sets to Generalized Lyapunov and Barrier Conditions}, url = {https://mallada.ece.jhu.edu/talks/202411-UPenn.pdf}, year = {2024} }
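To make the binary safety critic concrete, here is a minimal sketch under assumed toy dynamics (this is an illustration of a binary Bellman backup on a known model, not the talk's data-driven algorithm, which must additionally rule out spurious fixed points using axiomatic safe data):

```python
# Toy illustration (assumed 5-state chain, not the talk's algorithm): a binary
# safety critic B(s, a) in {True, False}. B(s, a) = True should mean: s is safe
# and, after taking a, some follow-up action sequence stays safe forever.
STATES = range(5)
ACTIONS = (-1, +1)          # move left / move right, clipped to the chain
UNSAFE = {4}

def step(s, a):
    """Deterministic transition s' = f(s, a)."""
    return min(max(s + a, 0), 4)

# Greatest-fixed-point iteration of the binary Bellman backup
#   B(s, a) = [s is safe] AND max_{a'} B(f(s, a), a'),
# started from the all-safe guess so it shrinks onto the maximal solution.
B = {(s, a): True for s in STATES for a in ACTIONS}
changed = True
while changed:
    changed = False
    for (s, a), val in list(B.items()):
        new = (s not in UNSAFE) and any(B[(step(s, a), a2)] for a2 in ACTIONS)
        if new != val:
            B[(s, a)], changed = new, True

# States from which some action keeps the agent persistently safe.
safe_region = sorted(s for s in STATES if any(B[(s, a)] for a in ACTIONS))
print(safe_region)  # → [0, 1, 2, 3]
```

State 3 is safe only via the left action (moving right hits the unsafe state), which is exactly the kind of distinction an action-value-like critic captures.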
- 2024-09-25: Generalized Barrier Functions: Integral Conditions and Recurrent Relaxations, 60th Allerton Conference on Communication, Control, and Computing.
[BibTeX] [Abstract] [Download PDF]
Integrating Reinforcement Learning (RL) in safety-critical applications, such as autonomous vehicles, healthcare, and industrial automation, necessitates an increased focus on safety and reliability. In this talk, we consider two complementary mechanisms to augment RL’s suitability for safety-critical systems. Firstly, we consider a constrained reinforcement learning (C-RL) setting, wherein agents aim to maximize rewards while adhering to required constraints on secondary specifications. Several algorithms rooted in sample-based primal-dual methods have recently been proposed to solve this problem in policy space. However, such methods exhibit a discrepancy between the behavioral and optimal policies due to their reliance on stochastic gradient descent-ascent algorithms. We propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories almost surely converge to the optimal policy. Secondly, we study the problem of incorporating safety-critical constraints into RL that allow an agent to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. a safety critic, the associated operator lacks the desired contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties.
While the resulting operator is still non-contractive, we fully characterize its fixed points, which, except for a spurious solution, represent maximal persistently safe regions of the state space that can always avoid failure. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.
@talk{allerton24, abstract = {Integrating Reinforcement Learning (RL) in safety-critical applications, such as autonomous vehicles, healthcare, and industrial automation, necessitates an increased focus on safety and reliability. In this talk, we consider two complementary mechanisms to augment RL's suitability for safety-critical systems. Firstly, we consider a constrained reinforcement learning (C-RL) setting, wherein agents aim to maximize rewards while adhering to required constraints on secondary specifications. Several algorithms rooted in sample-based primal-dual methods have recently been proposed to solve this problem in policy space. However, such methods exhibit a discrepancy between the behavioral and optimal policies due to their reliance on stochastic gradient descent-ascent algorithms. We propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories almost surely converge to the optimal policy. Secondly, we study the problem of incorporating safety-critical constraints into RL that allow an agent to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. a safety critic, the associated operator lacks the desired contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties.
While the resulting operator is still non-contractive, we fully characterize its fixed points, which, except for a spurious solution, represent maximal persistently safe regions of the state space that can always avoid failure. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.}, date = {09/25/2024}, day = {25}, event = {60th Allerton Conference on Communication, Control, and Computing}, host = {N/A}, month = {09}, role = {Speaker}, title = {Generalized Barrier Functions: Integral Conditions and Recurrent Relaxations}, url = {https://mallada.ece.jhu.edu/talks/202409-Allerton.pdf}, year = {2024} }
- 2024-06-12: Reinforcement Learning for Safety Critical Applications, Tercera Conferencia Colombiana de Matematicas Aplicadas e Industriales.
[BibTeX] [Abstract] [Download PDF]
Integrating Reinforcement Learning (RL) in safety-critical applications, such as autonomous vehicles, healthcare, and industrial automation, necessitates an increased focus on safety and reliability. In this talk, we consider two complementary mechanisms to augment RL’s suitability for safety-critical systems. Firstly, we consider a constrained reinforcement learning (C-RL) setting, wherein agents aim to maximize rewards while adhering to required constraints on secondary specifications. Several algorithms rooted in sample-based primal-dual methods have recently been proposed to solve this problem in policy space. However, such methods exhibit a discrepancy between the behavioral and optimal policies due to their reliance on stochastic gradient descent-ascent algorithms. We propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories almost surely converge to the optimal policy. Secondly, we study the problem of incorporating safety-critical constraints into RL that allow an agent to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. a safety critic, the associated operator lacks the desired contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties.
While the resulting operator is still non-contractive, we fully characterize its fixed points, which, except for a spurious solution, represent maximal persistently safe regions of the state space that can always avoid failure. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.
@talk{mapi24, abstract = {Integrating Reinforcement Learning (RL) in safety-critical applications, such as autonomous vehicles, healthcare, and industrial automation, necessitates an increased focus on safety and reliability. In this talk, we consider two complementary mechanisms to augment RL's suitability for safety-critical systems. Firstly, we consider a constrained reinforcement learning (C-RL) setting, wherein agents aim to maximize rewards while adhering to required constraints on secondary specifications. Several algorithms rooted in sample-based primal-dual methods have recently been proposed to solve this problem in policy space. However, such methods exhibit a discrepancy between the behavioral and optimal policies due to their reliance on stochastic gradient descent-ascent algorithms. We propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories almost surely converge to the optimal policy. Secondly, we study the problem of incorporating safety-critical constraints into RL that allow an agent to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. a safety critic, the associated operator lacks the desired contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties.
While the resulting operator is still non-contractive, we fully characterize its fixed points, which, except for a spurious solution, represent maximal persistently safe regions of the state space that can always avoid failure. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.}, date = {06/12/2024}, day = {12}, event = {Tercera Conferencia Colombiana de Matematicas Aplicadas e Industriales}, host = {Javier Peña (CMU), Mateo Diaz (JHU)}, month = {06}, role = {Speaker}, title = {Reinforcement Learning for Safety Critical Applications}, url = {https://mallada.ece.jhu.edu/talks/202406-MAPI.pdf}, year = {2024} }
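The primal-dual mechanics behind the constrained-RL part can be illustrated on a toy convex problem (this is plain projected gradient descent-ascent on an assumed Lagrangian, chosen for illustration; the talk's contribution is a regularized saddle-flow scheme with almost-sure convergence, which this sketch does not implement):

```python
# Toy primal-dual sketch (illustrative only, not the talk's algorithm). Solve
#   minimize (x - 2)^2  subject to  x <= 1
# by gradient descent-ascent on the Lagrangian L(x, lam) = (x - 2)^2 + lam*(x - 1),
# projecting the multiplier onto lam >= 0. KKT conditions give x* = 1, lam* = 2.
eta = 0.05                   # step size
x, lam = 0.0, 0.0            # primal and dual initial conditions
for _ in range(2000):
    grad_x = 2.0 * (x - 2.0) + lam        # dL/dx
    grad_lam = x - 1.0                    # dL/dlam (constraint violation)
    x -= eta * grad_x                     # descend in the primal variable
    lam = max(0.0, lam + eta * grad_lam)  # ascend in the dual, then project
print(round(x, 3), round(lam, 3))  # → 1.0 2.0
```

Here the iterates settle onto the constrained optimum; in the sampled RL setting the analogous stochastic iterates only hover around it, which is the behavioral-vs-optimal-policy discrepancy the abstract refers to.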
- 2024-06-18: Data-driven Analysis of Dynamical Systems Using Recurrent Sets, INFORMS International Conference.
[BibTeX] [Abstract] [Download PDF]
In this talk, we develop model-free methods for analyzing dynamical systems using trajectory data. Our critical insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. Specifically, a set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We leverage this notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. Firstly, we consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point using trajectory data. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Secondly, we generalize Lyapunov’s Direct Method to allow for non-monotonic evolution of the function values by only requiring sub-level sets to be τ-recurrent (instead of invariant). We provide conditions for stability, asymptotic stability, and exponential stability of an equilibrium using τ-decreasing functions (functions whose value along trajectories decreases after at most τ seconds) and develop a verification algorithm that leverages GPU parallel processing to verify such conditions using trajectories. We conclude by discussing future research directions and possible extensions to control.
@talk{informs2024, abstract = {In this talk, we develop model-free methods for analyzing dynamical systems using trajectory data. Our critical insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. Specifically, a set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We leverage this notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. Firstly, we consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point using trajectory data. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Secondly, we generalize Lyapunov's Direct Method to allow for non-monotonic evolution of the function values by only requiring sub-level sets to be τ-recurrent (instead of invariant). We provide conditions for stability, asymptotic stability, and exponential stability of an equilibrium using τ-decreasing functions (functions whose value along trajectories decrease after at most τ seconds) and develop a verification algorithm that leverages GPU parallel processing to verify such conditions using trajectories. We finalize by discussing future research directions and possible extensions for control.}, date = {06/18/2024}, day = {18}, event = {INFORMS International Conference}, host = {Luis Zuluaga (Lehigh), Mateo Diaz (JHU)}, month = {06}, role = {Speaker}, title = {Data-driven Analysis of Dynamical Systems Using Recurrent Sets}, url = {https://mallada.ece.jhu.edu/talks/202406-Informs.pdf}, year = {2024} }
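A sampled k-recurrence check of the kind the abstract describes can be sketched in a few lines (the linear map below is an assumed toy system, not one from the talk; it is chosen so that A @ A = -0.48·I, which makes the unit ball provably 2-recurrent but not invariant):

```python
import numpy as np

# Sketch of a data-driven k-recurrence check (assumed toy system). For
# x+ = A x with this A, the unit ball is NOT invariant: (0, 1) maps to
# (-1.2, 0), which lies outside. But A @ A = -0.48 * I, so every trajectory
# re-enters the ball by step 2: the ball is 2-recurrent but not 1-recurrent.
A = np.array([[0.0, -1.2],
              [0.4,  0.0]])

def is_k_recurrent(k, n_samples=1000, seed=0):
    """Test, on sampled initial conditions, whether every trajectory starting
    in the closed unit ball re-enters it within k steps.
    Returns (ok, counterexample)."""
    rng = np.random.default_rng(seed)
    pts = rng.normal(size=(n_samples, 2))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    pts *= rng.uniform(0.0, 1.0, size=(n_samples, 1)) ** 0.5   # uniform in ball
    pts = np.vstack([pts, [[0.0, 1.0], [0.0, -1.0]]])          # boundary probes
    for x in pts:
        y, returned = x.copy(), False
        for _ in range(k):
            y = A @ y
            if np.linalg.norm(y) <= 1.0:
                returned = True
                break
        if not returned:
            return False, x    # counterexample: no return within k steps
    return True, None

print(is_k_recurrent(1)[0], is_k_recurrent(2)[0])  # → False True
```

The counterexamples returned for too-small k are exactly what the abstract's algorithms use to refine inner approximations of the region of attraction.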
- 2024-06-05: Data-driven Analysis of Dynamical Systems Using Recurrent Sets, Department of Automatic Control, Lund University.
[BibTeX] [Abstract] [Download PDF]
In this talk, we develop model-free methods for analyzing dynamical systems using trajectory data. Our critical insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. Specifically, a set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We leverage this notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. Firstly, we consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point using trajectory data. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Secondly, we generalize Lyapunov’s Direct Method to allow for non-monotonic evolution of the function values by only requiring sub-level sets to be τ-recurrent (instead of invariant). We provide conditions for stability, asymptotic stability, and exponential stability of an equilibrium using τ-decreasing functions (functions whose value along trajectories decreases after at most τ seconds) and develop a verification algorithm that leverages GPU parallel processing to verify such conditions using trajectories. We conclude by discussing future research directions and possible extensions to control.
@talk{lund2024, abstract = {In this talk, we develop model-free methods for analyzing dynamical systems using trajectory data. Our critical insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. Specifically, a set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We leverage this notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. Firstly, we consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point using trajectory data. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Secondly, we generalize Lyapunov's Direct Method to allow for non-monotonic evolution of the function values by only requiring sub-level sets to be τ-recurrent (instead of invariant). We provide conditions for stability, asymptotic stability, and exponential stability of an equilibrium using τ-decreasing functions (functions whose value along trajectories decrease after at most τ seconds) and develop a verification algorithm that leverages GPU parallel processing to verify such conditions using trajectories. We finalize by discussing future research directions and possible extensions for control.}, date = {06/05/2024}, day = {05}, event = {Department of Automatic Control, Lund University}, host = {Richard Pates (Lund)}, month = {06}, role = {Lecture}, title = {Data-driven Analysis of Dynamical Systems Using Recurrent Sets}, url = {https://mallada.ece.jhu.edu/talks/202406-Lund.pdf}, year = {2024} }
- 2024-06-06: Data-driven Analysis of Dynamical Systems Using Recurrent Sets, Cyber-Physical Systems Lab, Université catholique de Louvain.
[BibTeX] [Abstract] [Download PDF]
In this talk, we develop model-free methods for analyzing dynamical systems using trajectory data. Our critical insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. Specifically, a set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We leverage this notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. Firstly, we consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point using trajectory data. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Secondly, we generalize Lyapunov’s Direct Method to allow for non-monotonic evolution of the function values by only requiring sub-level sets to be τ-recurrent (instead of invariant). We provide conditions for stability, asymptotic stability, and exponential stability of an equilibrium using τ-decreasing functions (functions whose value along trajectories decreases after at most τ seconds) and develop a verification algorithm that leverages GPU parallel processing to verify such conditions using trajectories. We conclude by discussing future research directions and possible extensions to control.
@talk{ucl2024, abstract = {In this talk, we develop model-free methods for analyzing dynamical systems using trajectory data. Our critical insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. Specifically, a set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We leverage this notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. Firstly, we consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point using trajectory data. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Secondly, we generalize Lyapunov's Direct Method to allow for non-monotonic evolution of the function values by only requiring sub-level sets to be τ-recurrent (instead of invariant). We provide conditions for stability, asymptotic stability, and exponential stability of an equilibrium using τ-decreasing functions (functions whose value along trajectories decrease after at most τ seconds) and develop a verification algorithm that leverages GPU parallel processing to verify such conditions using trajectories. We finalize by discussing future research directions and possible extensions for control.}, date = {06/06/2024}, day = {06}, event = {Cyber-Physical Systems Lab, Université catholique de Louvain}, host = {Raphael Jungers (UCL)}, month = {06}, role = {Lecture}, title = {Data-driven Analysis of Dynamical Systems Using Recurrent Sets}, url = {https://mallada.ece.jhu.edu/talks/202406-UCL.pdf}, year = {2024} }
- 2024-05-16: Recurrence of Nonlinear Control Systems: Entropy and Bit Rates, Hybrid Systems: Computation and Control (HSCC).
[BibTeX] [Abstract] [Download PDF]
In this paper, we introduce the notion of recurrence entropy in the context of nonlinear control systems. A set is said to be (τ-)recurrent if every trajectory that starts in the set returns to it (within at most τ units of time). Recurrence entropy quantifies the complexity of making a set τ-recurrent, measured by the average rate of growth, as time increases, of the number of control signals required to achieve this goal. Our analysis reveals that, compared to invariance, recurrence is quantitatively less complex, meaning that the recurrence entropy of a set is no larger than, and often strictly smaller than, the invariance entropy. We also present an algorithm for achieving recurrence asymptotically, and our results further offer insights into the minimum data rate required for achieving recurrence.
@talk{hscc2024, abstract = {In this paper, we introduce the notion of recurrence entropy in the context of nonlinear control systems. A set is said to be (τ-)recurrent if every trajectory that starts in the set returns to it (within at most τ units of time). Recurrence entropy quantifies the complexity of making a set τ-recurrent, measured by the average rate of growth, as time increases, of the number of control signals required to achieve this goal. Our analysis reveals that, compared to invariance, recurrence is quantitatively less complex, meaning that the recurrence entropy of a set is no larger than, and often strictly smaller than, the invariance entropy. Our results further offer insights into the minimum data rate required for achieving recurrence. We also present an algorithm for achieving recurrence asymptotically.}, date = {05/16/2024}, day = {16}, event = {Hybrid Systems: Computation and Control (HSCC)}, month = {05}, role = {Lecture}, title = {Recurrence of Nonlinear Control Systems: Entropy and Bit Rates}, url = {https://mallada.ece.jhu.edu/talks/202405-HSCC.pdf}, year = {2024} }
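The inequality between the two entropies has a short intuition worth recording. The notation below is schematic, following the standard invariance-entropy template rather than the talk's exact definitions:

```latex
% Schematic definitions (assumed notation, not verbatim from the talk).
% Let r_inv(T, Q) be the minimal number of control signals needed so that
% every initial state in Q can be kept inside Q up to time T, and let
% r_rec(T, Q) be the analogous count when trajectories need only revisit Q
% at least once every \tau units of time.
\[
  h_{\mathrm{inv}}(Q) = \limsup_{T\to\infty} \tfrac{1}{T}\log r_{\mathrm{inv}}(T,Q),
  \qquad
  h_{\mathrm{rec}}^{\tau}(Q) = \limsup_{T\to\infty} \tfrac{1}{T}\log r_{\mathrm{rec}}^{\tau}(T,Q).
\]
% Any family of controls rendering Q invariant in particular makes every
% trajectory revisit Q, so r_rec(T, Q) <= r_inv(T, Q) for every T, whence
\[
  h_{\mathrm{rec}}^{\tau}(Q) \;\le\; h_{\mathrm{inv}}(Q).
\]
```

The strict inequality in many cases, and the corresponding bit-rate statements, are the substance of the talk and are not captured by this one-line comparison.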
- 2024-03-28: Options for Mitigation Measures: Avenues for new Research, ESIG/G-PST Special Topic Workshop on Oscillations.
[BibTeX] [Download PDF]
@talk{esig24, date = {03/28/2024}, day = {28}, event = {ESIG/G-PST Special Topic Workshop on Oscillations}, host = {Mark O'Malley (Imperial)}, month = {03}, role = {Lecture}, title = {Options for Mitigation Measures: Avenues for new Research}, url = {https://mallada.ece.jhu.edu/talks/202403-ESIG.pdf}, year = {2024} }
- 2024-03-20: Model-Free Analysis of Dynamical Systems Using Recurrent Sets, ECE Colloquium, Rutgers University.
[BibTeX] [Abstract] [Download PDF]
In this talk, we develop model-free methods for analyzing dynamical systems using trajectory data. Our critical insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. Specifically, a set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We leverage this notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. Firstly, we consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point using trajectory data. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Secondly, we generalize Lyapunov’s Direct Method to allow for non-monotonic evolution of the function values by only requiring sub-level sets to be τ-recurrent (instead of invariant). We provide conditions for stability, asymptotic stability, and exponential stability of an equilibrium using τ-decreasing functions (functions whose value along trajectories decreases after at most τ seconds) and develop a verification algorithm that leverages GPU parallel processing to verify such conditions using trajectories. We conclude by discussing future research directions and possible extensions to control.
@talk{rutgers24, abstract = {In this talk, we develop model-free methods for analyzing dynamical systems using trajectory data. Our critical insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. Specifically, a set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We leverage this notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. Firstly, we consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point using trajectory data. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Secondly, we generalize Lyapunov's Direct Method to allow for non-monotonic evolution of the function values by only requiring sub-level sets to be τ-recurrent (instead of invariant). We provide conditions for stability, asymptotic stability, and exponential stability of an equilibrium using τ-decreasing functions (functions whose value along trajectories decrease after at most τ seconds) and develop a verification algorithm that leverages GPU parallel processing to verify such conditions using trajectories. We finalize by discussing future research directions and possible extensions for control.}, date = {03/20/2024}, day = {20}, event = {ECE Colloquium, Rutgers University}, host = {Daniel Burbano (Rutgers)}, month = {03}, role = {Lecture}, title = {Model-Free Analysis of Dynamical Systems Using Recurrent Sets}, url = {https://mallada.ece.jhu.edu/talks/202403-Rutgers.pdf}, year = {2024} }
- 2024-02-16: Reinforcement Learning for Safety Critical Applications, George Mason University.
[BibTeX] [Abstract] [Download PDF]
Integrating Reinforcement Learning (RL) in safety-critical applications, such as autonomous vehicles, healthcare, and industrial automation, necessitates an increased focus on safety and reliability. In this talk, we consider two complementary mechanisms to augment RL’s suitability for safety-critical systems. Firstly, we consider a constrained reinforcement learning (C-RL) setting, wherein agents aim to maximize rewards while adhering to required constraints on secondary specifications. Several algorithms rooted in sampled-based primal-dual methods have been recently proposed to solve this problem in policy space. However, such methods exhibit a discrepancy between the behavioral and optimal policies due to their reliance on stochastic gradient descent-ascent algorithms. We propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories almost surely converge to the optimal policy. Secondly, we study the problem of incorporating safety-critical constraints to RL that allow an agent to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. safety critics, the associated operator lacks the desired contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties. 
While the resulting operator is still non-contractive, we fully characterize its fixed points, which represent (except for a spurious solution) the maximal persistently safe regions of the state space from which failure can always be avoided. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.
@talk{gmu24, date = {02/2024}, day = {16}, event = {George Mason University}, host = {Ningshi Yao (GMU)}, month = {02}, role = {Lecture}, title = {Reinforcement Learning for Safety Critical Applications}, url = {https://mallada.ece.jhu.edu/talks/202402-GMU.pdf}, year = {2024} }
- 2024-01-11: Reinforcement Learning for Safety Critical Applications, Applied Physics Laboratory, JHU.
[BibTeX] [Abstract] [Download PDF]
Integrating Reinforcement Learning (RL) in safety-critical applications, such as autonomous vehicles, healthcare, and industrial automation, necessitates an increased focus on safety and reliability. In this talk, we consider two complementary mechanisms to augment RL’s suitability for safety-critical systems. Firstly, we consider a constrained reinforcement learning (C-RL) setting, wherein agents aim to maximize rewards while adhering to required constraints on secondary specifications. Several algorithms rooted in sample-based primal-dual methods have been recently proposed to solve this problem in policy space. However, such methods exhibit a discrepancy between the behavioral and optimal policies due to their reliance on stochastic gradient descent-ascent algorithms. We propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories almost surely converge to the optimal policy. Secondly, we study the problem of incorporating safety-critical constraints into RL that allow an agent to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. a safety critic, the associated operator lacks the contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties.
While the resulting operator is still non-contractive, we fully characterize its fixed points, which represent (except for a spurious solution) the maximal persistently safe regions of the state space from which failure can always be avoided. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.
@talk{apl24, date = {01/2024}, day = {11}, event = {Applied Physics Laboratory, JHU}, host = {Jared Markowitz}, month = {01}, role = {Lecture}, title = {Reinforcement Learning for Safety Critical Applications}, url = {https://mallada.ece.jhu.edu/talks/202401-JHUAPL.pdf}, year = {2024} }
2023
- 2023-12-11: Unintended Consequences of Market Designs, IHPC’s Workshop of Power and Energy Systems of the (near) Future, ASTAR.
[BibTeX] [Abstract] [Download PDF]
In this talk, we seek to highlight the importance of accounting for the incentives of *all* market participants when designing market mechanisms for electricity. To this end, we perform a Nash equilibrium analysis of two different market mechanisms that illustrates the critical role that the incentives of consumers, and of new types of participants such as storage, play in the equilibrium outcome. Firstly, we study the incentives of heterogeneous participants (generators and consumers) in a two-stage settlement market, where generators participate using a supply function bid and consumers use a quantity bid. We show that strategic consumers are able to exploit generators’ strategic behavior to maintain a systematic difference between the forward and spot prices, with the latter being higher. Notably, such a strategy brings down consumer payments and undermines supply-side market power. We further observe situations where generators lose profit by behaving strategically, a sign that conventional supply-side market power has been overturned. Secondly, we study a market mechanism for multi-interval electricity markets with generator and storage participants. Drawing ideas from supply function bidding, we introduce a novel bid structure for storage participation that allows storage units to communicate their cost to the market using energy-cycling functions that map prices to cycle depths. The resulting market-clearing process, implemented via convex programming, yields corresponding schedules and payments based on traditional energy prices for power supply and per-cycle prices for storage utilization. Our solution shows several advantages over the standard prosumer-based approach that prices energy per slot. In particular, it does not require a priori estimation of future prices and leads to an efficient, competitive equilibrium.
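To make the clearing mechanics concrete, here is a minimal sketch of a single-interval economic dispatch with two quadratic-cost generators, the textbook setting the talk's supply-function and storage-cycling bids generalize. The cost coefficients and demand are hypothetical numbers, and the closed-form KKT solution stands in for the convex program mentioned in the abstract.

```python
# Toy dispatch: minimize sum_i (a_i / 2) * g_i^2 subject to g1 + g2 = d.
# At the optimum the marginal costs a_i * g_i are equal, and that common
# value is the market-clearing price (illustrative, not the talk's model).
def clear(a1, a2, d):
    # KKT conditions: a1*g1 = a2*g2 = price, with g1 + g2 = d.
    price = d * a1 * a2 / (a1 + a2)
    return price / a1, price / a2, price

g1, g2, price = clear(2.0, 4.0, 9.0)
print(g1, g2, price)  # -> 6.0 3.0 12.0
```

The cheaper generator supplies more, and both face the same per-unit price; storage bids via energy-cycling functions plug into the same kind of clearing problem with per-cycle prices.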
@talk{astar23, date = {12/11/2023}, day = {11}, event = {IHPC's Workshop of Power and Energy Systems of the (near) Future, ASTAR}, host = {John Pang (ASTAR)}, month = {12}, role = {Speaker}, title = {Unintended Consequences of Market Designs}, url = {https://mallada.ece.jhu.edu/talks/202312-ASTAR.pdf}, year = {2023} }
- 2023-11-04: Model-Free Analysis of Dynamical Systems Using Recurrent Sets, FIND Seminar, Cornell University.
[BibTeX] [Abstract] [Download PDF]
In this talk, we develop model-free methods for analyzing dynamical systems using trajectory data. Our critical insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. Specifically, a set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We leverage this notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. Firstly, we consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point using trajectory data. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Secondly, we generalize Lyapunov’s Direct Method to allow for non-monotonic evolution of the function values by only requiring sub-level sets to be τ-recurrent (instead of invariant). We provide conditions for stability, asymptotic stability, and exponential stability of an equilibrium using τ-decreasing functions (functions whose value along trajectories decrease after at most τ seconds) and develop a verification algorithm that leverages GPU parallel processing to verify such conditions using trajectories. We finalize by discussing future research directions and possible extensions for control.
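The sampling-based idea behind these recurrence certificates can be sketched in a few lines: draw starting points inside a candidate set, simulate finite-length trajectories, and look for counter-examples of k-recurrence. The one-dimensional dynamics and the interval-shaped candidate set below are hypothetical stand-ins, not the talk's algorithm or examples.

```python
import random

def is_k_recurrent(step, radius, k, n_samples=1000, seed=0):
    """Empirically test k-recurrence of the set {|x| <= radius}: every
    sampled trajectory starting in the set must re-enter it within at
    most k steps. A single trajectory that fails to return is a
    counter-example disproving recurrence."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = rng.uniform(-radius, radius)  # start inside the candidate set
        if not any(abs(x := step(x)) <= radius for _ in range(k)):
            return False  # counter-example found
    return True

stable = lambda x: 0.9 * x    # contraction: the set maps into itself
unstable = lambda x: 1.1 * x  # expansion: starts near the boundary never return
print(is_k_recurrent(stable, 1.0, 3))    # True
print(is_k_recurrent(unstable, 1.0, 3))  # False
```

For the stable dynamics every sampled start returns immediately, consistent with the result that a τ-recurrent set around a stable equilibrium sits inside its ROA; for the expanding dynamics, boundary samples supply the counter-examples.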
@talk{cornell23, date = {11/04/2023}, day = {04}, event = {FIND Seminar, Cornell University}, host = {Kevin A. Tang (Cornell)}, month = {11}, role = {Lecture}, title = {Model-Free Analysis of Dynamical Systems Using Recurrent Sets}, url = {https://mallada.ece.jhu.edu/talks/202311-Cornell.pdf}, year = {2023} }
- 2023-10-12: Reinforcement Learning with Almost Sure Constraints, MURI Workshop.
[BibTeX] [Abstract] [Download PDF]
In this work, we study how to tackle decision-making for safety-critical systems under uncertainty. To that end, we formulate a Reinforcement Learning problem with Almost Sure constraints, in which one seeks a policy that allows no more than Δ ∈ ℕ unsafe events in any trajectory, with probability one. We argue that this type of constraint might be better suited for safety-critical systems as opposed to the usual average constraint employed in Constrained Markov Decision Processes and that, moreover, having constraints of this kind makes feasible policies much easier to find. The talk is didactically split into two parts, first considering Δ = 0 and then the Δ ≥ 0 case. At the core of our theory is a barrier-based decomposition of the Q-function that decouples the problems of optimality and feasibility and allows them to be learned either independently or in conjunction. We develop an algorithm for characterizing the set of all feasible policies that provably converges in expected finite time. We further develop sample-complexity bounds for learning this set with high probability. Simulations corroborate our theoretical findings and showcase how our algorithm can be wrapped around other learning algorithms to hasten the search for first feasible and then optimal policies.
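For the Δ = 0 case with known deterministic dynamics, the feasible region can be computed by a shrinking fixed-point iteration: keep only the states from which some action stays inside the current safe set. This is a sketch of the underlying feasibility computation on a hypothetical toy model, not the authors' sample-based learning algorithm.

```python
def feasible_states(states, unsafe, actions, step):
    """Largest set S of safe states such that from every s in S some
    action keeps the next state in S: the maximal persistently safe
    region (the Δ = 0 feasible set), via shrinking fixed-point iteration."""
    S = {s for s in states if s not in unsafe}
    while True:
        S_next = {s for s in S if any(step(s, a) in S for a in actions)}
        if S_next == S:
            return S
        S = S_next

# Hypothetical 1-D grid world: positions 0..5, position 5 is unsafe,
# actions move left / stay / right with saturation at the boundaries.
states = range(6)
step = lambda s, a: min(max(s + a, 0), 5)
print(sorted(feasible_states(states, {5}, (-1, 0, 1), step)))  # -> [0, 1, 2, 3, 4]
```

Here every safe state can simply stay put, so the whole safe region is persistently safe; with drift or forced motion the iteration would prune states that are safe now but doomed later.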
@talk{muri23, date = {10/2023}, day = {12}, event = {MURI Workshop}, host = {Mario Sznaier (Northeastern)}, month = {10}, role = {Speaker}, title = {Reinforcement Learning with Almost Sure Constraints}, url = {https://mallada.ece.jhu.edu/talks/202310-MURI.pdf}, year = {2023} }
- 2023-09-07: Grid Shaping Control for High-IBR Power Systems: Stability Analysis and Control Design, GE EDGE Symposium.
[BibTeX] [Abstract] [Download PDF]
The transition of power systems from conventional synchronous generation towards renewable energy sources (with little or no inertia) is gradually threatening classical methods for achieving grid synchronization. A widely embraced approach to mitigate this problem is to mimic inertial response using grid-connected inverters. That is, to introduce virtual inertia to restore the stiffness that the system used to enjoy. In this talk, we seek to challenge this approach. We advocate taking advantage of the system’s low inertia to restore grid synchronism without incurring excessive control efforts. To this end, we develop an analysis and design framework for inverter-based frequency control. First, we develop novel stability analysis tools for power systems, which allow for the decentralized design of inverter-based controllers. The method requires that each inverter satisfies a standard H-infinity design requirement that depends on the dynamics of the components and inverters at each bus and the aggregate susceptance of the transmission lines connected to it. It is robust to network and delay uncertainty and, when no network information is available, reduces to the standard passivity condition for stability. Then, we propose a novel grid-forming control strategy, so-called grid shaping control, that aims to shape the frequency response of synchronous generators (SGs) to load perturbations so as to efficiently arrest sudden frequency drops. The approach builds on novel analysis tools that can characterize the Center of Inertia (CoI) response of a system with both IBRs and SGs and use this characterization to reshape it.
@talk{ge-edge23, date = {09/07/2023}, day = {07}, event = {GE EDGE Symposium}, host = {Aditya Kumar (GE)}, month = {09}, role = {Speaker}, title = {Grid Shaping Control for High-IBR Power Systems: Stability Analysis and Control Design}, url = {https://mallada.ece.jhu.edu/talks/202309-GE-EDGE.pdf}, year = {2023} }
- 2023-09-07: Learning Coherent Clusters in Weakly Connected Power Networks, 6th Workshop on Autonomous Energy Systems.
[BibTeX] [Abstract] [Download PDF]
Network coherence generally refers to the emergence of a simple aggregated dynamic response of generator units, despite heterogeneity in the units’ locations and dynamic constitutions. In this talk, we develop a general frequency-domain framework to analyze and quantify the level of network coherence that a system exhibits by relating coherence with a low-rank property of the system’s input-output response. Our analysis unveils the frequency-dependent nature of coherence and a non-trivial interplay between dynamics, network topology, and the type of disturbance. We further leverage this framework to build a structure-preserving model-reduction methodology for large-scale dynamic networks with tightly-connected components and provide time-domain bounds on the approximation error of our model. Our work provides new avenues for analysis and control design of IBR-rich power systems.
@talk{nrel23, date = {09/07/2023}, day = {07}, event = {6th Workshop on Autonomous Energy Systems}, host = {Andrey Bernstein (NREL), Guido Cavraro (NREL)}, month = {09}, role = {Speaker}, title = {Learning Coherent Clusters in Weakly Connected Power Networks}, url = {https://mallada.ece.jhu.edu/talks/202309-NREL.pdf}, year = {2023} }
- 2023-07-06: Model-Free Analysis of Dynamical Systems Using Recurrent Sets, Workshop on Uncertain Dynamical Systems.
[BibTeX] [Abstract] [Download PDF]
In this talk, we develop model-free methods for analyzing dynamical systems using data. Our key insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. A set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We then leverage the notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. We first consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point without an explicit model of the dynamics. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then leverage this property to develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Our algorithms process samples sequentially, which allows them to continue being executed even after an initial offline training stage. We finalize by presenting recent extensions of this work that generalize Lyapunov’s Direct Method to allow functions that do not decrease monotonically to certify stability, and by illustrating future research directions.
@talk{wuds23, date = {07/06/2023}, day = {06}, event = {Workshop on Uncertain Dynamical Systems}, host = {Mario Sznaier (Northeastern), Fabrizio Dabbene (PoliTo), Constantino Lagoa (Penn State)}, month = {07}, role = {Speaker}, title = {Model-Free Analysis of Dynamical Systems Using Recurrent Sets}, url = {https://mallada.ece.jhu.edu/talks/202307-WUDS.pdf}, year = {2023} }
- 2023-07-19: Grid Shaping Control for High-IBR Power Systems, Panel on Future electricity systems: How to handle millions of power electronic-based devices and other emerging technologies, IEEE PES General Meeting.
[BibTeX] [Abstract] [Download PDF]
The transition of power systems from conventional synchronous generation towards renewable energy sources (with little or no inertia) is gradually threatening classical methods for achieving grid synchronization. A widely embraced approach to mitigate this problem is to mimic inertial response using grid-connected inverters. That is, to introduce virtual inertia to restore the stiffness that the system used to enjoy. In this talk, we seek to challenge this approach. We advocate taking advantage of the system’s low inertia to restore grid synchronism without incurring excessive control efforts. To this end, we develop an analysis and design framework for inverter-based frequency control. First, we develop novel stability analysis tools for power systems, which allow for the decentralized design of inverter-based controllers. The method requires that each inverter satisfies a standard H-infinity design requirement that depends on the dynamics of the components and inverters at each bus and the aggregate susceptance of the transmission lines connected to it. It is robust to network and delay uncertainty and, when no network information is available, reduces to the standard passivity condition for stability. Then, we propose a novel grid-forming control strategy, so-called grid shaping control, that aims to shape the frequency response of synchronous generators (SGs) to load perturbations so as to efficiently arrest sudden frequency drops. The approach builds on novel analysis tools that can characterize the Center of Inertia (CoI) response of a system with both IBRs and SGs and use this characterization to reshape it.
@talk{pesgm23, date = {07/19/2023}, day = {19}, event = {Panel on Future electricity systems: How to handle millions of power electronic-based devices and other emerging technologies, IEEE PES General Meeting}, host = {Claudia Andrea Rahmann (UChile), Amarsagar Reddy Ramapuram Matavalam (ASU)}, month = {07}, role = {Panelist}, title = {Grid Shaping Control for High-IBR Power Systems}, url = {https://mallada.ece.jhu.edu/talks/202307-PESGM.pdf}, year = {2023} }
- 2023-05-30: Iterative Policy Learning for Constrained RL via Dissipative Gradient Descent-Ascent, Workshop on Online Optimization Methods for Data-Driven Feedback Control, American Control Conference.
[BibTeX] [Abstract] [Download PDF]
In constrained reinforcement learning (C-RL), an agent seeks to learn from the environment a policy that maximizes the expected cumulative reward while satisfying minimum requirements in secondary cumulative reward constraints. Several algorithms rooted in sample-based primal-dual methods have been recently proposed to solve this problem in policy space. However, such methods are based on stochastic gradient descent-ascent algorithms whose trajectories are connected to the optimal policy only after a mixing output stage that depends on the algorithm’s history. As a result, there is a mismatch between the behavioral policy and the optimal one. In this talk, we propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories converge to the optimal policy almost surely.
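The role of dissipation can be seen on a toy saddle problem: plain discrete gradient descent-ascent on the bilinear function L(x, y) = x·y spirals away from the saddle point (0, 0), while adding a damping term to the dual ascent makes the iterates converge. This is a minimal sketch of the regularization idea behind the saddle-flow results the talk builds on, not the talk's actual C-RL algorithm; the step size and damping coefficient are hypothetical.

```python
# Gradient descent-ascent on L(x, y) = x * y, optionally with a
# dissipation term -rho * y in the ascent direction.
def gda(eta=0.1, rho=0.0, steps=2000):
    x, y = 1.0, 1.0
    for _ in range(steps):
        gx = y            # dL/dx
        gy = x - rho * y  # dL/dy, damped when rho > 0
        x, y = x - eta * gx, y + eta * gy
    return x, y

x_plain, y_plain = gda(rho=0.0)  # undamped: the iterates spiral outward
x_damp, y_damp = gda(rho=0.5)    # damped: the iterates spiral into (0, 0)
print(x_plain**2 + y_plain**2 > 2.0, x_damp**2 + y_damp**2 < 1e-12)  # True True
```

The same mechanism, applied in policy space, is what lets the behavioral trajectories themselves converge to the optimum instead of requiring a separate averaging stage.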
@talk{acc23, abstract = {In constrained reinforcement learning (C-RL), an agent seeks to learn from the environment a policy that maximizes the expected cumulative reward while satisfying minimum requirements in secondary cumulative reward constraints. Several algorithms rooted in sample-based primal-dual methods have been recently proposed to solve this problem in policy space. However, such methods are based on stochastic gradient descent-ascent algorithms whose trajectories are connected to the optimal policy only after a mixing output stage that depends on the algorithm's history. As a result, there is a mismatch between the behavioral policy and the optimal one. In this talk, we propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories converge to the optimal policy almost surely.}, date = {05/30/2023}, day = {30}, event = {Workshop on Online Optimization Methods for Data-Driven Feedback Control, American Control Conference}, host = {Gianluca Bianchin (UCLouvain), Emiliano Dall'Anese (UC Boulder), Jorge Cortés (UCSD), Miguel Vaquero (IE University)}, month = {05}, role = {Speaker}, title = {Iterative Policy Learning for Constrained RL via Dissipative Gradient Descent-Ascent}, url = {https://mallada.ece.jhu.edu/talks/202305-ACC-Workshop.pdf}, year = {2023} }
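The saddle-point structure behind this line of work can be illustrated with a deterministic toy (a hypothetical one-variable example, not the talk's stochastic, regularized algorithm): projected gradient descent-ascent on a constrained problem whose KKT point is known in closed form.

```python
# Hypothetical toy problem: maximize r(theta) = -(theta - 2)^2
# subject to theta <= 1. The Lagrangian is
#   L(theta, lam) = r(theta) + lam * (1 - theta),  lam >= 0,
# with KKT point theta* = 1, lam* = 2.

def primal_dual(eta=0.01, iters=20000):
    theta, lam = 0.0, 0.0
    for _ in range(iters):
        # Gradient ascent on L in the primal variable theta ...
        theta += eta * (-2.0 * (theta - 2.0) - lam)
        # ... and projected gradient descent in the multiplier lam.
        lam = max(0.0, lam - eta * (1.0 - theta))
    return theta, lam

theta_hat, lam_hat = primal_dual()
```

Here strong concavity of the toy objective is enough for the last iterate to converge; in the general C-RL setting this is exactly what fails for plain stochastic GDA, motivating the regularized saddle-flow dynamics of the talk.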
- 2023-01-05: Learning Dynamics and Implicit Bias of Gradient Flow in Overparameterized Linear Models, Joint Mathematics Meeting, Special Session.
[BibTeX] [Abstract] [Download PDF]
Contrary to the common belief that overparameterization may hurt generalization and optimization, recent work suggests that overparameterization may bias the optimization algorithm towards solutions that generalize well — a phenomenon known as implicit regularization or implicit bias — and may also accelerate convergence — a phenomenon known as implicit acceleration. This talk will provide a detailed analysis of the dynamics of gradient flow in overparameterized linear models showing that convergence to equilibrium depends on the imbalance between input and output weights (which is fixed at initialization) and the margin of the initial solution. The talk will also provide an analysis of the implicit bias, showing that large hidden layer width, together with (properly scaled) random initialization, constrains the network parameters to converge to a solution which is close to the min-norm solution.
@talk{jmm23, abstract = {Contrary to the common belief that overparameterization may hurt generalization and optimization, recent work suggests that overparameterization may bias the optimization algorithm towards solutions that generalize well --- a phenomenon known as implicit regularization or implicit bias --- and may also accelerate convergence --- a phenomenon known as implicit acceleration. This talk will provide a detailed analysis of the dynamics of gradient flow in overparameterized linear models showing that convergence to equilibrium depends on the imbalance between input and output weights (which is fixed at initialization) and the margin of the initial solution. The talk will also provide an analysis of the implicit bias, showing that large hidden layer width, together with (properly scaled) random initialization, constrains the network parameters to converge to a solution which is close to the min-norm solution.}, date = {01/05/2023}, day = {05}, event = {Joint Mathematics Meeting, Special Session}, host = {Josué Tonelli Cueto, Hitesh Gakhar, Harlin Lee}, month = {01}, role = {Speaker}, title = {Learning Dynamics and Implicit Bias of Gradient Flow in Overparameterized Linear Models}, url = {https://mallada.ece.jhu.edu/talks/202301-JMM.pdf}, year = {2023} }
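For intuition on the min-norm claim, the baseline fact is easy to check numerically: gradient descent on an underdetermined least-squares problem, initialized at zero, converges to the minimum-norm interpolating solution (the wide overparameterized network in the talk is shown to land close to this point). A sketch with made-up random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))   # 5 equations, 20 unknowns: underdetermined
y = rng.standard_normal(5)

# Plain gradient descent on the squared loss, started from zero.
w = np.zeros(20)
eta = 0.01
for _ in range(20000):
    w -= eta * X.T @ (X @ w - y)

# The iterates stay in the row space of X, so the limit is the
# minimum-norm interpolator, i.e. the pseudoinverse solution.
w_min_norm = np.linalg.pinv(X) @ y
```

Zero initialization is what pins down the row-space constraint; the talk's contribution is showing when the *nonlinear* overparameterized dynamics inherit (approximately) this same bias.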
- 2023-01-18: Frequency Shaping Control for Low Inertia Power Systems, 2023 ROSEI Summit, Johns Hopkins University.
[BibTeX] [Abstract] [Download PDF]
The transition of power systems from conventional synchronous generation towards renewable energy sources -with little or no inertia- is gradually threatening classical methods for achieving grid synchronization. A widely embraced approach to mitigate this problem is to mimic inertial response using grid-connected inverters. That is, to introduce virtual inertia to restore the stiffness that the system used to enjoy. In this talk, we seek to challenge this approach. We advocate taking advantage of the system’s low inertia to restore grid synchronism without incurring excessive control efforts. To this end, we develop an analysis and design framework for inverter-based frequency control. We define system-level performance metrics that are of practical relevance for power systems and systematically evaluate the performance of standard control strategies, such as virtual inertia and droop control, in the presence of power disturbances. Our analysis unveils the relatively limited role of inertia in improving performance and the inability of droop control to enhance performance without incurring considerable steady-state control effort. To overcome these limitations, we propose a novel frequency shaping control for grid-connected inverters -exploiting classical lead/lag compensation and model matching techniques from control theory- that can significantly outperform existing solutions while using comparable control effort.
@talk{rosei23, abstract = {The transition of power systems from conventional synchronous generation towards renewable energy sources -with little or no inertia- is gradually threatening classical methods for achieving grid synchronization. A widely embraced approach to mitigate this problem is to mimic inertial response using grid-connected inverters. That is, to introduce virtual inertia to restore the stiffness that the system used to enjoy. In this talk, we seek to challenge this approach. We advocate taking advantage of the system's low inertia to restore grid synchronism without incurring excessive control efforts. To this end, we develop an analysis and design framework for inverter-based frequency control. We define system-level performance metrics that are of practical relevance for power systems and systematically evaluate the performance of standard control strategies, such as virtual inertia and droop control, in the presence of power disturbances. Our analysis unveils the relatively limited role of inertia in improving performance and the inability of droop control to enhance performance without incurring considerable steady-state control effort. To overcome these limitations, we propose a novel frequency shaping control for grid-connected inverters -exploiting classical lead/lag compensation and model matching techniques from control theory- that can significantly outperform existing solutions while using comparable control effort.}, date = {01/18/2023}, day = {18}, event = {2023 ROSEI Summit, Johns Hopkins University}, host = {Ben Schaffer, Ben Link}, month = {01}, role = {Speaker}, title = {Frequency Shaping Control for Low Inertia Power Systems}, url = {https://mallada.ece.jhu.edu/talks/202301-ROSEI.pdf}, year = {2023} }
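The "limited role of inertia" point can be glimpsed even in a drastically simplified first-order frequency surrogate (a hypothetical single-bus model, not the framework of the talk): the inertia constant M only slows the transient, while both the peak and the steady-state frequency deviation are set by the droop/damping gain D.

```python
# First-order surrogate: M * dw/dt = dp - D * w, with step disturbance dp.
# w is the frequency deviation, M the (virtual) inertia, D the droop gain.

def peak_deviation(M, D, dp=1.0, h=0.01, T=50.0):
    """Largest |w| along a forward-Euler simulation of the step response."""
    w, peak = 0.0, 0.0
    for _ in range(int(T / h)):
        w += (h / M) * (dp - D * w)   # forward-Euler step
        peak = max(peak, abs(w))
    return peak
```

Raising M from 1 to 5 leaves the peak deviation (which tends to dp/D) essentially unchanged, while raising D shrinks it, at the price of a steady-state control effort proportional to D. A full swing-equation model adds oscillatory rotor dynamics, but the trade-off the talk targets is already visible here.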
2022
- 2022-12-19: Reinforcement Learning with Almost Sure Constraints, Topología y Probabilidad en análisis de datos, Universidad de la República.
[BibTeX] [Abstract] [Download PDF]
In this work, we study how to tackle decision-making for safety-critical systems under uncertainty. To that end, we formulate a Reinforcement Learning problem with Almost Sure constraints, in which one seeks a policy that allows no more than $\Delta \in \mathbb{N}$ unsafe events in any trajectory, with probability one. We argue that this type of constraint might be better suited for safety-critical systems as opposed to the usual average constraint employed in Constrained Markov Decision Processes and that, moreover, having constraints of this kind makes feasible policies much easier to find. The talk is didactically split into two parts, first considering $\Delta = 0$ and then the $\Delta \geq 0$ case. At the core of our theory is a barrier-based decomposition of the Q-function that decouples the problems of optimality and feasibility and allows them to be learned either independently or in conjunction. We develop an algorithm for characterizing the set of all feasible policies that provably converges in expected finite time. We further develop sample-complexity bounds for learning this set with high probability. Simulations corroborate our theoretical findings and showcase how our algorithm can be wrapped around other learning algorithms to hasten the search for first feasible and then optimal policies.
@talk{udelar22, abstract = {In this work, we study how to tackle decision-making for safety-critical systems under uncertainty. To that end, we formulate a Reinforcement Learning problem with Almost Sure constraints, in which one seeks a policy that allows no more than $\Delta \in \mathbb{N}$ unsafe events in any trajectory, with probability one. We argue that this type of constraint might be better suited for safety-critical systems as opposed to the usual average constraint employed in Constrained Markov Decision Processes and that, moreover, having constraints of this kind makes feasible policies much easier to find. The talk is didactically split into two parts, first considering $\Delta = 0$ and then the $\Delta \geq 0$ case. At the core of our theory is a barrier-based decomposition of the Q-function that decouples the problems of optimality and feasibility and allows them to be learned either independently or in conjunction. We develop an algorithm for characterizing the set of all feasible policies that provably converges in expected finite time. We further develop sample-complexity bounds for learning this set with high probability. Simulations corroborate our theoretical findings and showcase how our algorithm can be wrapped around other learning algorithms to hasten the search for first feasible and then optimal policies.}, date = {12/19/2022}, day = {19}, event = {Topología y Probabilidad en análisis de datos, Universidad de la República}, host = {Nicolas Frevenza (UdelaR), Soledad Villar (JHU)}, month = {12}, role = {Speaker}, title = {Reinforcement Learning with Almost Sure Constraints}, url = {https://mallada.ece.jhu.edu/talks/202212-UdelaR.pdf}, year = {2022} }
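For the no-unsafe-events case, the set of feasible states of a toy deterministic MDP can be computed by a simple fixed-point iteration (a hypothetical chain example; the talk's barrier-based decomposition learns such sets from data rather than from a known model).

```python
# States 0..4 on a chain; state 4 is unsafe. TRANSITIONS[s] lists the
# successor states reachable from s via some action. At state 3 every
# action is forced into the unsafe state, so 3 is infeasible as well.
TRANSITIONS = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}, 3: {4}, 4: {4}}
UNSAFE = {4}

def feasible_states(transitions, unsafe):
    """Largest set of safe states from which some action stays in the set."""
    feasible = set(transitions) - unsafe
    while True:
        keep = {s for s in feasible if transitions[s] & feasible}
        if keep == feasible:
            return feasible
        feasible = keep
```

A state is feasible exactly when some policy keeps the trajectory inside the returned set forever; the iteration removes states whose every action exits the set, and terminates at the largest such (viable) set.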
- 2022-11-02: Model-free Analysis of Dynamical Systems Using Recurrence, Data Science Seminar, Johns Hopkins University.
[BibTeX] [Download PDF]@talk{dss22, annote = {In this talk, we develop model-free methods for analyzing dynamical systems using data. Our key insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. A set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We then leverage the notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. We first consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point without an explicit model of the dynamics. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then leverage this property to develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Our algorithms process samples sequentially, which allows them to continue being executed even after an initial offline training stage. We will conclude by presenting some recent extensions of this work that generalize Lyapunov's Direct Method to allow for non-decreasing functions to certify stability and illustrate future research directions.}, date = {11/02/2022}, day = {02}, event = {Data Science Seminar, Johns Hopkins University}, host = {Fei Lu (JHU), Mauro Maggioni (JHU)}, month = {11}, role = {Lecture}, title = {Model-free Analysis of Dynamical Systems Using Recurrence}, url = {https://mallada.ece.jhu.edu/talks/202211-DSS-JHU.pdf}, year = {2022} }
- 2022-09-07: Unintended Consequences of Market Designs, Workshop on Human Dimension of Energy Systems, NREL.
[BibTeX] [Abstract] [Download PDF]
In this talk, we seek to highlight the importance of accounting for the incentives of *all* market participants when designing market mechanisms for electricity. To this end, we perform a Nash equilibrium analysis of two different market mechanisms that aim to illustrate the critical role that the incentives of consumers and other new types of participants, such as storage, play in the equilibrium outcome. Firstly, we study the incentives of heterogeneous participants (generators and consumers) in a two-stage settlement market, where generators participate using a supply function bid and consumers use a quantity bid. We show that strategic consumers are able to exploit generators’ strategic behavior to maintain a systematic difference between the forward and spot prices, with the latter being higher. Notably, such a strategy does bring down consumer payments and undermines the supply-side market power. We further observe situations where generators lose profit by behaving strategically, a sign of overturn of the conventional supply-side market power. Secondly, we study a market mechanism for multi-interval electricity markets with generator and storage participants. Drawing ideas from supply function bidding, we introduce a novel bid structure for storage participation that allows storage units to communicate their cost to the market using energy-cycling functions that map prices to cycle depths. The resulting market-clearing process — implemented via convex programming — yields corresponding schedules and payments based on traditional energy prices for power supply and per-cycle prices for storage utilization. Our solution shows several advantages over the standard prosumer-based approach that prices energy per slot. In particular, it does not require a priori estimation of future prices and leads to an efficient, competitive equilibrium.
@talk{nrel-hd22, abstract = {In this talk, we seek to highlight the importance of accounting for the incentives of *all* market participants when designing market mechanisms for electricity. To this end, we perform a Nash equilibrium analysis of two different market mechanisms that aim to illustrate the critical role that the incentives of consumers and other new types of participants, such as storage, play in the equilibrium outcome. Firstly, we study the incentives of heterogeneous participants (generators and consumers) in a two-stage settlement market, where generators participate using a supply function bid and consumers use a quantity bid. We show that strategic consumers are able to exploit generators' strategic behavior to maintain a systematic difference between the forward and spot prices, with the latter being higher. Notably, such a strategy does bring down consumer payments and undermines the supply-side market power. We further observe situations where generators lose profit by behaving strategically, a sign of overturn of the conventional supply-side market power. Secondly, we study a market mechanism for multi-interval electricity markets with generator and storage participants. Drawing ideas from supply function bidding, we introduce a novel bid structure for storage participation that allows storage units to communicate their cost to the market using energy-cycling functions that map prices to cycle depths. The resulting market-clearing process -- implemented via convex programming -- yields corresponding schedules and payments based on traditional energy prices for power supply and per-cycle prices for storage utilization. Our solution shows several advantages over the standard prosumer-based approach that prices energy per slot. 
In particular, it does not require a priori estimation of future prices and leads to an efficient, competitive equilibrium.}, date = {09/07/2022}, day = {07}, event = {Workshop on Human Dimension of Energy Systems, NREL}, host = {Andrey Bernstein (NREL)}, month = {09}, role = {Speaker}, title = {Unintended Consequences of Market Designs}, url = {https://mallada.ece.jhu.edu/talks/202209-NREL-HD.pdf}, year = {2022} }
- 2022-08-25: Reinforcement Learning with Almost Sure Constraints, Massachusetts Institute of Technology.
[BibTeX] [Abstract] [Download PDF]
In this work, we study how to tackle decision-making for safety-critical systems under uncertainty. To that end, we formulate a Reinforcement Learning problem with Almost Sure constraints, in which one seeks a policy that allows no more than $\Delta \in \mathbb{N}$ unsafe events in any trajectory, with probability one. We argue that this type of constraint might be better suited for safety-critical systems as opposed to the usual average constraint employed in Constrained Markov Decision Processes and that, moreover, having constraints of this kind makes feasible policies much easier to find. The talk is didactically split into two parts, first considering $\Delta = 0$ and then the $\Delta \geq 0$ case. At the core of our theory is a barrier-based decomposition of the Q-function that decouples the problems of optimality and feasibility and allows them to be learned either independently or in conjunction. We develop an algorithm for characterizing the set of all feasible policies that provably converges in expected finite time. We further develop sample-complexity bounds for learning this set with high probability. Simulations corroborate our theoretical findings and showcase how our algorithm can be wrapped around other learning algorithms to hasten the search for first feasible and then optimal policies.
@talk{mit-rl22, abstract = {In this work, we study how to tackle decision-making for safety-critical systems under uncertainty. To that end, we formulate a Reinforcement Learning problem with Almost Sure constraints, in which one seeks a policy that allows no more than $\Delta \in \mathbb{N}$ unsafe events in any trajectory, with probability one. We argue that this type of constraint might be better suited for safety-critical systems as opposed to the usual average constraint employed in Constrained Markov Decision Processes and that, moreover, having constraints of this kind makes feasible policies much easier to find. The talk is didactically split into two parts, first considering $\Delta = 0$ and then the $\Delta \geq 0$ case. At the core of our theory is a barrier-based decomposition of the Q-function that decouples the problems of optimality and feasibility and allows them to be learned either independently or in conjunction. We develop an algorithm for characterizing the set of all feasible policies that provably converges in expected finite time. We further develop sample-complexity bounds for learning this set with high probability. Simulations corroborate our theoretical findings and showcase how our algorithm can be wrapped around other learning algorithms to hasten the search for first feasible and then optimal policies.}, date = {08/25/2022}, day = {25}, event = {Massachusetts Institute of Technology}, host = {Ali Jadbabaie (MIT)}, month = {08}, role = {Lecture}, title = {Reinforcement Learning with Almost Sure Constraints}, url = {https://mallada.ece.jhu.edu/talks/202208-MIT-DL.pdf}, year = {2022} }
- 2022-08-26: On the Convergence of Gradient Flow on Multi-layer Linear Models, Massachusetts Institute of Technology.
[BibTeX] [Abstract] [Download PDF]
The mysterious ability of gradient-based optimization algorithms to solve the non-convex neural network training problem is one of the many unexplained puzzles behind the success of deep learning in various applications. A promising direction to explain this phenomenon is to study how initialization and overparametrization affect the convergence of training algorithms. In this talk, we analyze the convergence of gradient flow on a multi-layer linear model with a loss function of the form $f(W_1 W_2 \cdots W_L)$. We show that when $f$ satisfies the gradient dominance property, proper weight initialization leads to exponential convergence of the gradient flow to a global minimum of the loss. Moreover, the convergence rate depends on two trajectory-specific quantities that are controlled by the weight initialization: the imbalance matrices, which measure the difference between the weights of adjacent layers, and the least singular value of the weight product $W = W_1 W_2 \cdots W_L$. Our analysis provides improved rate bounds for several multi-layer network models studied in the literature, leading to novel characterizations of the effect of weight imbalance on the rate of convergence. Our results apply to most regression losses and extend to classification ones.
@talk{mit-dl22, abstract = {The mysterious ability of gradient-based optimization algorithms to solve the non-convex neural network training problem is one of the many unexplained puzzles behind the success of deep learning in various applications. A promising direction to explain this phenomenon is to study how initialization and overparametrization affect the convergence of training algorithms. In this talk, we analyze the convergence of gradient flow on a multi-layer linear model with a loss function of the form $f(W_1 W_2 \cdots W_L)$. We show that when $f$ satisfies the gradient dominance property, proper weight initialization leads to exponential convergence of the gradient flow to a global minimum of the loss. Moreover, the convergence rate depends on two trajectory-specific quantities that are controlled by the weight initialization: the \emph{imbalance} matrices, which measure the difference between the weights of adjacent layers, and the least singular value of the \emph{weight product} $W = W_1 W_2 \cdots W_L$. Our analysis provides improved rate bounds for several multi-layer network models studied in the literature, leading to novel characterizations of the effect of weight imbalance on the rate of convergence. Our results apply to most regression losses and extend to classification ones.}, date = {08/26/2022}, day = {26}, event = {Massachusetts Institute of Technology}, host = {Navid Azizan (MIT)}, month = {08}, role = {Lecture}, title = {On the Convergence of Gradient Flow on Multi-layer Linear Models}, url = {https://mallada.ece.jhu.edu/talks/202208-MIT-DL.pdf}, year = {2022} }
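A scalar special case makes the two quantities above concrete: for the hypothetical loss $(w_1 w_2 - 1)^2$ (an illustration, not a model from the talk), the imbalance $w_1^2 - w_2^2$ is exactly conserved under gradient flow, and small-step gradient descent preserves it approximately while driving the product to its optimum.

```python
# Two-layer scalar linear model with loss (w1*w2 - 1)^2. Gradient flow
# conserves the imbalance w1^2 - w2^2; discrete gradient descent with a
# small step keeps it constant up to O(eta^2) drift per iteration.

def train(w1, w2, eta=1e-3, iters=20000):
    for _ in range(iters):
        e = w1 * w2 - 1.0                     # residual of the product
        # Simultaneous gradient step on both layers.
        w1, w2 = w1 - eta * 2.0 * e * w2, w2 - eta * 2.0 * e * w1
    return w1, w2

w1, w2 = train(2.0, 0.1)   # initial imbalance 2^2 - 0.1^2 = 3.99
```

The conservation law follows from $w_1 \dot{w}_1 = w_2 \dot{w}_2$ along the flow; larger initial imbalance increases the effective curvature $2(w_1^2 + w_2^2)$ and hence the convergence rate, a toy version of the rate bounds in the talk.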
- 2022-07-14: Learning-based Analysis and Control of Safety-Critical Systems, Workshop on Autonomous Energy Systems, National Renewable Energy Laboratory.
[BibTeX] [Download PDF]@talk{nrel-aes22, date = {07/14/2022}, day = {14}, event = {Workshop on Autonomous Energy Systems, National Renewable Energy Laboratory}, host = {Andrey Bernstein (NREL), Ahmed Zamzam (NREL), Bai Cui (NREL)}, month = {07}, role = {Speaker}, title = {Learning-based Analysis and Control of Safety-Critical Systems}, url = {https://mallada.ece.jhu.edu/talks/202207-NREL.pdf}, year = {2022} }
- 2022-05-26: Learning-based Analysis and Control of Safety-Critical Systems, University of California San Diego.
[BibTeX] [Download PDF]@talk{ucsd22, date = {05/26/2022}, day = {26}, event = {University of California San Diego}, host = {Jorge Cortés (UCSD)}, month = {05}, role = {Lecture}, title = {Learning-based Analysis and Control of Safety-Critical Systems}, url = {https://mallada.ece.jhu.edu/talks/202205-UCSD.pdf}, year = {2022} }
- 2022-05-27: Reinforcement Learning with Almost Sure Constraints, Information Theory and Applications Workshop.
[BibTeX] [Abstract] [Download PDF]
In this work, we study how to tackle decision-making for safety-critical systems under uncertainty. To that end, we formulate a Reinforcement Learning problem with Almost Sure constraints, in which one seeks a policy that allows no more than $\Delta \in \mathbb{N}$ unsafe events in any trajectory, with probability one. We argue that this type of constraint might be better suited for safety-critical systems as opposed to the usual average constraint employed in Constrained Markov Decision Processes and that, moreover, having constraints of this kind makes feasible policies much easier to find. The talk is didactically split into two parts, first considering $\Delta = 0$ and then the $\Delta \geq 0$ case. At the core of our theory is a barrier-based decomposition of the Q-function that decouples the problems of optimality and feasibility and allows them to be learned either independently or in conjunction. We develop an algorithm for characterizing the set of all feasible policies that provably converges in expected finite time. We further develop sample-complexity bounds for learning this set with high probability. Simulations corroborate our theoretical findings and showcase how our algorithm can be wrapped around other learning algorithms to hasten the search for first feasible and then optimal policies.
@talk{ita22, abstract = {In this work, we study how to tackle decision-making for safety-critical systems under uncertainty. To that end, we formulate a Reinforcement Learning problem with Almost Sure constraints, in which one seeks a policy that allows no more than $\Delta \in \mathbb{N}$ unsafe events in any trajectory, with probability one. We argue that this type of constraint might be better suited for safety-critical systems as opposed to the usual average constraint employed in Constrained Markov Decision Processes and that, moreover, having constraints of this kind makes feasible policies much easier to find. The talk is didactically split into two parts, first considering $\Delta = 0$ and then the $\Delta \geq 0$ case. At the core of our theory is a barrier-based decomposition of the Q-function that decouples the problems of optimality and feasibility and allows them to be learned either independently or in conjunction. We develop an algorithm for characterizing the set of all feasible policies that provably converges in expected finite time. We further develop sample-complexity bounds for learning this set with high probability. Simulations corroborate our theoretical findings and showcase how our algorithm can be wrapped around other learning algorithms to hasten the search for first feasible and then optimal policies.}, date = {05/27/2022}, day = {27}, event = {Information Theory and Applications Workshop}, host = {Christina Yu (Cornell)}, month = {05}, role = {Speaker}, title = {Reinforcement Learning with Almost Sure Constraints}, url = {https://mallada.ece.jhu.edu/talks/202205-ITA.pdf}, year = {2022} }
- 2022-05-04: Model Free Learning of Regions of Attraction via Recurrent Sets, MURI Workshop.
[BibTeX] [Abstract] [Download PDF]
In this talk, we develop model-free methods for analyzing dynamical systems using data. Our key insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. A set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We then leverage the notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. We first consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point without an explicit model of the dynamics. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then leverage this property to develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Our algorithms process samples sequentially, which allows them to continue being executed even after an initial offline training stage. We will conclude by presenting some recent extensions of this work that generalize Lyapunov’s Direct Method to allow for non-decreasing functions to certify stability and illustrate future research directions.
@talk{muri22, abstract = {In this talk, we develop model-free methods for analyzing dynamical systems using data. Our key insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. A set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We then leverage the notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. We first consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point without an explicit model of the dynamics. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then leverage this property to develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Our algorithms process samples sequentially, which allows them to continue being executed even after an initial offline training stage. We will conclude by presenting some recent extensions of this work that generalize Lyapunov's Direct Method to allow for non-decreasing functions to certify stability and illustrate future research directions.}, date = {05/04/2022}, day = {04}, event = {MURI Workshop}, host = {Mario Sznaier (Northeastern), Necmiye Ozay (UMich)}, month = {05}, role = {Panelist}, title = {Model Free Learning of Regions of Attraction via Recurrent Sets}, url = {https://mallada.ece.jhu.edu/talks/202205-MURI.pdf}, year = {2022} }
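The recurrence test itself is easy to try on a hand-picked scalar system (an illustration only, not the sequential algorithms of the talk): for $\dot{x} = -x + x^3$ the region of attraction of the origin is $(-1, 1)$, and a sampled k-recurrence check accepts candidate sets inside it while rejecting ones that reach beyond.

```python
# Discretized scalar system x_{t+1} = x_t + h * (-x_t + x_t^3); the ROA of
# the origin is (-1, 1). A candidate set [-c, c] is declared k-recurrent if
# every sampled trajectory starting in it re-enters it within k steps.

def returns_within(x0, c, k=1000, h=0.01):
    x = x0
    for _ in range(k):
        x = x + h * (-x + x ** 3)
        if abs(x) <= c:
            return True
        if abs(x) > 1e6:      # escaped far away: certainly not returning
            return False
    return False

def is_k_recurrent(c, n_samples=51):
    samples = [-c + 2.0 * c * i / (n_samples - 1) for i in range(n_samples)]
    return all(returns_within(x0, c) for x0 in samples)
```

A finite grid of samples can only falsify recurrence, which is exactly how the counter-example-driven algorithms of the talk use it: any sampled trajectory that fails to return shrinks the candidate set.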
- 2022-04-25: Embracing Low-Inertia in Power Systems: A Frequency Shaping Approach, University of California Berkeley.
[BibTeX] [Abstract] [Download PDF]
The transition of power systems from conventional synchronous generation towards renewable energy sources -with little or no inertia- is gradually threatening classical methods for achieving grid synchronization. A widely embraced approach to mitigate this problem is to mimic inertial response using grid-connected inverters. That is, to introduce virtual inertia to restore the stiffness that the system used to enjoy. In this talk, we seek to challenge this approach. We advocate taking advantage of the system’s low inertia to restore grid synchronism without incurring excessive control efforts. To this end, we develop an analysis and design framework for inverter-based frequency control. We define system-level performance metrics that are of practical relevance for power systems and systematically evaluate the performance of standard control strategies, such as virtual inertia and droop control, in the presence of power disturbances. Our analysis unveils the relatively limited role of inertia in improving performance and the inability of droop control to enhance performance without incurring considerable steady-state control effort. To overcome these limitations, we propose a novel frequency shaping control for grid-connected inverters -exploiting classical lead/lag compensation and model matching techniques from control theory- that can significantly outperform existing solutions while using comparable control effort.
@talk{berkeley22, abstract = {The transition of power systems from conventional synchronous generation towards renewable energy sources -with little or no inertia- is gradually threatening classical methods for achieving grid synchronization. A widely embraced approach to mitigate this problem is to mimic inertial response using grid-connected inverters. That is, to introduce virtual inertia to restore the stiffness that the system used to enjoy. In this talk, we seek to challenge this approach. We advocate taking advantage of the system's low inertia to restore grid synchronism without incurring excessive control efforts. To this end, we develop an analysis and design framework for inverter-based frequency control. We define system-level performance metrics that are of practical relevance for power systems and systematically evaluate the performance of standard control strategies, such as virtual inertia and droop control, in the presence of power disturbances. Our analysis unveils the relatively limited role of inertia in improving performance and the inability of droop control to enhance performance without incurring considerable steady-state control effort. To overcome these limitations, we propose a novel frequency shaping control for grid-connected inverters -exploiting classical lead/lag compensation and model matching techniques from control theory- that can significantly outperform existing solutions while using comparable control effort.}, date = {04/25/2022}, day = {25}, event = {University of California Berkeley}, host = {Murat Arcak (Berkeley)}, month = {04}, role = {Lecture}, title = {Embracing Low-Inertia in Power Systems: A Frequency Shaping Approach}, url = {https://mallada.ece.jhu.edu/talks/202204-Berkeley.pdf}, year = {2022} }
- 2022-04-11: Embracing Low Inertia for Power System Frequency Control: A Frequency Shaping Approach, ECE Seminar, University of Michigan.
[BibTeX] [Abstract] [Download PDF]
The transition of power systems from conventional synchronous generation towards renewable energy sources (with little or no inertia) is gradually threatening classical methods for achieving grid synchronization. A widely embraced approach to mitigating this problem is to mimic inertial response using grid-connected inverters, that is, to introduce virtual inertia to restore the stiffness that the system used to enjoy. In this talk, we seek to challenge this approach. We advocate taking advantage of the system’s low inertia to restore grid synchronism without incurring excessive control effort. To this end, we develop an analysis and design framework for inverter-based frequency control. We define system-level performance metrics that are of practical relevance for power systems and systematically evaluate the performance of standard control strategies, such as virtual inertia and droop control, in the presence of power disturbances. Our analysis unveils the relatively limited role of inertia in improving performance and the inability of droop control to enhance performance without incurring considerable steady-state control effort. To overcome these limitations, we propose a novel frequency-shaping control for grid-connected inverters (exploiting classical lead/lag compensation and model-matching techniques from control theory) that can significantly outperform existing solutions while using comparable control effort.
@talk{umich22, date = {04/11/2022}, day = {11}, event = {ECE Seminar, University of Michigan}, host = {Johanna Mathieu}, month = {04}, role = {Lecture}, title = {Embracing Low Inertia for Power System Frequency Control: A Frequency Shaping Approach}, url = {https://mallada.ece.jhu.edu/talks/202204-UMich.pdf}, year = {2022} }
- 2022-03-30: Coherence and Concentration in Tightly-Connected Networks, Workshop on Synchronization in Complex Systems, Army Research Office.
[BibTeX] [Abstract] [Download PDF]
Achieving coordinated behavior, engineered or emergent, on networked systems has attracted widespread interest in several fields. This interest has led to remarkable advances in developing a theoretical understanding of the conditions under which agents within a network can reach an agreement (consensus) or develop coordinated behavior, such as synchronization. However, much less understood is the phenomenon of network coherence. Network coherence broadly refers to the ability of the nodes in a network to exhibit a similar dynamic response despite heterogeneity in their individual behavior. In this talk, we present a general framework to analyze and quantify the level of network coherence that a system exhibits by relating coherence with a low-rank property. More precisely, for a networked system with linear dynamics and coupling, we show that the system transfer matrix converges to a rank-one transfer matrix representing the coherent behavior as the network connectivity grows. Interestingly, the non-zero eigenvalue of such a rank-one matrix is given by the harmonic mean of the individual nodal dynamics, and we refer to it as the coherent dynamics. Our analysis unveils the frequency-dependent nature of coherence and a non-trivial interplay between dynamics and network topology. We further illustrate how this framework can be leveraged to obtain accurate reduced-order models of coherent generators and to tune grid-forming inverters to shape the coherent response of a power grid.
@talk{aro22, date = {03/30/2022}, day = {30}, event = {Workshop on Synchronization in Complex Systems, Army Research Office}, host = {Derya Cansever (ARO), Jorge Cortés (UCSD), Fabio Pasqualetti (UCR)}, month = {03}, role = {Speaker}, title = {Coherence and Concentration in Tightly-Connected Networks}, url = {https://mallada.ece.jhu.edu/talks/202203-ARO-Workshop.pdf}, year = {2022} }
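The rank-one limit described in the abstract can be checked numerically in a toy setting, assuming first-order nodal dynamics g_i(s) = 1/(s + a_i) and complete-graph coupling (assumptions made here for illustration; the talk's framework is more general): as the coupling gain grows, the network transfer matrix approaches (1/n) 11^T times the harmonic mean of the nodal transfer functions.

```python
import numpy as np

n = 4
a = np.array([0.5, 1.0, 2.0, 4.0])       # heterogeneous nodal poles (illustrative)
A = np.diag(a)
L = n * np.eye(n) - np.ones((n, n))      # Laplacian of the complete graph K_n
s = 1j * 0.3                             # evaluation frequency

# Coherent dynamics: harmonic mean of the nodal transfer functions
# g_i(s) = 1/(s + a_i), i.e. g_bar(s) = n / sum_i (s + a_i).
g_bar = n / np.sum(s + a)
T_coh = (np.ones((n, n)) / n) * g_bar    # rank-one coherent response

def gap(k):
    """Spectral-norm distance between the network transfer matrix
    (s I + A + k L)^{-1} and its rank-one coherent approximation."""
    T = np.linalg.inv(s * np.eye(n) + A + k * L)
    return np.linalg.norm(T - T_coh, 2)

# The gap shrinks as the coupling strength (connectivity) grows.
gaps = [gap(k) for k in (1.0, 10.0, 100.0)]
```

The decreasing sequence of gaps illustrates the concentration toward the rank-one coherent dynamics as connectivity increases.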
2021
- 2021-11-03: Reinforcement Learning with Almost Sure Constraints, NSF TRIPODS PI Meeting.
[BibTeX] [Abstract] [Download PDF]
This talk aims to put forward the idea that learning to take safe actions in unknown environments (even with probability-one guarantees) can be achieved without the need for an unbounded number of exploratory trials, provided that one is willing to mildly relax one's optimality requirements. To this aim, we look at two settings that illustrate the feasibility of this approach. We first focus on the canonical multi-armed bandit problem and seek to study the exploration-preservation trade-off intrinsic to safe learning. By defining a handicap metric that counts the number of unsafe actions, we provide an algorithm for discarding unsafe machines (or actions), with probability one, that achieves constant handicap. Our algorithm is rooted in the classical sequential probability ratio test, redefined here for continuing tasks. Under standard assumptions on sufficient exploration, our rule provably detects all unsafe machines in an (expected) finite number of rounds. The analysis also unveils a trade-off between the number of rounds needed to secure the environment and the probability of discarding safe machines. We then study the problem of learning safe policies in the context of model-free constrained Markov decision processes. We propose the use of hard penalties (damage information), as a complement to rewards, that can be used to learn which actions lead to constraint violations. We show that such penalties naturally arise from a separation principle that decomposes the value and action-value functions into a reward component and a feasibility component, the latter represented by a hard barrier function. We further develop an adaptive algorithm for learning this barrier function, which incorporates the damage information and gradually reveals the safety constraints. In the process of learning such a barrier function, the policy is adapted so as to avoid “bumping into the same rock twice”.
Both algorithms can wrap around any other algorithm to optimize a specific auxiliary goal as they provide a safe environment to search for (approximately) optimal policies.
@talk{tripods21, date = {11/03/2021}, day = {03}, event = {NSF TRIPODS PI Meeting}, host = {Maryam Fazel (UW), Rene Vidal (JHU)}, month = {11}, role = {Speaker}, title = {Reinforcement Learning with Almost Sure Constraints}, url = {https://mallada.ece.jhu.edu/talks/202111-TRIPODS.pdf}, year = {2021} }
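The sequential probability ratio test underlying the bandit result can be sketched as follows, for hypothetical Bernoulli "damage" observations (the rates and thresholds below are illustrative choices, not those of the talk): an arm is declared unsafe once the log-likelihood ratio favors a high damage rate p1 over a low rate p0.

```python
import math
import random

def sprt_classify(pull, p0=0.1, p1=0.4, alpha=1e-3, beta=1e-3, max_rounds=10_000):
    """Wald-style sequential probability ratio test on binary damage observations.
    Returns 'unsafe' if the damage rate looks >= p1, 'safe' if it looks <= p0."""
    up = math.log((1 - beta) / alpha)   # accept H1 (unsafe) above this threshold
    lo = math.log(beta / (1 - alpha))   # accept H0 (safe) below this threshold
    llr = 0.0
    for _ in range(max_rounds):
        x = pull()                      # 1 = damage observed, 0 = no damage
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= up:
            return "unsafe"
        if llr <= lo:
            return "safe"
    return "undecided"

rng = random.Random(0)
verdict_bad = sprt_classify(lambda: int(rng.random() < 0.5))    # truly unsafe arm
verdict_ok = sprt_classify(lambda: int(rng.random() < 0.02))    # truly safe arm
```

The thresholds trade off the time to secure the environment against the chance of discarding a safe machine, mirroring the trade-off discussed in the abstract.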
- 2021-10-27: Coherence and Concentration in Tightly Connected Networks, Data-based Diagnosis of Networked Dynamical Systems, CCS 2021 Satellite Symposium.
[BibTeX] [Download PDF]@talk{css21, date = {10/27/2021}, day = {27}, event = {Data-based Diagnosis of Networked Dynamical Systems, CCS 2021 Satellite Symposium}, host = {Melvyn Tyloo, Laurent Pagnier, Robinn Delabays}, month = {10}, role = {Speaker}, title = {Coherence and Concentration in Tightly Connected Networks}, url = {https://mallada.ece.jhu.edu/talks/202110-CSS.pdf}, year = {2021} }
- 2021-09-09: Coherence and Concentration in Tightly Connected Networks, Resilient Autonomous Energy Systems Workshop, National Renewable Energy Laboratory.
[BibTeX] [Download PDF]@talk{nrel21, date = {09/09/2021}, day = {09}, event = {Resilient Autonomous Energy Systems Workshop, National Renewable Energy Laboratory}, host = {Andrey Berstein (NREL), Bai Cui (NREL)}, month = {09}, role = {Speaker}, title = {Coherence and Concentration in Tightly Connected Networks}, url = {https://mallada.ece.jhu.edu/talks/202109-NREL.pdf}, year = {2021} }
- 2021-04-13: Incentive Analysis and Coordination Design for Multi-Timescale Electricity Markets, Epstein Institute Seminar, University of Southern California.
[BibTeX] [Abstract] [Download PDF]
This talk discusses incentives and coordination requirements that arise when heterogeneous participants bid in electricity markets that operate at different timescales. First, we consider the conventional timescales of market clearing, spanning 5 minutes to several hours ahead, and investigate the incentives for price manipulation that market participants (generators and loads) have in a two-stage settlement market. Our analysis unveils the importance of accounting for both generators’ and loads’ strategic behavior in two-stage markets, even when the consumers’ demand is inelastic! Precisely, we show that loads can exploit generators’ strategic bidding and maintain a systematic difference between the forward and spot prices, the latter being higher than the former. Such a strategy does bring down demand-side payments and undermines supply-side market power. Second, we consider the problem of co-optimizing generation resources with different timescale characteristics. To that end, we frame and study a joint problem that optimizes both slow-timescale economic dispatch resources and fast-timescale frequency regulation resources. We provide sufficient conditions to optimally decompose the joint problem into slow and fast timescale problems. These slow and fast timescale problems have appealing interpretations as the economic dispatch and frequency regulation problems, respectively. We further provide a market implementation for the fast-timescale problem. In this implementation, participants receive prices and dispatch and dynamically update their bids according to either a dynamic gradient play or best response. Under price-taking assumptions, our market implementation is guaranteed to converge to the optimal (efficient) allocation even in the presence of generator dynamics. A by-product of this solution is that frequency restoration and thermal limits are automatically guaranteed.
@talk{epstein21, date = {04/13/2021}, day = {13}, event = {Epstein Institute Seminar, University of Southern California}, host = {Jong-Shi Pang (USC), Suvrajeet Sen (USC)}, month = {04}, role = {Speaker}, title = {Incentive Analysis and Coordination Design for Multi-Timescale Electricity Markets}, url = {https://mallada.ece.jhu.edu/talks/202104-Epstein.pdf}, year = {2021} }
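The dynamic gradient-play bid update mentioned in the abstract can be sketched in a hypothetical single-interval market with quadratic costs and price-taking participants (a toy model; the talk treats generator dynamics and constraints that this sketch omits): the price rises while demand exceeds supply, and each supplier best-responds to the posted price.

```python
import numpy as np

c = np.array([1.0, 2.0, 4.0])   # quadratic cost coefficients: cost_i(q) = c_i q^2 / 2
d = 7.0                         # inelastic demand (illustrative)
gamma = 0.2                     # price (dual gradient) step size

lam, q = 0.0, np.zeros_like(c)
for _ in range(500):
    q = lam / c                      # price-taking best response: q_i = lam / c_i
    lam += gamma * (d - q.sum())     # price rises while demand exceeds supply

lam_star = d / np.sum(1.0 / c)       # efficient clearing price of this toy market
```

Under these assumptions the iteration contracts, and the price converges to the efficient allocation's clearing price, consistent with the convergence claim in the abstract.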