Oracle-Efficient Adversarial Reinforcement Learning via Max-Following

Abstract

We extend the max-following framework to regret minimization under adversarial initial states and limited feedback. A max-following policy selects, at each state, the base policy from a given class whose estimated value function is highest, and acts according to it. Our algorithm is oracle-efficient, guarantees no-regret with respect to both the base class and the worst approximate max-following policy, and has guarantees independent of the size of the state and action spaces. It also achieves the optimal rate in the number of episodes.
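The max-following rule described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's algorithm: the names `base_policies` and `value_estimates` are assumptions, and in the paper the value functions are only estimated approximately via an oracle.

```python
# Hypothetical sketch of the max-following rule: at each state, act
# according to the base policy whose estimated value is highest.
# `base_policies` and `value_estimates` are illustrative names.

def max_following_action(state, base_policies, value_estimates):
    """Return the action of the base policy with the largest
    estimated value function at `state`."""
    best_idx = max(range(len(base_policies)),
                   key=lambda i: value_estimates[i](state))
    return base_policies[best_idx](state)

# Toy usage: two base policies on a one-dimensional state.
policies = [lambda s: "left", lambda s: "right"]
values = [lambda s: -s, lambda s: s]  # second policy better when s > 0
print(max_following_action(2.0, policies, values))  # "right"
```

In practice the exact value functions are unavailable; the paper's contribution is guaranteeing no-regret when they are replaced by oracle-computed estimates.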

Publication
The Exploration in AI Today Workshop at ICML 2025
Zak Mhammedi
Research Scientist

I work on the theoretical foundations of Reinforcement Learning, Control, and Optimization.