Oracle-Efficient Adversarial Reinforcement Learning via Max-Following

Abstract

We extend the max-following framework to regret minimization under adversarial initial states and limited feedback. A max-following policy selects, at each state, the base policy from a given class whose estimated value function is highest, and acts according to it. Our algorithm is oracle-efficient, guarantees no-regret with respect to both the base class and the worst approximate max-following policy, and has guarantees independent of the size of the state and action spaces. It also achieves the optimal rate in the number of episodes.
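The max-following rule described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's algorithm: the names `base_policies` and `value_estimates` are assumptions, and in the paper the value functions are only estimated approximately via an oracle.

```python
# Hypothetical sketch of the max-following rule: at each state, act
# according to the base policy whose estimated value is highest.
# `base_policies` and `value_estimates` are illustrative names.

def max_following_action(state, base_policies, value_estimates):
    """Return the action of the base policy with the largest
    estimated value function at `state`."""
    best_idx = max(range(len(base_policies)),
                   key=lambda i: value_estimates[i](state))
    return base_policies[best_idx](state)

# Toy usage: two base policies on a one-dimensional state.
policies = [lambda s: "left", lambda s: "right"]
values = [lambda s: -s, lambda s: s]  # second policy better when s > 0
print(max_following_action(2.0, policies, values))  # "right"
```

In practice the exact value functions are unavailable; the paper's contribution is guaranteeing no-regret when they are replaced by oracle-computed estimates.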

Publication
The Exploration in AI Today Workshop at ICML 2025
Zak Mhammedi
Research Scientist

I work on the theoretical foundations of Reinforcement Learning, Control, and Optimization.