Fully Unconstrained Online Learning

Abstract

We present an online learning algorithm that achieves a regret bound of G||w*||sqrt(T log(||w*||G sqrt(T))) + ||w*||^2 + G^2 for G-Lipschitz convex losses, without any prior knowledge of G (the Lipschitz constant) or ||w*|| (the norm of the comparison point). This result matches the optimal bound G||w*||sqrt(T) that would be available with such knowledge, up to logarithmic factors. Our algorithm covers all “interesting” scenarios where sublinear regret can be achieved.

Publication
NeurIPS 2024
Zak Mhammedi
Zak Mhammedi
Research Scientist

I work on the theoretical foundations of Reinforcement Learning, Controls, and Optimization.