Language model alignment techniques that leverage active exploration offer the promise of super-human capabilities. However, the algorithm design primitives for computationally efficient exploration with language models remain poorly understood. We introduce a new computational framework for RL with language models, in which the learner interacts with the model through a sampling oracle. Focusing on the linear softmax model parameterization, we provide new results that reveal the computational-statistical tradeoffs of efficient exploration, including the necessity of coverage, a new inference-time exploration algorithm (SpannerSampling), the insufficiency of training-time interventions, and the computational benefits of multi-turn exploration.
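To make the access model concrete, here is a minimal illustrative sketch (not the paper's formal definitions) of a linear softmax policy over a small discrete response set, together with a sampling oracle: the learner only receives draws from the model, never direct access to its parameters or logits. All names, dimensions, and the feature map `phi` are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a linear softmax model pi_theta(y | x) proportional
# to exp(theta . phi(x, y)) over num_responses candidate responses.
d, num_responses = 4, 6
theta = rng.normal(size=d)                 # hypothetical policy parameters
phi = rng.normal(size=(num_responses, d))  # hypothetical features phi(x, y)

def sampling_oracle(n: int) -> np.ndarray:
    """Return n response indices sampled from the softmax policy.

    This is the only interface the learner gets: samples, not logits.
    """
    logits = phi @ theta
    probs = np.exp(logits - logits.max())  # stabilized softmax
    probs /= probs.sum()
    return rng.choice(num_responses, size=n, p=probs)

samples = sampling_oracle(1000)
```

In this access model, estimating properties of the policy (e.g., whether it covers a high-reward response) requires enough samples, which is where the coverage and computational-statistical tradeoff questions in the abstract arise.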