Language model alignment techniques that leverage active exploration offer the promise of super-human capabilities. However, the algorithm design primitives for computationally efficient exploration with language models remain poorly understood. We introduce a new computational framework for RL with language models, in which the learner interacts with the model through a sampling oracle. Focusing on the linear softmax model parameterization, we provide new results that reveal the computational-statistical tradeoffs of efficient exploration, including the necessity of coverage, a new inference-time exploration algorithm (SpannerSampling), the insufficiency of training-time interventions, and the computational benefits of multi-turn exploration.
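To make the access model concrete, here is a minimal illustrative sketch (not the paper's formal definitions) of a linear softmax policy over a small discrete response set, together with a sampling oracle: the learner only receives draws from the model, never direct access to its parameters or logits. All names, dimensions, and the feature map `phi` are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a linear softmax model pi_theta(y | x) proportional
# to exp(theta . phi(x, y)) over num_responses candidate responses.
d, num_responses = 4, 6
theta = rng.normal(size=d)                 # hypothetical policy parameters
phi = rng.normal(size=(num_responses, d))  # hypothetical features phi(x, y)

def sampling_oracle(n: int) -> np.ndarray:
    """Return n response indices sampled from the softmax policy.

    This is the only interface the learner gets: samples, not logits.
    """
    logits = phi @ theta
    probs = np.exp(logits - logits.max())  # stabilized softmax
    probs /= probs.sum()
    return rng.choice(num_responses, size=n, p=probs)

samples = sampling_oracle(1000)
```

In this access model, estimating properties of the policy (e.g., whether it covers a high-reward response) requires enough samples, which is where the coverage and computational-statistical tradeoff questions in the abstract arise.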