Sample and Oracle Efficient Reinforcement Learning for MDPs with Linearly-Realizable Value Functions

Abstract

Designing sample-efficient and computationally feasible reinforcement learning (RL) algorithms is particularly challenging in environments with large or infinite state and action spaces. In this paper, we advance this effort by presenting an efficient algorithm for Markov Decision Processes (MDPs) where the state-action value function of any policy is linear in a given feature map. This challenging setting can model environments with infinite states and actions, strictly generalizes classic linear MDPs, and currently lacks a computationally efficient algorithm under online access to the MDP. Specifically, we introduce a new RL algorithm that efficiently finds a near-optimal policy in this setting, using a number of episodes and calls to a cost-sensitive classification (CSC) oracle that are both polynomial in the problem parameters.
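To make the realizability assumption concrete, below is a minimal sketch (not from the paper) of what "the state-action value function of any policy is linear in a given feature map" means. It uses a toy tabular MDP with one-hot state-action features, under which every policy's Q-function is trivially linear; all names and the verification procedure are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

# Toy MDP: 3 states, 2 actions, known dynamics P and rewards R.
# With one-hot features over (s, a), Q^pi is trivially linear in the
# features for every policy pi -- a minimal instance of the
# linearly-realizable value function assumption.
rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = distribution over next states
R = rng.uniform(size=(S, A))                # deterministic reward R[s, a]

def q_pi(pi):
    """Exact Q^pi via the Bellman linear system (tabular policy evaluation)."""
    P_pi = np.einsum('sap,sa->sp', P, pi)   # state-to-state transitions under pi
    R_pi = np.einsum('sa,sa->s', R, pi)     # expected reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    return R + gamma * P @ V                # Q[s, a] = R[s, a] + gamma * E[V(s')]

def phi(s, a):
    """One-hot feature map over state-action pairs (dimension S * A)."""
    f = np.zeros(S * A)
    f[s * A + a] = 1.0
    return f

# Fit theta by least squares and check realizability: Q^pi(s, a) = <theta, phi(s, a)>.
pi = rng.dirichlet(np.ones(A), size=S)      # an arbitrary stochastic policy
Q = q_pi(pi)
Phi = np.stack([phi(s, a) for s in range(S) for a in range(A)])
theta, *_ = np.linalg.lstsq(Phi, Q.ravel(), rcond=None)
residual = np.max(np.abs(Phi @ theta - Q.ravel()))
assert residual < 1e-8                      # linear realizability holds exactly here
```

In the general setting the paper targets, the feature dimension is far smaller than the number of state-action pairs (which may be infinite), so realizability is a genuine structural assumption rather than a tautology as in this one-hot toy case.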

Publication
arXiv preprint arXiv:2409.04840
Zak Mhammedi
Research Scientist

I work on the theoretical foundations of Reinforcement Learning, Controls, and Optimization.