We extend the max-following framework to regret minimization under adversarial initial states and limited feedback. Max-following selects, at each state, the policy from a base class whose estimated value function is highest. Our algorithm is oracle-efficient, achieves no-regret with respect to both the base class and the worst approximate max-following policy, and its guarantees are independent of the size of the state and action spaces. It also attains the optimal rate in the number of episodes.
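The selection rule at the heart of max-following can be illustrated with a minimal sketch. This is not the paper's algorithm (which handles estimation, feedback, and regret guarantees); it only shows the core idea of acting with the base policy whose estimated value is highest at the current state. All names here (`Policy`, `ValueFn`, `max_following_action`, the toy policies) are hypothetical, introduced for illustration.

```python
from typing import Callable, List

State = int
Action = str
Policy = Callable[[State], Action]
ValueFn = Callable[[State], float]

def max_following_action(
    state: State,
    base_policies: List[Policy],
    value_estimates: List[ValueFn],
) -> Action:
    """Act with the base policy whose estimated value is largest at `state`."""
    best = max(range(len(base_policies)),
               key=lambda i: value_estimates[i](state))
    return base_policies[best](state)

# Toy example: two base policies with hand-crafted value estimates.
policies: List[Policy] = [lambda s: "left", lambda s: "right"]
values: List[ValueFn] = [lambda s: float(-s), lambda s: float(s)]

print(max_following_action(-1, policies, values))  # policy 0 has higher value here
print(max_following_action(+1, policies, values))  # policy 1 has higher value here
```

In this sketch the comparison is pointwise per state, which is why the resulting policy can outperform every individual base policy: it follows whichever one is locally best.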