Reliable Reinforcement Learning with Function Approximation: Restoring Monotonic Policy Improvement

Reliability challenges of reinforcement learning under function approximation, revealing how popular algorithms such as DQN can converge to highly sub-optimal policies. To address this, the work introduces Reliable Policy Iteration (RPI), the first function-approximation-based framework with provable monotonic improvement and convergence guarantees. A practical model-free variant integrates seamlessly with DQN and DDPG, achieving robust and competitive performance on standard control benchmarks while ensuring reliable policy evaluation and improvement. 

Faculty: Gugan Thoppe

Click image to view enlarged version

Scroll Up