When and Why Certain Forms of Reinforcement Learning Perform Better than Others
In reinforcement learning (RL), an area of machine learning, learning algorithms train automated agents to interact with their environment. For example, RL methods might be used to train autonomous vehicles to maneuver through traffic to reach a destination. While visiting the SequeL group at Inria Lille–Nord Europe, I will study the theoretical properties of a class of problems called multi-armed bandits. These problems capture the fundamental challenge of RL, the trade-off between exploiting options already known to be good and exploring uncharted territory in search of potentially larger reward, in a form more amenable to theoretical analysis. This work will contribute to a rigorous understanding of when and why certain RL methods perform better than others, and of how the nature of the application at hand affects which method is preferable. By working towards a strong theoretical foundation for RL algorithms, this research will guide future engineers in choosing appropriate methods for their applications and provide provable guarantees for RL used in safety-critical contexts.
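To make the exploration-exploitation trade-off concrete, here is a minimal sketch of a multi-armed bandit played with an epsilon-greedy rule. It is an illustration only, not one of the methods this project will analyze: the function name `run_epsilon_greedy`, the Bernoulli reward model, and the arm means in the usage note are all assumptions chosen for the example.

```python
import random


def run_epsilon_greedy(means, epsilon, steps, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule.

    means   -- hypothetical success probability of each arm (hidden from the agent)
    epsilon -- probability of exploring a uniformly random arm at each step
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    counts = [0] * len(means)       # number of pulls per arm
    estimates = [0.0] * len(means)  # running mean reward per arm
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an arm regardless of what is known so far.
            arm = rng.randrange(len(means))
        else:
            # Exploit: pick the arm with the best current estimate
            # (ties go to the lowest index).
            arm = max(range(len(means)), key=lambda a: estimates[a])
        reward = 1 if rng.random() < means[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts
```

Calling `run_epsilon_greedy([0.2, 0.5, 0.8], 0.0, 1000)` never explores, so in this sketch the agent only ever pulls the first arm and never discovers that the third pays off more often. Setting `epsilon` to a small positive value such as 0.1 lets it sample every arm, at the cost of some pulls spent on arms it already believes to be worse.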