One of the main objectives of the next generation of ground-based telescopes is to directly image Earth-like exoplanets. However, imaging these exoplanets is challenging because they lie at very small angular separations from their much brighter host stars. Overcoming this challenge requires a careful design of the adaptive optics (AO) system's control algorithm.
Recently, there has been growing interest in improving AO control with data-driven methods such as reinforcement learning (RL), a subfield of machine learning in which the control of a system is learned through interaction with the environment. In particular, model-based RL enables automated, self-tuning control for AO: it can compensate for temporal and misregistration errors, adapt to nonlinear wavefront sensing, and remain efficient in both training and execution.
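As a concrete illustration of the model-based RL idea, the following minimal PyTorch sketch improves a control policy by backpropagating a residual-wavefront penalty through a learned dynamics model over a short planning horizon. The network sizes, tensor shapes, and toy data are assumptions made for this example; they do not reproduce the PO4AO architecture.

\begin{verbatim}
# Minimal sketch of model-based RL for AO control (illustrative only;
# not the PO4AO implementation). Shapes and sizes are assumptions.
import torch
import torch.nn as nn

N_MODES = 50  # hypothetical number of controlled DM modes

class DynamicsModel(nn.Module):
    """Learned model: predicts the next WFS measurement from the
    current measurement and the applied DM command."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * N_MODES, 128), nn.ReLU(),
            nn.Linear(128, N_MODES),
        )

    def forward(self, obs, action):
        return self.net(torch.cat([obs, action], dim=-1))

class Policy(nn.Module):
    """Policy: maps a WFS measurement to a DM command."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_MODES, 128), nn.ReLU(),
            nn.Linear(128, N_MODES), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

model, policy = DynamicsModel(), Policy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Policy improvement: roll the learned model forward a few steps and
# penalize the predicted residual wavefront, updating only the policy.
obs = torch.randn(64, N_MODES)      # batch of stored WFS measurements
loss = 0.0
for _ in range(3):                  # short planning horizon
    action = policy(obs)
    obs = model(obs, action)        # predicted next measurement
    loss = loss + obs.pow(2).mean() # residual-wavefront penalty
opt.zero_grad()
loss.backward()
opt.step()
\end{verbatim}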
In this study, we apply and adapt an RL method called Policy Optimization for AO (PO4AO) to the GHOST test bench at ESO headquarters, where we demonstrate strong performance on a simulated cascaded AO system. We explore the predictive and self-calibrating capabilities of the method and show that our current PyTorch implementation introduces a latency of only 300~$\mu$s. Finally, we introduce and discuss the open-source implementation of the method.
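For reference, the per-frame inference latency of a PyTorch policy can be estimated with a simple micro-benchmark of the kind sketched below. The network size and timing loop are illustrative assumptions, not the measurement procedure used on the GHOST bench.

\begin{verbatim}
# Hypothetical micro-benchmark of per-frame inference latency for a
# small PyTorch policy network; sizes and loop counts are assumptions
# for illustration, not the measurement used on the GHOST bench.
import time
import torch
import torch.nn as nn

N_MODES = 50  # hypothetical number of controlled DM modes
policy = nn.Sequential(
    nn.Linear(N_MODES, 128), nn.ReLU(),
    nn.Linear(128, N_MODES), nn.Tanh(),
)
policy.eval()
obs = torch.randn(1, N_MODES)       # one WFS measurement per frame

with torch.no_grad():
    for _ in range(100):            # warm-up to amortize one-time costs
        policy(obs)
    n_iter = 1000
    t0 = time.perf_counter()
    for _ in range(n_iter):
        policy(obs)
    mean_s = (time.perf_counter() - t0) / n_iter

print(f"mean per-frame latency: {mean_s * 1e6:.0f} microseconds")
\end{verbatim}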