In this paper, we apply a model-based reinforcement learning (MBRL) method to predictive adaptive optics (AO) control. Our goal is to demonstrate how reinforcement learning algorithms can reduce the AO temporal error and thereby improve overall performance. This is of particular importance for the Provence Adaptive Optics Pyramid Run System (PAPYRUS) optical bench, where the temporal error is the main contributor to the total error budget. Our experiments use the Object-Oriented Python Adaptive Optics (OOPAO) simulation tool to simulate PAPYRUS and provide a real-time model of the optical system. We first present a detailed description of the reinforcement learning framework, including the definition of the state space, the action space, and the reward function. The state space is represented by the wavefront measurements obtained from the OOPAO simulation, while the action space corresponds to the deformable mirror (DM) commands that can be applied to the system. The reward function is defined from the wavefront error. The experiment section compares the results obtained by simulating the bench with a classical integrator against those of the same system run with the MBRL approach. We also provide results obtained on the actual bench with a calibration source, for comparison with the simulated bench. In conclusion, we discuss how running machine learning methods on a simulated bench helps in studying and understanding their implementation before they are deployed on the real-time computer (RTC) of a real bench. Reinforcement learning methods have the potential to optimize the performance of AO systems by predicting the evolution of turbulence and learning better DM commands, and thus to eventually improve the accuracy and efficiency of AO control.
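To make the state, action, and reward framing concrete, the following minimal sketch runs a toy closed loop with a classical integrator and a command delay, which is the source of the temporal error discussed above. It is an illustration under stated assumptions, not the OOPAO API or the PAPYRUS controller: the names `reward` and `run_integrator`, the ideal linear sensor, the `gain` and `delay` parameters, and the choice of a negative mean squared residual as reward are all hypothetical.

```python
import numpy as np


def reward(residual):
    """Hypothetical reward: negative mean squared wavefront residual."""
    return -float(np.mean(np.square(residual)))


def run_integrator(turbulence, gain=0.5, delay=1):
    """Toy AO loop: integrator control with a `delay`-frame command latency.

    turbulence: array of shape (n_steps, n_modes), the disturbance per frame.
    Returns the per-frame rewards as an array of shape (n_steps,).
    Assumes an ideal sensor that measures the residual directly.
    """
    if delay < 1:
        raise ValueError("delay must be at least one frame")
    n_steps, n_modes = turbulence.shape
    command = np.zeros(n_modes)
    # Commands computed but not yet applied to the (simulated) DM.
    pending = [np.zeros(n_modes) for _ in range(delay)]
    rewards = np.empty(n_steps)
    for t in range(n_steps):
        applied = pending.pop(0)              # action computed `delay` frames ago
        residual = turbulence[t] - applied    # state: measured residual wavefront
        rewards[t] = reward(residual)
        command = command + gain * residual   # integrator update (the action)
        pending.append(command)
    return rewards


if __name__ == "__main__":
    # Static disturbance: the residual shrinks by (1 - gain) each frame.
    r = run_integrator(np.ones((50, 3)), gain=0.5, delay=1)
    print(r[:3], r[-1])
```

In this toy setting a predictive controller would replace the integrator update with a command chosen to cancel the disturbance expected `delay` frames ahead, and the same `reward` would score both controllers.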