High-contrast imaging instruments are today primarily limited by non-common path aberrations (NCPAs) arising between the science and wavefront sensing arms. These aberrations can produce quasi-static speckles in science images that are difficult to distinguish from exoplanet signatures. Building on recent advances in deep learning, we have implemented convolutional neural networks (CNNs) that estimate pupil-plane phase aberrations from point spread functions (PSFs). The approach we propose here embeds a differentiable simulator of the instrument within the deep learning architecture. To do so, we build an autoencoder-like architecture in which a deep CNN serves as the encoder, the simulator serves as the decoder, and the latent space represents the phase aberrations. Because this unsupervised approach is trained to reconstruct the PSFs, the true wavefront aberrations do not need to be known in order to train the models. Following our earlier work based solely on simulated data, we now assess how our method performs on laboratory and on-sky data from the SCExAO instrument installed at the Subaru Telescope. Our simulator-based autoencoder approach is particularly well suited to on-sky applications, where the ground truth is not available. Finally, we apply a variational inference approach that accounts for prediction uncertainties and prior information, and investigate how it can improve the robustness of the models.
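To illustrate the autoencoder-like architecture, here is a minimal NumPy sketch of the decoder half and the unsupervised loss. The decoder is a toy Fraunhofer (FFT-based) imaging model that maps a few modal phase coefficients, standing in for the latent space, to a normalized PSF. The grid size, pupil radius, the five low-order modes, and the `encoder` argument (a hypothetical stand-in for the CNN) are illustrative assumptions; this is not the SCExAO simulator. NumPy has no automatic differentiation, so the sketch only shows the forward pass. In practice the decoder would be written in an autodiff framework so that gradients of the loss can flow back into the encoder weights.

```python
import numpy as np

# Illustrative assumptions: 64x64 grid, pupil radius 16 px (~Nyquist sampling).
N = 64
R = 16

y, x = np.mgrid[-N // 2:N // 2, -N // 2:N // 2]
r = np.hypot(x, y) / R
theta = np.arctan2(y, x)
pupil = (r <= 1.0).astype(float)  # unobstructed circular aperture (assumption)

# Five unnormalized Zernike-like modes: tip, tilt, defocus, two astigmatisms.
modes = np.stack([
    r * np.cos(theta),
    r * np.sin(theta),
    2 * r**2 - 1,
    r**2 * np.cos(2 * theta),
    r**2 * np.sin(2 * theta),
]) * pupil


def simulate_psf(coeffs):
    """Decoder: modal phase coefficients (radians) -> normalized focal-plane PSF."""
    phase = np.tensordot(coeffs, modes, axes=1)      # pupil-plane phase, shape (N, N)
    field = pupil * np.exp(1j * phase)               # complex pupil field
    psf = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    return psf / psf.sum()


def reconstruction_loss(psf_obs, encoder):
    """Unsupervised loss: only the observed PSF is needed, never the true phase."""
    coeffs = encoder(psf_obs)                        # latent space = phase coefficients
    return np.mean((simulate_psf(coeffs) - psf_obs) ** 2)
```

The true phase never enters `reconstruction_loss`; only the observed PSF and the encoder's prediction do, which is what allows training without ground-truth wavefronts.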