With trl you can train transformer language models with Proximal Policy Optimization (PPO). The library is built on the transformers library by 🤗 Hugging Face, so pre-trained language models can be loaded directly through the transformers interface. Currently, only GPT2 is implemented.
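To give a rough idea of what PPO optimizes, here is a minimal sketch of the clipped surrogate objective for a single sampled token, written in plain Python. It is an illustration of the general PPO idea, not trl's actual code: the function name `ppo_clipped_objective`, its parameters, and the default `clip_range=0.2` are assumptions made for this example.

```python
import math


def ppo_clipped_objective(logprob_new, logprob_old, advantage, clip_range=0.2):
    """Clipped PPO surrogate objective for one sampled action (e.g. one token).

    Hypothetical helper for illustration only; not part of trl.
    """
    # Probability ratio between the updated policy and the policy
    # that originally sampled the action.
    ratio = math.exp(logprob_new - logprob_old)
    # Restrict the ratio to the interval [1 - clip_range, 1 + clip_range].
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Keep the more pessimistic of the two estimates, which removes the
    # incentive to move the policy far from the old one in a single update.
    return min(ratio * advantage, clipped_ratio * advantage)


# The new policy makes a well-rewarded token more likely (0.4 -> 0.6),
# so the ratio exceeds the clipping range and the objective is capped.
capped = ppo_clipped_objective(math.log(0.6), math.log(0.4), advantage=1.0)
```

In training, this quantity would be averaged over all sampled tokens and maximized with respect to the parameters of the language model; the clipping keeps each update close to the policy that generated the samples.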