rlx: A modular Deep RL library for research
"rlx" is a Deep RL library written on top of PyTorch & built for educational and research purpose.
reinforcement-learning deep-learning article code pytorch library

rlx is a Deep RL library written on top of PyTorch and built for educational and research purposes. The majority of libraries/codebases for Deep RL are geared towards reproducing state-of-the-art algorithms on very specific tasks (e.g., Atari games), but rlx is NOT. It is meant to be more expressive and modular. Rather than treating RL algorithms as black boxes, rlx adopts an API that exposes more granular operations to the user, which makes writing new algorithms easier. It is also useful for implementing task-specific engineering on top of a known algorithm.

Concisely, rlx is supposed to:

  • Be generic (i.e., adaptable to any task at hand)
  • Have modular lower-level components exposed to users
  • Make it easy to implement new algorithms

Here's a basic example of a PPO (with clipping) implementation with rlx:

import torch  # 'agent', 'policy', 'horizon', 'k_epochs' and 'clip' are assumed to be defined elsewhere

base_rollout = agent(policy).episode(horizon) # sample an episode as a 'Rollout' object
base_rewards, base_logprobs = base_rollout.rewards, base_rollout.logprobs # 'rewards' and 'logprobs' for all timesteps
base_returns = base_rollout.mc_returns() # Monte-carlo estimates of 'returns'

for _ in range(k_epochs):
    rollout = agent(policy).evaluate(base_rollout) # 'evaluate' an episode against a policy and get a new 'Rollout' object
    logprobs, entropy = rollout.logprobs, rollout.entropy # get 'logprobs' and 'entropy' for all timesteps
    values, = rollout.others # .. also 'value' estimates

    ratios = (logprobs - base_logprobs.detach()).exp()
    advantage = base_returns - values
    policyloss = - torch.min(ratios * advantage.detach(), torch.clamp(ratios, 1 - clip, 1 + clip) * advantage.detach()) # clipped surrogate
    valueloss = advantage.pow(2)
    loss = policyloss.sum() + 0.5 * valueloss.sum() - entropy.sum() * 0.01

    agent.zero_grad()
    loss.backward()
    agent.step()
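
For reference, here is a minimal, standalone sketch of what the Monte-Carlo returns used above conceptually are: the discounted reward-to-go at each timestep. The helper and the discount factor gamma below are hypothetical illustrations, not rlx's actual implementation of mc_returns().

import torch

def discounted_returns(rewards, gamma=0.99):
    # G_t = r_t + gamma * G_{t+1}, accumulated backwards over the episode
    returns = torch.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

discounted_returns(torch.tensor([1., 1., 1.]), gamma=0.9) # tensor([2.7100, 1.9000, 1.0000])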

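The inner loop optimizes the standard PPO clipped surrogate, min(ratio * advantage, clamp(ratio, 1 - clip, 1 + clip) * advantage), together with a value loss and an entropy bonus (the 0.5 and 0.01 coefficients). Because the rollout exposes plain tensors, task-specific engineering can be dropped straight into the loop; for example, one might normalize the advantage estimates before forming the losses (a common PPO trick, shown here purely as an illustrative tweak rather than something rlx does for you):

    advantage = base_returns - values
    advantage = (advantage - advantage.mean()) / (advantage.std() + 1e-8) # per-batch advantage normalization
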
Visit the README for further details.
