The memory improvements can be attributed to four features that the Reformer authors introduced to the transformer world:
The goal of this blog post is to give the reader an in-depth understanding of each of the four Reformer features mentioned above. While the explanations focus on the Reformer, the reader should also gain a better intuition for the circumstances under which each of the four features can be effective for other transformer models. The four sections are only loosely connected, so they can be read independently.