A Survey of Long-Term Context in Transformers
Over the past two years, the NLP community has developed a veritable zoo of methods to reduce the cost of multi-head self-attention, whose compute and memory requirements grow quadratically with sequence length.
transformers multi-head-attention attention natural-language-processing tutorial
Objectives & Highlights

In this post we'll focus on six promising approaches:
• Sparse Transformers
• Adaptive Span Transformers
• Transformer-XL
• Compressive Transformers
• Reformer
• Routing Transformer

