A Survey of Long-Term Context in Transformers
Over the past two years, the NLP community has developed a veritable zoo of methods to combat the expense of multi-head self-attention, whose compute and memory costs grow quadratically with sequence length.
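As a quick refresher on where that quadratic cost comes from, here is a minimal sketch of vanilla scaled dot-product attention for a single head. The shapes and the toy NumPy implementation are illustrative assumptions, not code from the post; the point is the intermediate n × n score matrix.

```python
# Sketch only: illustrative shapes, single head, no batching or masking.
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q, k, v: arrays of shape (n, d) for one attention head."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (n, n) -- quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v                               # (n, d)

n, d = 4096, 64
q = np.random.randn(n, d).astype(np.float32)
k = np.random.randn(n, d).astype(np.float32)
v = np.random.randn(n, d).astype(np.float32)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4096, 64); the (4096, 4096) score matrix is the bottleneck
```

For n = 4096 that intermediate matrix alone holds roughly 16.7 million scores per head per layer, which is what the methods surveyed below try to avoid materializing in full.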
Objectives & Highlights

In this post we'll focus on six promising approaches:

• Sparse Transformers
• Adaptive Span Transformers
• Transformer-XL
• Compressive Transformers
• Reformer
• Routing Transformer
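A common thread running through several of these approaches is restricting which keys each query attends to, so that a row of the attention matrix touches far fewer than n positions. The sketch below is a toy illustration in that spirit (the window and stride values, and the combination of a local band with strided "summary" positions, are assumptions loosely modeled on Sparse Transformers, not the exact pattern from any one paper):

```python
# Toy sketch of a sparse attention mask: local window + strided positions.
import numpy as np

def local_plus_strided_mask(n, window=4, stride=4):
    """Boolean (n, n) mask: True where query i may attend to key j (causal)."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i                          # no attending to future positions
    local = (i - j) < window                 # the last `window` positions
    strided = (j % stride) == (stride - 1)   # plus periodic anchor positions
    return causal & (local | strided)

mask = local_plus_strided_mask(16)
print(mask.astype(int))  # each row allows O(window + n/stride) keys instead of O(n)
```

Each method in the list takes a different route to the same end: sparsity patterns, learned spans, recurrence over cached segments, compressed memories, locality-sensitive hashing, or learned routing.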

