A Survey of Long-Term Context in Transformers
Over the past two years, the NLP community has developed a veritable zoo of methods to reduce the cost of multi-head self-attention.
transformers multi-head-attention attention natural-language-processing tutorial article

In this post we'll focus on six promising approaches:
• Sparse Transformers
• Adaptive Span Transformers
• Transformer-XL
• Compressive Transformers
• Reformer
• Routing Transformer
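
To make "expensive" concrete: full self-attention compares every token with every other token, so the number of attention scores per head grows quadratically with sequence length. The sketch below is a minimal illustration, assuming a single head and a hypothetical fixed local window; the function names are made up for this example, and the window is a simplification rather than the exact scheme used by any of the six methods.

```python
def full_attention_scores(seq_len: int) -> int:
    """Number of query-key scores one head computes with full self-attention."""
    return seq_len * seq_len


def windowed_attention_scores(seq_len: int, window: int) -> int:
    """Scores when each token attends only to itself and up to
    `window - 1` preceding tokens (a hypothetical local-attention pattern)."""
    return sum(min(i + 1, window) for i in range(seq_len))


if __name__ == "__main__":
    # Compare how the two counts grow as the sequence gets longer.
    for n in (512, 4096):
        full = full_attention_scores(n)
        local = windowed_attention_scores(n, window=256)
        print(f"seq_len={n}: full={full}, windowed={local}")
```

Going from 512 to 4096 tokens (8x longer) multiplies the full-attention count by 64, while the windowed count grows by only about 10x, which is the kind of gap that motivates restricting or compressing the context.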


Machine Learning Architect at @IndicoDataSolutions