ICLR 2020 Trends: Better & Faster Transformers for NLP
A summary of promising directions from ICLR 2020 for better and faster pretrained Transformer language models.
Tags: transformers, self-attention, language-modeling, natural-language-understanding, deep-learning, natural-language-processing, attention, research, article

This post summarizes and categorizes the approaches introduced in papers presented at the ICLR 2020 conference to improve the Transformer architecture for natural language processing.


Author's original post
NLP Research Intern at CNR-ILC, MSc Candidate in Data Science at UniTrieste & SISSA