The Transformer … “Explained”?
An intuitive explanation of the Transformer, motivated through the lens of CNNs, RNNs, etc.
transformers natural-language-processing article convolutional-neural-networks recurrent-neural-networks tutorial

  • I’m going to take a “historical” route where I go through some other, mostly older architectural patterns first, to put it in context; hopefully it’ll be useful to people who are new to this stuff, while also not too tiresome to those who aren’t.
  • The closest thing to an intuitive explainer that I know of is “The Illustrated Transformer,” but IMO it’s too light on intuition and too heavy on near-pseudocode (including stuff like “now you divide by 8,” as the third of six enumerated “steps” which themselves only cover part of the whole computation!).
  • This is a shame, because once you hack through all the surrounding weeds, the basic idea of the Transformer is really simple. This post is my attempt at an explainer.
