Since the Transformer architecture allows more parallelization during training, it has enabled training on far more data than was possible before its introduction. This led to the development of pretrained systems such as BERT (Bidirectional Encoder Representations from Transformers) and GPT-2, which are trained on huge amounts of general language data before release and can then be fine-tuned for specific language tasks.
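As a minimal illustration of why attention parallelizes so well, here is a NumPy sketch of scaled dot-product attention (a generic sketch, not tied to any particular library): every position's output comes from a pair of matrix multiplications computed at once, rather than a step-by-step recurrence.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """All positions attend simultaneously: two matmuls replace
    the sequential recurrence of an RNN."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.standard_normal((seq_len, d_model))
out = scaled_dot_product_attention(X, X, X)          # self-attention
print(out.shape)  # (4, 8)
```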


An Introduction to Transfer Learning and HuggingFace
In this talk I'll start by introducing the recent breakthroughs in NLP that resulted from the combination of Transfer Learning schemes and Transformer ...
transfer-learning transformers huggingface video
The Illustrated Transformer
In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained.
transformers positional-encoding encoder decoder
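The post above also covers positional encoding; as a companion, here is a small NumPy sketch of the sinusoidal scheme used by the original Transformer (the `10000` base follows the paper; the function name is mine):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sine on even dimensions, cosine on odd dimensions,
    with geometrically spaced frequencies."""
    pos = np.arange(seq_len)[:, None]       # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]    # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(50, 64)
print(pe.shape)  # (50, 64)
```

Because the encoding depends only on position, it is computed once and added to the token embeddings.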
NLP for Developers: Transformers | Rasa
In this video, Rasa Developer Advocate Rachael will talk about what transformers are, how they work, when they're used and some common errors.
transformers rasa natural-language-processing attention
The Transformer Neural Network Architecture Explained
⚙️ It is time to explain how Transformers work. If you are looking for an easy explanation, you've come to the right place!
transformers self-attention positional-encoding natural-language-processing


Top Down Introduction to BERT with HuggingFace and PyTorch
I will also provide some intuition into how BERT works with a top-down approach (applications to algorithm).
bert top-down huggingface pytorch
PyTorch Transformers Tutorials
A set of annotated Jupyter notebooks that gives users a template for fine-tuning transformer models on downstream NLP tasks such as classification and NER.
transformers text-classification text-summarization named-entity-recognition
The Annotated Transformer
In this post I present an “annotated” version of the paper in the form of a line-by-line implementation.
transformers attention natural-language-processing annotated
Transformers from Scratch
An attempt to explain directly how modern transformers work, and why, without some of the historical baggage.
transformers from-scratch natural-language-processing tutorial


Transformers - Hugging Face
🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
transformers huggingface attention bert
Finetune: Scikit-learn Style Model Finetuning for NLP
Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide variety of downstream tasks.
natural-language-processing finetuning pretraining transformers
Pretraining for Joint Understanding of Textual and Tabular Data
bert pretraining natural-language-processing tabular-data
Linear Attention Transformer
A fully featured Transformer that mixes (QKᵀ)V local attention with Q(KᵀV) global attention (scales linearly with respect to sequence length).
transformers linear linear-attention linear-attention-transformer
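The (QKᵀ)V versus Q(KᵀV) trade-off described above is just matrix-multiplication associativity. A small NumPy sketch (omitting the feature map that real linear-attention variants apply in place of softmax): the first order builds an n×n attention matrix, O(n²·d), while the second builds a d×d summary first, O(n·d²), which is linear in sequence length.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 16                      # sequence length, head dimension
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# Standard order: (Q Kᵀ) V materializes an n×n matrix — O(n²·d).
quadratic = (Q @ K.T) @ V

# Linear order: Q (Kᵀ V) materializes only a d×d matrix — O(n·d²).
linear = Q @ (K.T @ V)

print(np.allclose(quadratic, linear))  # True — matmul is associative
```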
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
interpretability visualization bert attention