Since the Transformer architecture facilitates more parallelization during training, it has enabled training on much larger datasets than was possible before its introduction. This led to the development of pretrained systems such as BERT (Bidirectional Encoder Representations from Transformers) and GPT-2, which are trained on huge amounts of general language data before release and can then be fine-tuned for specific language tasks.
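The pretrain-then-fine-tune pattern above can be illustrated with a minimal NumPy sketch: a frozen random projection stands in for a large pretrained encoder (such as BERT), and "fine-tuning" trains only a small classification head on top of its features. All names and shapes here are illustrative assumptions, not part of any real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" encoder: a frozen random projection standing in
# for a large pretrained model; its weights are never updated.
W_pretrained = rng.normal(size=(16, 4))

def encode(x):
    # Frozen feature extractor: no gradients flow into W_pretrained.
    return np.tanh(x @ W_pretrained)

# Toy binary task: the label depends on the sum of the raw inputs.
X = rng.normal(size=(200, 16))
y = (X.sum(axis=1) > 0).astype(float)

# "Fine-tuning" here means training only the small head on top of the
# frozen features, which is far cheaper than training from scratch.
H = encode(X)
w_head = np.zeros(4)
b_head = 0.0
lr = 0.5

for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(H @ w_head + b_head)))  # sigmoid probabilities
    grad_w = H.T @ (p - y) / len(y)                   # logistic-loss gradient
    grad_b = (p - y).mean()
    w_head -= lr * grad_w
    b_head -= lr * grad_b

p = 1.0 / (1.0 + np.exp(-(H @ w_head + b_head)))
acc = ((p > 0.5) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

In practice the same pattern is what libraries like 🤗 Transformers automate: load pretrained weights, attach a task-specific head, and update (all or part of) the network on the downstream dataset.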


An Introduction to Transfer Learning and HuggingFace
In this talk I'll start by introducing the recent breakthroughs in NLP that resulted from the combination of Transfer Learning schemes and Transformer ...
transfer-learning transformers huggingface video
The Illustrated Transformer
In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained.
transformers positional-encoding encoder decoder
NLP for Developers: Transformers | Rasa
In this video, Rasa Developer Advocate Rachael will talk about what transformers are, how they work, when they're used and some common errors.
transformers rasa natural-language-processing attention
The Transformer Neural Network Architecture Explained
⚙️ It is time to explain how Transformers work. If you are looking for an easy explanation, you've come to the right place!
transformers self-attention positional-encoding natural-language-processing
Top Down Introduction to BERT with HuggingFace and PyTorch
I will also provide some intuition into how BERT works with a top-down approach (from applications down to the algorithm).
bert top-down huggingface pytorch
Evolution of Representations in the Transformer
The evolution of representations of individual tokens in Transformers trained with different training objectives (MT, LM, MLM - BERT-style).
transformers representation-learning representations natural-language-processing


PyTorch Transformers Tutorials
A set of annotated Jupyter notebooks that give users a template to fine-tune transformer models on downstream NLP tasks such as classification, NER, etc.
transformers text-classification text-summarization named-entity-recognition
Transformers from Scratch
Attempt to explain directly how modern transformers work, and why, without some of the historical baggage.
transformers from-scratch natural-language-processing article
Tensorflow, Pytorch, Transformer, Fastai, etc. Tutorials
BERT Classification, Question Answering, Seq2Seq Machine Translation, Contextual Topic Modeling, Large-Scale Multilabel Classification, etc.
transformers text-classification pytorch tensorflow
The Annotated Transformer
In this post I present an “annotated” version of the paper in the form of a line-by-line implementation.
transformers attention natural-language-processing annotated
Fine-tuning with custom datasets
This tutorial will take you through several examples of using 🤗 Transformers models with your own datasets.
transformers huggingface fine-tuning custom-datasets


Transformers - Hugging Face
🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
transformers huggingface attention bert
Simple Transformers
Transformers for Classification, NER, QA, Language Modeling, Language Generation, T5, Multi-Modal, and Conversational AI.
transformers named-entity-recognition question-answering language-modeling
Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX.
onnx pytorch model-serving transformers
Finetune: Scikit-learn Style Model Finetuning for NLP
Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide variety of downstream tasks.
natural-language-processing finetuning pretraining transformers
Pretraining for Joint Understanding of Textual and Tabular Data
bert pretraining natural-language-processing tabular-data
Linear Attention Transformer
A fully featured Transformer that mixes (QKᵀ)V local attention with Q(KᵀV) global attention (scales linearly with respect to sequence length).
transformers linear linear-attention linear-attention-transformer
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
interpretability visualization bert attention
ONNX Transformers
Accelerated NLP pipelines for fast inference 🚀 on CPU. Built with 🤗 Transformers and ONNX runtime.
natural-language-processing onnx inference transformers
TurboTransformers
A fast and user-friendly runtime for transformer inference on CPU and GPU.
transformers inference turbotransformers natural-language-processing
Hugdatafast: huggingface/nlp + fastai
An elegant integration of huggingface/nlp and fastai2, with handy transforms implemented in pure huggingface/nlp.
natural-language-processing dataset fastai huggingface
Making BERT stretchy. Semantic Elasticsearch with Sentence Transformers.
transformers search elastic-search huggingface
Electra_pytorch: ELECTRA in PyTorch (fastai + huggingface)
Unofficial reimplementation of "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators".
natural-language-processing pretraining deep-learning fastai