Top Down Introduction to BERT with HuggingFace and PyTorch
I will also provide some intuition into how BERT works, using a top-down approach (from applications to the algorithm).
bert top-down huggingface pytorch
PyTorch Transformers Tutorials
A set of annotated Jupyter notebooks that give users a template for fine-tuning transformer models on downstream NLP tasks such as classification, NER, etc.
transformers text-classification text-summarization named-entity-recognition
VirTex: Learning Visual Representations from Textual Annotations
We train a CNN+Transformer from scratch on COCO, transfer the CNN to 6 downstream vision tasks, and exceed ImageNet features despite using 10x fewer ...
convolutional-neural-networks transformers coco visual-representations
The Transformer Family
This post presents how the vanilla Transformer can be improved for longer-term attention span, less memory and computation consumption, RL task solving, ...
attention transformers natural-language-processing article
The Illustrated Transformer
In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained.
transformers positional-encoding encoder decoder
Transformers from Scratch
Attempt to explain directly how modern transformers work, and why, without some of the historical baggage.
transformers from-scratch natural-language-processing article
RoBERTa meets TPUs
Understanding and applying the RoBERTa model to the current challenge.
roberta transformers tpu huggingface
Linformer: Self-Attention with Linear Complexity
We demonstrate that the self-attention mechanism can be approximated by a low-rank matrix.
self-attention attention transformers linear-complexity
The Annotated Transformer
In this post I present an “annotated” version of the paper in the form of a line-by-line implementation.
transformers attention natural-language-processing annotated
