In each tutorial, we will focus on a specific application or area of interest by implementing a model from a research paper.

Hugging Captions
Generate realistic, Instagram-worthy captions using transformers, given a hashtag and a small text snippet.
VirTex: Learning Visual Representations from Textual Annotations
We train a CNN+Transformer from scratch on COCO, transfer the CNN to 6 downstream vision tasks, and exceed ImageNet features despite using 10x fewer ...
ViLBERT-MT: Multi-Task Vision & Language Representation Learning
A single ViLBERT Multi-Task model can perform 8 different vision and language tasks learnt from 12 datasets!
Image Caption Generation
Image caption generation is a challenging task in which a textual description is generated for a given picture. It needs methods from both Computer Vision and ...