Deep Tutorials for PyTorch
This is a series of in-depth tutorials I'm writing for implementing cool deep learning models on your own with the amazing PyTorch library.
VirTex: Learning Visual Representations from Textual Annotations
We train a CNN+Transformer from scratch on COCO, transfer the CNN to 6 downstream vision tasks, and exceed ImageNet features despite using 10x fewer ...
Hugging Captions
Generate realistic, Instagram-worthy captions using transformers, given a hashtag and a small text snippet.
Image Caption Generation
Image Caption Generation is a challenging task where a textual description is generated given a picture. It needs methods from both Computer Vision and ...
Video object grounding
Video object grounding using semantic roles in language description.
Lecture 10 | Recurrent Neural Networks
Discuss the use of recurrent neural networks for modeling sequence data.
ViLBERT-MT: Multi-Task Vision & Language Representation Learning
A single ViLBERT Multi-Task model can perform 8 different vision and language tasks learnt from 12 datasets!
Show, Infer & Tell: Contextual Inference for Creative Captioning
The beauty of the work lies in the way it architects the fundamental idea that humans look at the overall image and then individual pieces of it.
