Replicating Airbnb's Amenity Detection (documentary series)
Airbnb's engineering team shared an article on how they used computer vision to detect amenities in photos. It read like a recipe, so I replicated it.
Objectives & Highlights

The goal: beat (or at least replicate) Airbnb's amenity detection (detecting key household items in images), publish all the code, and make the model available in a demo app that anyone can use on their phone. The full solution ended up being: data collected from Open Images, modelled with Detectron2, and a front-end application built with Streamlit and deployed using Docker, Google Container Registry and Google App Engine. I documented the entire journey day by day in Notion along with weekly YouTube videos discussing progress, open-sourced all the code, and built a tutorial in Colab where you can use my trained model (see the links).
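To make the data-collection step a little more concrete, here is a minimal sketch (not the project's actual code) of turning Open Images bounding-box rows into the list-of-dicts format Detectron2 accepts for custom datasets. It assumes the Open Images box CSV layout, where `XMin`, `XMax`, `YMin` and `YMax` are normalised to [0, 1], and it assumes you already know each image's pixel size. The helper `to_detectron2_dicts`, the sample rows, the `.jpg` file naming and the `/m/placeholder_*` label IDs are all hypothetical stand-ins rather than real Open Images MIDs. In real Detectron2 code, `bbox_mode` would be `BoxMode.XYXY_ABS` from `detectron2.structures`. Here it is a plain string so the sketch runs without Detectron2 installed.

```python
import csv
import io

# Hypothetical sample rows in the Open Images bounding-box CSV layout.
# Coordinates are normalised to [0, 1]; the MIDs are placeholders, not real Open Images IDs.
SAMPLE_BOXES = """ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
img001,xclick,/m/placeholder_oven,1,0.10,0.50,0.20,0.60,0,0,0,0,0
img001,xclick,/m/placeholder_cat,1,0.00,0.25,0.00,0.25,0,0,0,0,0
img002,xclick,/m/placeholder_sink,1,0.30,0.90,0.40,1.00,1,0,0,0,0
"""

# Hypothetical MID -> class-name mapping for the amenities we want to keep.
TARGET_CLASSES = {"/m/placeholder_oven": "Oven", "/m/placeholder_sink": "Sink"}


def to_detectron2_dicts(csv_text, image_sizes, target_classes):
    """Group Open Images boxes by image and convert them to Detectron2-style dataset dicts.

    image_sizes maps ImageID -> (width, height) in pixels.
    Returns (records, class_names), where category_id indexes into class_names.
    """
    class_names = sorted(set(target_classes.values()))
    name_to_id = {name: i for i, name in enumerate(class_names)}
    records = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = target_classes.get(row["LabelName"])
        if name is None:
            continue  # skip boxes whose class is not a target amenity
        image_id = row["ImageID"]
        width, height = image_sizes[image_id]
        record = records.setdefault(image_id, {
            "file_name": f"{image_id}.jpg",  # hypothetical file naming
            "image_id": image_id,
            "width": width,
            "height": height,
            "annotations": [],
        })
        # Scale normalised coordinates to absolute pixel [x0, y0, x1, y1].
        bbox = [
            float(row["XMin"]) * width,
            float(row["YMin"]) * height,
            float(row["XMax"]) * width,
            float(row["YMax"]) * height,
        ]
        record["annotations"].append({
            "bbox": bbox,
            "bbox_mode": "XYXY_ABS",  # placeholder for detectron2.structures.BoxMode.XYXY_ABS
            "category_id": name_to_id[name],
        })
    return list(records.values()), class_names
```

For example, `to_detectron2_dicts(SAMPLE_BOXES, {"img001": (640, 480), "img002": (800, 600)}, TARGET_CLASSES)` keeps the oven and sink boxes, drops the non-amenity row, and returns one record per image. A function like this could then be registered with Detectron2's `DatasetCatalog` so the trainer can load it.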

Takeaways & Next Steps

Note to self: you learn the most working on your own projects. Modelling is the easy part; collecting data and getting your application live to users is the challenging part. Detectron2 is a powerful beast for computer vision tasks but may be overkill as a standalone choice of modelling platform. Its size led to difficulties when deploying. Next time, I'll start as simple as possible and add complexity only when needed.


Author's original post
Machine Learning Engineer live on YouTube. Self-taught via https://dbourke.link/aimastersdegree