Overview

What is it?

The MWML Data Science Incubator is a non-profit and community-led effort to create a meaningful summer experience for students and aspiring data scientists who lost their jobs or internships. Through this incubator, our team hopes to create a platform for anyone to learn about data science and build end-to-end projects with their peers.

Why should you participate?
  • Learn: ML with your team, mentors and the larger community through our ML topics video series, keynote speakers and webinars/Q&A sessions by data science thought leaders.
  • Build: an end-to-end data science project, either with a team or solo.
  • Share: your project with the community and add it to your portfolio.
  • Receive: a certificate of completion and prizes from sponsoring companies on demo day.
Who can join?
  • Anyone who wants to learn and work on a project this summer.
  • The typical time commitment is at least 10 hours a week (depending on your team).
  • Focused on university students who lost their internships / job offers due to COVID-19.
What is the duration?

The official duration is from Sunday, June 14, 2020 to Saturday, September 26, 2020 (Demo Day!). You can join the incubator at any point. If you join late, you'll still have access to all the resources, but you may have to work on your own and may not finish your project in time (it's still a great learning experience).

📝 TODO

You must complete all of the quick tasks below before starting on your project. Otherwise we won't be able to help you or send resources along the way.

📆 Timeline

  1. Complete the steps in the TODO section.
  2. Sunday, June 14, 2020: Official project kickoff day! You can still join the incubator anytime, but the Demo Day date won't change.
  3. Collaborate on code, ideas, etc. via our Slack channel.
  4. Learn from our guided ML video series, keynote speakers and webinars/Q&A sessions by data science experts.
  5. The Made With ML community will be upvoting and giving feedback on projects.
  6. Saturday, September 26, 2020: Demo day (presentations, prizes, etc.) with our sponsors.
  7. Industry experts, companies, researchers and the community will circulate the projects with the intention of attracting great opportunities for this cohort.

🛠 Projects

We highly recommend that you choose one of the datasets below. We've suggested interesting applications for each (by skill level), and you'll be able to share baseline models, feature code, etc. with others in the community who are working with the same datasets. The perk is that nobody has to optimize for an arbitrary metric, and everyone can still create a unique application. You also have the option to work with a custom dataset of your choosing, but it must be publicly available.

  • We have prepared a notebook for getting started with each dataset. It contains information about the available features, how to download the data, more project ideas, etc.
Moderate Challenging
The Movies Dataset (numerical, text)
  • Exploratory analysis: relationships between staff/revenue, popularity/genre, etc.
  • Text classification: multi-class classification of movie genres using title, description, etc.
  • Miscellaneous: predict numerical statistics like revenue, popularity, etc. given title, genre, production companies, etc.
COCO (image, text)
MELD* (audio, text, image)
  • Speaker/emotion recognition: identify the speaker (or emotion, sentiment, etc.) from the audio clip.
  • Text generation: generate a phrase given speaker, emotion, sentiment or visual.
  • Speech synthesis: synthesize speech from text (in the voice of a character) with a given emotion.
  • Image generation: generate an image given speaker, text and emotion.
* Requires pretrained models from other tasks.
Note: the MELD dataset is 10 GB (due to the audio files), and its text is fragmented dialogue, so keep that in mind if you're doing language modeling.

🌍 Teams

If you want to work with a team, be sure to complete this Google Form. You will be matched based on your chosen dataset, background, experience and interests. If you're on a team, use each other mostly for learning and guidance. You can all work on one project, pair up within the team, or work on a project by yourself while learning with the team. It's completely up to you.

If you are a beginner, take a few weeks to go through online tutorials for the topics you're interested in, and start with a simple version of your project even if your end goal is complex. For example, if your final project is text generation, start with basic text classification so you become familiar with the dataset. While the goal of this program is to build an end-to-end ML application, that doesn't mean you can't explore different small projects at the beginning. Experiment, learn and try new things, and then build out one of your applications end-to-end.

✅ Checklist

We hope the learning experience will be rewarding on its own, but you'll also receive a certificate from MWML and our sponsors for creating an end-to-end machine learning application that meets the criteria outlined below. It's a great project to showcase in your portfolio, whether you're looking for a job or just want to share it with your network.

📚 Resources

General
  • Made With ML: Your one-stop platform to discover, learn and build all things machine learning.
  • Slack group: discussions and resources for the MWML Data Science Incubator.
  • Datasets: a collection of open platforms for finding interesting datasets.
  • Topics: community curated collection of the best resources to learn ML topics.
Lessons
Machine Learning Basics
A practical set of notebooks on machine learning basics, implemented in both TF2.0 + Keras and PyTorch.

📆 New lessons will be released every week once the incubator starts. We'll be covering both technical and non-technical topics that are highly relevant to what the teams are working on. The next lesson will be a video lesson, Creating an End-to-End Machine Learning Application (June 14, 2020), covering all the criteria from our project completeness checklist.

🏆 Prizes

Our sponsors were deliberately chosen because they all offer open-source, core functionality for anyone creating an end-to-end ML application. We want the community to learn how to use these tools to really elevate their own work.

On September 26, 2020 (Demo Day), these sponsors will be awarding various prizes for the top N teams that best utilize their platforms and tools. The exact criteria for each sponsor will be announced soon along with the monetary/non-monetary prizes.

  • Hugging Face: best use of Hugging Face's libraries to create a unique project (chosen by the team).
  • Weights & Biases: best use of the Weights & Biases suite (dashboard, sweeps and/or artifacts) to create a reproducible, holistic project that comes with a W&B report.

At the end of the program, everyone's projects (tagged and curated) will be organized in this Collection. Many of our sponsors and their affiliated partners will use this Collection to reach out directly to teams about future internships, research opportunities, etc.

🙏 Volunteer

This entire incubator is a 100% non-profit, community-led program. We currently have data scientists mentoring teams, experts speaking on specific topics and hosting Q&As, etc. Check out this page if you're interested.
