Exploratory data analysis (EDA) is a vital (and fun) step in the data science process, but it’s often misconstrued. Here’s how to think about EDA:
Q1. How many (post-filtered) tags do the projects have? We care about this because we want to make sure we don’t overwhelm the user with too many tags (UX constraint).
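One way to answer Q1 is to count how many projects carry each number of tags. The snippet below is a minimal sketch using only the standard library; the `projects` list, its field names and the `tags_per_project` helper are hypothetical stand-ins for our real dataset, not the lesson's actual code.

```python
from collections import Counter

# Hypothetical toy data: each project carries a list of already-filtered tags.
projects = [
    {"title": "Image classifier with CNNs", "tags": ["computer-vision", "pytorch"]},
    {"title": "Sentiment analysis with BERT",
     "tags": ["natural-language-processing", "transformers", "pytorch"]},
    {"title": "Graph embeddings intro", "tags": ["graph-learning"]},
    {"title": "Object detection demo", "tags": ["computer-vision", "pytorch"]},
]


def tags_per_project(projects):
    """Map a tag count to the number of projects that have that many tags."""
    return Counter(len(project["tags"]) for project in projects)


num_tags_per_project = tags_per_project(projects)

# Print the distribution from the smallest to the largest tag count.
for num_tags, num_projects in sorted(num_tags_per_project.items()):
    print(f"{num_tags} tag(s): {num_projects} project(s)")
```

If most projects land on a small number of tags, the UX constraint is probably satisfied; a long tail of projects with many tags would suggest filtering further.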
Q2. What are the most popular tags? We care about this because it’s important to know about the distribution of tags and which tags just made the cut (for performance).
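For Q2, a simple frequency count over all tags shows both ends of the distribution: the most popular tags and the ones that barely appear. This is a sketch under the same assumptions as before: `projects` is hypothetical toy data and `count_tags` is an illustrative helper.

```python
from collections import Counter

# Hypothetical toy data: each project carries a list of already-filtered tags.
projects = [
    {"title": "Image classifier with CNNs", "tags": ["computer-vision", "pytorch"]},
    {"title": "Sentiment analysis with BERT",
     "tags": ["natural-language-processing", "transformers", "pytorch"]},
    {"title": "Graph embeddings intro", "tags": ["graph-learning"]},
    {"title": "Object detection demo", "tags": ["computer-vision", "pytorch"]},
]


def count_tags(projects):
    """Count how many projects use each tag."""
    return Counter(tag for project in projects for tag in project["tags"])


tag_counts = count_tags(projects)

# Head of the distribution: the most popular tags.
print("Most popular:", tag_counts.most_common(2))

# Tail of the distribution: the least frequent tags, i.e. those that just made the cut.
print("Just made the cut:", tag_counts.most_common()[-2:])
```

Looking at the tail is as useful as looking at the head, since tags with very few examples are the ones a model is most likely to struggle with.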
Q3. Is there enough signal in the title and description that’s unique to each tag? This is important because we want to verify our initial hypothesis that the project’s title and description are highly influential features.
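A rough way to probe Q3 is to look at the most frequent words in the titles and descriptions of projects that share a tag, and check whether those words look specific to that tag. The sketch below assumes hypothetical toy data, a tiny hand-picked stopword list and an illustrative `top_words_for_tag` helper; a word cloud or a more careful tokenizer would serve the same purpose.

```python
import re
from collections import Counter

# Hypothetical toy data with the two text features we want to inspect.
projects = [
    {"title": "Image classifier with CNNs",
     "description": "Train a CNN image classifier on CIFAR-10.",
     "tags": ["computer-vision"]},
    {"title": "Object detection demo",
     "description": "Detect objects in an image with bounding boxes.",
     "tags": ["computer-vision"]},
    {"title": "Sentiment analysis with BERT",
     "description": "Classify the sentiment of movie reviews.",
     "tags": ["natural-language-processing"]},
]

# Tiny illustrative stopword list (an assumption, not a complete one).
STOPWORDS = {"a", "an", "the", "with", "of", "on", "in"}


def top_words_for_tag(projects, tag, k=5):
    """Return the k most frequent non-stopword tokens for projects with `tag`."""
    counts = Counter()
    for project in projects:
        if tag in project["tags"]:
            text = f"{project['title']} {project['description']}".lower()
            tokens = re.findall(r"[a-z0-9-]+", text)
            counts.update(token for token in tokens if token not in STOPWORDS)
    return counts.most_common(k)


for tag in ("computer-vision", "natural-language-processing"):
    print(tag, top_words_for_tag(projects, tag, k=3))
```

If the top words for each tag overlap very little, that supports the hypothesis that the title and description carry tag-specific signal.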
All of the work we’ve done so far is inside IPython notebooks, but in a later lesson we’ll transfer all of this into an interactive dashboard using a tool called Streamlit.