When developing an application, there are many technical decisions and results (preprocessing, performance, etc.) that are integral to our system. How can we effectively communicate these to other developers and business stakeholders? One option is a Jupyter notebook, but it's often cluttered with code and isn't easy for non-technical team members to access and run. We need to create a dashboard that can be accessed without any technical prerequisites and effectively communicates key findings. It would be even more useful if our dashboard were interactive, such that it provides utility even for the technical developers.
With Streamlit, we can quickly create an empty application and as we develop, the UI will update as well.
```bash
# Setup
pip install streamlit
mkdir streamlit
touch streamlit/st_app.py
streamlit run streamlit/st_app.py
```
Local URL: http://localhost:8501
Before we create a dashboard for our specific application, we need to learn about the different Streamlit API components. Instead of going through them all in this lesson, take ten minutes and go through the entire documentation page. It's quite short and we promise you'll be amazed at how many UI components (styled text, latex, tables, plots, etc.) you can create using just Python. We'll explore the different components in detail as they apply to creating different interactions for our specific dashboard below.
We start by showing a sample of our different data sources because, for many people, this may be the first time they see the data so it's a good opportunity for them to understand all the different features, formats, etc. For displaying the tags, we don't want to just dump all of them on the dashboard but instead we can use a selectbox to allow the user to view them one at a time.
We can also show a snapshot of the loaded DataFrame, which has sortable columns that viewers can play with to explore the data.
We can essentially walk viewers through our entire data phase (EDA, preprocessing, etc.) and allow them (and ourselves) to explore key decisions. For example, we chose to introduce a minimum tag frequency constraint so that we can have enough samples. We can now interactively change that value with a slider widget and see which tags just made and missed the cut.
What makes this truly interactive is that when we alter the value here, all the downstream tables and plots will update to reflect that change immediately. This is a great way to explore what constraints to use because we can quickly visualize the impact it can have on our data.
On a similar note, we can also interactively view how our preprocessing functions behave. We can alter any of the function's default input arguments, as well as the input text.
In fact, we were able to discover and fix a bug here: the NLTK package automatically lowercases text when stemming, which we had to override in our Stemmer class in our data script.
This page allows us to quickly compare the improvements and regressions of our local system and what's currently in production. We want to provide the key differences in both the performance and parameters used for each system version. We could also use constructs, such as Git tags, to visualize these details across multiple previous releases.
With the inference page, we want to be able to test our model using various inputs to receive predictions, as well as intermediate outputs (ex. preprocessed text). This is a great way for our team members to quickly play with the latest deployed model.
Our last page will enable a closer inspection of the test split's predictions to identify areas to improve, collect more data for, etc. First, we offer a quick view of each tag's performance, and we could do the same for specific slices of the data we may care about (high priority, minority, etc.).
We're also going to inspect the true positive (TP), false positive (FP) and false negative (FN) samples across our different tags. It's a great way to catch issues with labeling (FP), weaknesses (FN), etc.
Be careful not to make decisions based on predicted probabilities until they have been calibrated, so they can reliably be used as measures of confidence.
- Connect inspection pipelines with annotation systems so that changes to the data can be reviewed and incorporated.
- Use false positives to identify potentially mislabeled data or estimate training data influences (TracIn) on their predictions.
- Inspect the trained model's behavior under various conditions using the WhatIf tool.
- Compare performances across multiple releases to visualize improvements/regressions over time.
There are a few functions defined at the start of our st_app.py script which have a @st.cache decorator. This tells Streamlit to cache the function's return value by the combination of its inputs, which significantly improves performance for computationally heavy functions.
We have several options for deploying and managing our Streamlit dashboard. We could use Streamlit's sharing feature (beta), which allows us to seamlessly deploy dashboards straight from GitHub; our dashboard will stay updated as we commit changes to our repository. Another option is to deploy the Streamlit dashboard along with our API service. We can use docker-compose to spin up a separate container or simply add it to the API service's Dockerfile's ENTRYPOINT with the appropriate ports exposed. The latter might be ideal, especially if your dashboard isn't meant to be public and you want added security, performance, etc.
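As an illustration, a docker-compose service for the dashboard might look like this (the service name, paths, and port are assumptions):

```yaml
# docker-compose.yaml (sketch)
services:
  dashboard:
    build: .
    command: streamlit run streamlit/st_app.py --server.port 8501
    ports:
      - "8501:8501"
```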