Below are curated collections of datasets that you can search for the specific data you need.

Overview

The Process for Data Preparation and Feature Engineering
To get our predictions right, we must construct the data set and transform the data correctly.
data-collection feature-engineering systems-design tutorial
How (And Why) to Create a Good Validation Set
Steps for creating a representative validation set for training.
data-collection validation-set checklist systems-design
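As a rough illustration of what a "representative" validation set can mean, the sketch below holds out the same fraction of every class (a stratified split). This is one common approach and not necessarily the method the linked article describes. The function name `stratified_split`, its parameters, and the toy labels are assumptions made for this example only.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_indices, val_indices), holding out roughly
    val_fraction of the examples from each class.

    Hypothetical helper for illustration; not taken from any linked resource.
    """
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        if len(indices) > 1:
            # Keep at least one validation example per class.
            n_val = max(1, round(len(indices) * val_fraction))
        else:
            # A single-example class stays in the training set.
            n_val = 0
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])

    return sorted(train_idx), sorted(val_idx)


# Toy, made-up labels: an 80/20 class mix.
labels = ["cat"] * 80 + ["dog"] * 20
train_idx, val_idx = stratified_split(labels, val_fraction=0.2, seed=42)
val_labels = [labels[i] for i in val_idx]
```

With these toy labels, the validation set keeps the same 80/20 class mix as the full data, which a purely random split does not guarantee on small or skewed datasets.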

Libraries

General
Kaggle Datasets
Find and use datasets or complete tasks.
datasets kaggle library
Discovering Millions of Datasets on the Web
Dataset Search has indexed almost 25 million datasets, giving you a single place to search for datasets and find links to where the data is hosted.
datasets dataset-search search machine-learning
UCI ML Datasets
We currently maintain 507 data sets as a service to the machine learning community.
datasets library
Fast.ai Datasets
Collections of original and reduced datasets for popular data sources.
datasets natural-language-processing computer-vision imagenet
Imbalanced Learn
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning.
class-imbalance imbalanced-datasets library code
NLP Libraries
The Big Bad NLP Database
A collection of 400+ NLP datasets with papers included.
datasets natural-language-processing library
NLP Viewer 🤗
A simple website for browsing popular NLP datasets.
natural-language-processing huggingface datasets streamlit
Gutenberg Dialog
Build a dialog dataset from online books in many languages.
dataset language-modeling natural-language-processing datasets
CV Libraries
Fast.ai Datasets
Collections of original and reduced datasets for popular data sources.
datasets natural-language-processing computer-vision imagenet
Imagenette
Imagenette is a subset of 10 easily classified classes from Imagenet.
dataset imagenet computer-vision imagenette
Other Libraries
Recommendation Systems Datasets
This tool lets you download, unpack, and read recommender systems datasets into a pandas.DataFrame as easily as data = Dataset().
datasets recommendation-systems recommender-systems research-tool