Below are curated collections of datasets that you can search for the specific data you need.

Overview

The Process for Data Preparation and Feature Engineering
To get our predictions right, we must construct the data set and transform the data correctly.
data-collection feature-engineering systems-design tutorial
How (And Why) to Create a Good Validation Set
Steps for creating a representative validation set to evaluate your model.
data-collection validation-set checklist systems-design
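A common reason a validation set fails to be representative is a purely random split of time-ordered data: the model gets validated on dates it has effectively already seen. A widely used alternative is to hold out the most recent records. The sketch below illustrates that idea with the standard library only; `time_based_split`, its parameters, and the sales data are hypothetical, made up for illustration rather than taken from the linked article.

```python
from datetime import date


def time_based_split(records, get_date, valid_fraction=0.2):
    """Split records so the validation set holds the most recent ones.

    Hypothetical helper for illustration: `get_date` extracts a sortable
    timestamp from each record, and `valid_fraction` is the share of
    records (by count) held out for validation.
    """
    if not 0 < valid_fraction < 1:
        raise ValueError("valid_fraction must be strictly between 0 and 1")
    if len(records) < 2:
        raise ValueError("need at least two records to split")
    ordered = sorted(records, key=get_date)  # oldest first
    n_valid = max(1, round(len(ordered) * valid_fraction))
    n_valid = min(n_valid, len(ordered) - 1)  # keep at least one training record
    cut = len(ordered) - n_valid
    return ordered[:cut], ordered[cut:]


# Ten days of made-up daily sales, deliberately out of order.
sales = [(date(2020, 1, day), day * 10) for day in (3, 1, 10, 7, 2, 9, 5, 4, 8, 6)]
train, valid = time_based_split(sales, get_date=lambda r: r[0])
# Every training date now precedes every validation date.
```

Other situations need other rules (for example, holding out whole users or entities rather than individual rows), so treat the cut-off logic as one example of matching the split to how the model will be used.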
Git for Data: Not a Silver Bullet
What we mean when we talk about version control for data.
data dvc versioning git
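Data-versioning tools such as DVC identify data files by a hash of their contents instead of storing them line by line in Git. The sketch below shows only that underlying mechanism, using the standard library: snapshot a directory as a path-to-hash map, then compare snapshots. The function names are hypothetical and are not DVC's or Git's API; SHA-256 is an arbitrary choice of hash here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(directory):
    """Map each file's path (relative to `directory`) to its content hash."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff(old, new):
    """Compare two snapshots and report added, removed and changed files."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


# Demo on a throwaway directory: edit one file, delete one, add one.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    (data / "a.csv").write_text("id,label\n1,cat\n")
    (data / "b.csv").write_text("id,label\n2,dog\n")
    before = snapshot(data)

    (data / "a.csv").write_text("id,label\n1,cat\n3,bird\n")
    (data / "b.csv").unlink()
    (data / "c.csv").write_text("id,label\n4,fish\n")
    after = snapshot(data)

added, removed, changed = diff(before, after)
```

Note that a content hash tells you which files changed, not what changed inside them, so questions like "which rows were relabeled?" still need separate tooling.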

Libraries

General
Kaggle Datasets
Find and use datasets or complete tasks.
datasets kaggle library
Discovering Millions of Datasets on the Web
Google's Dataset Search has indexed almost 25 million datasets published on the web, giving you a single place to search for datasets and find links to where the data is hosted.
datasets dataset-search search machine-learning
UCI ML Datasets
The UCI Machine Learning Repository currently maintains 507 data sets as a service to the machine learning community.
datasets library
Image Dataset Tool (IDT)
Image Dataset Tool (idt) is a CLI tool designed to make the otherwise repetitive and slow task of creating image datasets into a fast and intuitive ...
datasets cli dataset-creation code
Bingoset - CLI tool to create image dataset.
CLI toolkit to quickly create an image dataset using the Bing Image Search API.
datasets cli dataset-creation image-classification
Fast.ai Datasets
Collections of original and reduced datasets for popular data sources.
datasets natural-language-processing computer-vision imagenet
Imbalanced Learn
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning.
class-imbalance imbalanced-datasets library code
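One of the simplest techniques the package offers is random oversampling: duplicate randomly chosen minority-class samples until every class is as large as the majority class. The sketch below implements that idea in plain Python so the mechanics are visible; `random_oversample` and the toy data are hypothetical, and this is not the package's implementation.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate random samples of smaller classes until all classes match the largest."""
    rng = random.Random(seed)  # seeded so the resampling is reproducible
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(samples) for samples in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        # Draw with replacement from this class to fill the gap to `target`.
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        for xi in samples + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


# Toy data: five samples of class 0, a single sample of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
counts = Counter(y_res)  # both classes now have five samples
```

With the package itself, the corresponding step is `RandomOverSampler().fit_resample(X, y)` from `imblearn.over_sampling`. Whichever you use, resample only the training split: oversampling before splitting lets copies of the same minority sample land in both training and validation data.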
NLP Libraries
The Big Bad NLP Database
A collection of 400+ NLP datasets with papers included.
datasets natural-language-processing library
NLP Viewer 🤗
A simple website for browsing popular NLP datasets.
natural-language-processing huggingface datasets streamlit
Gutenberg Dialog
Build a dialog dataset from online books in many languages.
dataset language-modeling natural-language-processing datasets
CV Libraries
Fast.ai Datasets
Collections of original and reduced datasets for popular data sources.
datasets natural-language-processing computer-vision imagenet
Imagenette
Imagenette is a subset of 10 easily classified classes from ImageNet.
dataset imagenet computer-vision imagenette
Search for visual datasets
Search by task, application, class, label, or format.
computer-vision datasets library
Other Libraries
Recommendation Systems Datasets
This tool allows you to download, unpack, and read recommender systems datasets into a pandas.DataFrame as easily as data = Dataset().
datasets recommendation-systems recommender-systems research-tool