COVID-Q: A Dataset of 1,690 Questions about COVID-19
This dataset consists of COVID-19 questions annotated with a broad category (e.g., Transmission, Prevention) and a more specific class such ...
covid-19 question-answering dataset covid-q
The Abstraction and Reasoning Corpus (ARC)
Can a computer learn complex, abstract tasks from just a few examples? ARC can be used to measure a human-like form of general fluid intelligence.
artificial-general-intelligence common-sense-reasoning arc dataset
Dakshina Dataset
A collection of text in both Latin and native scripts for 12 South Asian languages.
dataset natural-language-processing languages dakshina
Gutenberg Dialog
Build a dialog dataset from online books in many languages.
dataset language-modeling natural-language-processing datasets
Twitter Turing Test
Can you guess whether this tweet was written by a human or generated by a neural network?
natural-language-processing text-generation gpt2 twitter
Library to scrape and clean web pages to create massive datasets.
dataset natural-language-processing data-collection text-mining
