We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples. By establishing a correlation between sample quality and image classification accuracy, we show that our best generative model also contains features competitive with top convolutional nets in the unsupervised setting.
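The core idea is to treat an image the way a language model treats text: as a 1-D sequence of discrete tokens predicted autoregressively. A minimal sketch of that preprocessing (hypothetical illustration, not the authors' code — `image_to_tokens` and the coarse per-channel quantization are assumptions for demonstration; the actual model uses its own color palette):

```python
import numpy as np

def image_to_tokens(img, levels=8):
    """Quantize an HxWx3 uint8 image to `levels` values per channel,
    then pack the 3 channels into one discrete token per pixel."""
    q = (img.astype(np.int64) * levels) // 256           # HxWx3, values in [0, levels)
    tokens = (q[..., 0] * levels + q[..., 1]) * levels + q[..., 2]
    return tokens.reshape(-1)                            # flatten in raster order

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = [255, 0, 0]                                  # a single red pixel
seq = image_to_tokens(img)

# Next-pixel prediction mirrors next-word prediction over text tokens:
# inputs are seq[:-1], targets are seq[1:].
inputs, targets = seq[:-1], seq[1:]
print(seq.tolist())  # → [448, 0, 0, 0]
```

Once images are sequences of tokens like these, the same transformer architecture and training objective used for language apply unchanged.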

