We find that, just as a large transformer model trained on language can generate coherent text, exactly the same model trained on pixel sequences can generate coherent image completions and samples. By establishing a correlation between sample quality and image classification accuracy, we show that our best generative model also contains features competitive with top convolutional nets in the unsupervised setting.
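To make the idea concrete, here is a minimal sketch of generative pretraining on pixels: flatten each image into a 1-D sequence of discrete pixel values and train a decoder-only transformer on next-pixel prediction. Everything here (the 256-value grayscale vocabulary, 8x8 images, the `PixelGPT` name, and all model sizes) is an illustrative assumption, not OpenAI's actual configuration; in the paper, the classification result would correspond to fitting a linear probe on the features this pretraining produces.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative constants (assumptions, not the paper's actual setup):
VOCAB = 256      # one token per 8-bit grayscale pixel intensity
SEQ_LEN = 8 * 8  # an 8x8 image flattened in raster-scan order


class PixelGPT(nn.Module):
    """Toy decoder-only transformer that predicts the next pixel."""

    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, d_model)
        self.pos_emb = nn.Embedding(SEQ_LEN, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        # tokens: (batch, seq) integer pixel intensities in [0, 255]
        b, t = tokens.shape
        pos = torch.arange(t, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask: each position attends only to earlier pixels,
        # which is what makes the model autoregressive.
        mask = torch.triu(
            torch.full((t, t), float("-inf"), device=tokens.device), diagonal=1
        )
        x = self.blocks(x, mask=mask)
        return self.head(x)  # (batch, seq, VOCAB) logits over the next pixel


# One training step on a random stand-in batch of "images".
model = PixelGPT()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
pixels = torch.randint(0, VOCAB, (4, SEQ_LEN))  # fake 8x8 grayscale images
logits = model(pixels[:, :-1])                  # predict pixel t+1 from pixels 0..t
loss = F.cross_entropy(logits.reshape(-1, VOCAB), pixels[:, 1:].reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
print(f"next-pixel cross-entropy: {loss.item():.3f}")
```

The same training loop used for language modeling applies unchanged; only the tokenization differs, with pixel intensities standing in for word tokens.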


Similar projects
Jukebox: A Generative Model for Music
We’re introducing Jukebox, a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles.
Tempering Expectations for GPT-3 and OpenAI’s API
A closer look at the "magic" behind GPT-3 and caveats to be aware of.
Insight
Project Insight is designed to provide NLP as a service, with a code base for both the front-end GUI (Streamlit) and the back-end server (FastAPI), demonstrating the usage of ...
Paraphrase Any Question with T5 (Text-To-Text Transformer)
Given a question, generate paraphrased versions of it with a T5 transformer. A pretrained model and training script are provided.