Tips for Successfully Training Transformers on Small Datasets
It turns out that you can easily train transformers on small datasets, provided you use a few tricks and have the patience to train for a very long time.
