Rethinking Batch Normalization in Transformers
We found that in NLP, batch statistics exhibit large variance throughout training, which degrades batch normalization (BN) performance.
normalization batch-normalization power-normalization transformers
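The claim above can be illustrated with a small sketch: NLP batches mix variable sentence lengths and heavy-tailed token statistics, so the per-batch mean and variance that BN depends on fluctuate from step to step. The simulation below is a hypothetical illustration (the batch shapes, distributions, and seed are assumptions, not the project's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_stats(batch):
    # BN-style statistics: mean/variance over batch and sequence dims,
    # yielding one value per feature channel.
    return batch.mean(axis=(0, 1)), batch.var(axis=(0, 1))

step_means = []
for step in range(100):
    seq_len = int(rng.integers(5, 50))     # variable sentence lengths (assumed)
    scale = rng.lognormal(0.0, 1.0)        # heavy-tailed token scale (assumed)
    batch = rng.normal(0.0, scale, size=(32, seq_len, 8))
    mu, _ = batch_stats(batch)
    step_means.append(mu)

step_means = np.stack(step_means)
# Spread of the per-batch mean across training steps: large values here are
# exactly the instability in batch statistics that hurts BN in transformers.
print(step_means.std(axis=0))
```

Because the per-step BN statistics drift this much, the running averages used at inference mismatch any individual batch, which is the failure mode the project's PowerNorm variant targets.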