Why Batch Norm Causes Exploding Gradients
Our beloved Batch Norm can actually cause exploding gradients, at least at initialization time.
Tags: batch-normalization, exploding-gradients, weights-initialization, deep-learning
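To make the setting concrete, here is a minimal NumPy sketch of the batch-norm transform itself: the per-batch mean/variance normalization at the heart of the argument. The `batch_norm` helper is purely illustrative and is not taken from the project's code.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch dimension, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))  # a batch of 64 activations
y = batch_norm(x)
# Each feature now has approximately zero mean and unit variance;
# the gradient-explosion argument concerns how this rescaling compounds
# across many layers at initialization.
```

Because the normalization statistics are recomputed per layer, the rescaling effect compounds with depth, which is where the initialization-time explosion argument applies.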
Rethinking Batch Normalization in Transformers
We found that NLP batch statistics exhibit large variance throughout training, which leads to poor BN performance.
Tags: power-normalization, batch-normalization, transformers, natural-language-processing
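The "large variance of batch statistics" claim can be illustrated with a toy experiment. The heavy-tailed distribution below is an assumption standing in for NLP activation statistics, not data from the project:

```python
import numpy as np

rng = np.random.default_rng(1)

def batch_mean_spread(samples, batch_size=32):
    """Std-dev of per-batch means: how much a BN statistic fluctuates."""
    usable = len(samples) // batch_size * batch_size
    batches = samples[:usable].reshape(-1, batch_size)
    return batches.mean(axis=1).std()

n = 32 * 1000
gaussian = rng.normal(size=n)                  # well-behaved activations
heavy_tailed = rng.standard_t(df=2.0, size=n)  # crude stand-in for NLP stats

g_spread = batch_mean_spread(gaussian)
t_spread = batch_mean_spread(heavy_tailed)
# The heavy-tailed stream yields much noisier per-batch statistics,
# which is the failure mode the blurb attributes to BN in transformers.
```

When the per-batch mean and variance fluctuate this much, the running statistics BN uses at inference diverge from what was seen during training, which is one way noisy statistics translate into poor BN performance.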