Search results

A Survey of Long-Term Context in Transformers
Over the past two years, the NLP community has developed a veritable zoo of methods to combat the cost of multi-head self-attention over long sequences.
transformers multi-head-attention attention natural-language-processing
Talking-Heads Attention
A variation on multi-head attention that inserts linear projections across the attention-heads dimension, immediately before and after the softmax ...
multi-head-attention talking-heads-attention attention transformers
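
A minimal NumPy sketch of the mechanism described in the Talking-Heads Attention entry above: per-head attention logits are mixed across the heads dimension with one learned projection immediately before the softmax, and the resulting attention weights are mixed with a second projection immediately after it. Tensor shapes, argument names, and the use of einsum here are illustrative assumptions, not the paper's reference implementation.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def talking_heads_attention(q, k, v, proj_logits, proj_weights):
    """q, k, v: [batch, heads, seq, dim_per_head]
    proj_logits, proj_weights: [heads, heads] mixing matrices (assumed shapes)."""
    d = q.shape[-1]
    # Ordinary per-head scaled dot-product logits: [batch, heads, q_len, k_len]
    logits = np.einsum("bhqd,bhkd->bhqk", q, k) / np.sqrt(d)
    # "Talking heads" step 1: mix the logits across the heads dimension before softmax.
    logits = np.einsum("bhqk,hg->bgqk", logits, proj_logits)
    weights = softmax(logits, axis=-1)
    # "Talking heads" step 2: mix the attention weights across heads after softmax.
    weights = np.einsum("bgqk,gh->bhqk", weights, proj_weights)
    # Weighted sum of values, as in standard multi-head attention.
    return np.einsum("bhqk,bhkd->bhqd", weights, v)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((2, 4, 8, 16)) for _ in range(3))
    out = talking_heads_attention(q, k, v,
                                  rng.standard_normal((4, 4)),
                                  rng.standard_normal((4, 4)))
    print(out.shape)  # (2, 4, 8, 16)

The two mixing matrices let heads exchange information about where they attend, rather than operating fully independently as in standard multi-head attention; with identity matrices the sketch reduces to the ordinary case.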