Objectives & Highlights

Recognizing scene text of arbitrary shape is very challenging because of large variations in text shape, font, color, background, and so on. Most state-of-the-art algorithms first rectify the input image into a normalized image and then treat recognition as a sequence prediction task. The bottleneck of such methods is the rectification step, which introduces errors caused by perspective distortion. In this paper, we find that rectification is completely unnecessary: all we need is spatial attention.
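The key idea is that a decoder can attend directly over the 2D feature map instead of relying on a rectified, flattened image. The following is a minimal sketch of generic 2D spatial attention, not the project's actual model. It assumes scaled dot-product scoring, a feature map stored as nested lists, and a query vector standing in for the decoder state. The names `spatial_attention`, `features`, and `query` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # indexed as [row][col][channel]


def spatial_attention(features: FeatureMap, query: Vector) -> Tuple[Vector, List[List[float]]]:
    """Attend over every (row, col) cell of a 2D feature map.

    Returns the context vector (attention-weighted sum of cell features)
    and the H x W attention map. Scaled dot-product scoring is an
    assumption for illustration, not necessarily what the project uses.
    """
    height, width = len(features), len(features[0])
    channels = len(query)
    scale = 1.0 / math.sqrt(channels)

    # Score each spatial location against the query.
    scores = [[scale * sum(f * q for f, q in zip(features[r][c], query))
               for c in range(width)] for r in range(height)]

    # Numerically stable softmax taken jointly over all H*W positions.
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    weights = [[e / total for e in row] for row in exps]

    # Context vector: the weights can follow curved or rotated text on the
    # 2D grid, so no rectification of the input image is required.
    context = [0.0] * channels
    for r in range(height):
        for c in range(width):
            w = weights[r][c]
            for k in range(channels):
                context[k] += w * features[r][c][k]
    return context, weights


# Toy 2x3 map with 2 channels; only cell (1, 2) matches the query.
fmap = [[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 0.0], [10.0, 0.0]]]
ctx, attn = spatial_attention(fmap, [1.0, 0.0])
```

In a full recognizer the query would typically be the decoder's hidden state, recomputed for each output character so that the attention map can move along the text. This is a common design for attention-based decoders and is offered here as an assumption, not as a description of this project's exact architecture.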

Author: @fengxinjie

Similar projects
Graph Convolution on Structured Documents
Convert structured documents to graphs for document entity classification.