Published: 09 Oct 2015 Category: deep_learning


Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

End-to-End Text Recognition with Convolutional Neural Networks

Word Spotting and Recognition with Embedded Attributes

Reading Text in the Wild with Convolutional Neural Networks

Deep structured output learning for unconstrained text recognition

  • intro: “propose an architecture consisting of a character sequence CNN and an N-gram encoding CNN which act on an input image in parallel and whose outputs are utilized along with a CRF model to recognize the text content present within the image.”
  • arxiv: http://arxiv.org/abs/1412.5903

Deep Features for Text Spotting

Reading Scene Text in Deep Convolutional Sequences

DeepFont: Identify Your Font from An Image

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

Recursive Recurrent Nets with Attention Modeling for OCR in the Wild

Writer-independent Feature Learning for Offline Signature Verification using Deep Convolutional Neural Networks

DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images

End-to-End Interpretation of the French Street Name Signs Dataset

End-to-End Subtitle Detection and Recognition for Videos in East Asian Languages via CNN Ensemble with Near-Human-Level Performance

Smart Library: Identifying Books in a Library using Richly Supervised Deep Scene Text Reading

Improving Text Proposals for Scene Images with Fully Convolutional Networks

  • intro: Universitat Autonoma de Barcelona (UAB) & University of Florence
  • intro: International Conference on Pattern Recognition (ICPR) - DLPR (Deep Learning for Pattern Recognition) workshop
  • arxiv: https://arxiv.org/abs/1702.05089

Scene Text Eraser


Attention-based Extraction of Structured Information from Street View Imagery

Implicit Language Model in LSTM for OCR


Text Detection

Object Proposals for Text Extraction in the Wild

Text-Attentional Convolutional Neural Networks for Scene Text Detection

Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network

Synthetic Data for Text Localisation in Natural Images

Scene Text Detection via Holistic, Multi-Channel Prediction

Detecting Text in Natural Image with Connectionist Text Proposal Network

TextBoxes: A Fast Text Detector with a Single Deep Neural Network

TextBoxes++: A Single-Shot Oriented Scene Text Detector

Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection

Detecting Oriented Text in Natural Images by Linking Segments

Deep Direct Regression for Multi-Oriented Scene Text Detection

Cascaded Segmentation-Detection Networks for Word-Level Text Spotting



WordFence: Text Detection in Natural Images with Border Awareness

SSD-text detection: Text Detector

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

R-PHOC: Segmentation-Free Word Spotting using CNN

Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks

EAST: An Efficient and Accurate Scene Text Detector

Deep Scene Text Detection with Connected Component Proposals

Single Shot Text Detector with Regional Attention

Fused Text Segmentation Networks for Multi-oriented Scene Text Detection


Deep Residual Text Detection Network for Scene Text

  • intro: IAPR International Conference on Document Analysis and Recognition (ICDAR) 2017. Samsung R&D Institute of China, Beijing
  • arxiv: https://arxiv.org/abs/1711.04147

Feature Enhancement Network: A Refined Scene Text Detector

ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene


Detecting Curve Text in the Wild: New Dataset and New Solution

FOTS: Fast Oriented Text Spotting with a Unified Network


PixelLink: Detecting Scene Text via Instance Segmentation

Sliding Line Point Regression for Shape Robust Scene Text Detection


Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

Single Shot TextSpotter with Explicit Alignment and Attention

Rotation-Sensitive Regression for Oriented Scene Text Detection

Detecting Multi-Oriented Text with Corner-based Region Proposals

An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches


IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

Boosting up Scene Text Detectors with Guided CNN


Shape Robust Text Detection with Progressive Scale Expansion Network

A Single Shot Text Detector with Scale-adaptive Anchors


TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping

TextContourNet: a Flexible and Effective Framework for Improving Scene Text Detection Architecture with a Multi-task Cascade


Correlation Propagation Networks for Scene Text Detection


Scene Text Detection with Supervised Pyramid Context Network

Improving Rotated Text Detection with Rotation Region Proposal Networks


Pixel-Anchor: A Fast Oriented Scene Text Detector with Combined Networks


Mask R-CNN with Pyramid Attention Network for Scene Text Detection

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection

Detecting Text in the Wild with Deep Character Embedding Network

MSR: Multi-Scale Shape Regression for Scene Text Detection


Pyramid Mask Text Detector

Tightness-aware Evaluation Protocol for Scene Text Detection

Character Region Awareness for Text Detection

Text Recognition

Sequence to sequence learning for unconstrained scene text recognition

Drawing and Recognizing Chinese Characters with Recurrent Neural Network

Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition

Stroke Sequence-Dependent Deep Convolutional Neural Network for Online Handwritten Chinese Character Recognition

Visual attention models for scene text recognition


Focusing Attention: Towards Accurate Text Recognition in Natural Images

Scene Text Recognition with Sliding Convolutional Character Models


AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition


A New Hybrid-parameter Recurrent Neural Networks for Online Handwritten Chinese Character Recognition


AON: Towards Arbitrarily-Oriented Text Recognition

Arbitrarily-Oriented Text Recognition

SEE: Towards Semi-Supervised End-to-End Scene Text Recognition


Edit Probability for Scene Text Recognition

SCAN: Sliding Convolutional Attention Network for Scene Text Recognition


Adaptive Adversarial Attack on Scene Text Recognition

ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification


A Multi-Object Rectified Attention Network for Scene Text Recognition

SAFE: Scale Aware Feature Encoder for Scene Text Recognition

A Simple and Robust Convolutional-Attention Network for Irregular Text Recognition


FACLSTM: ConvLSTM with Focused Attention for Scene Text Recognition


Text Detection + Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition

Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework

FOTS: Fast Oriented Text Spotting with a Unified Network


Single Shot TextSpotter with Explicit Alignment and Attention

An end-to-end TextSpotter with Explicit Alignment and Attention

Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes

Scene Text Detection and Recognition: The Deep Learning Era

A Novel Integrated Framework for Learning both Text Detection and Recognition

Efficient Video Scene Text Spotting: Unifying Detection, Tracking, and Recognition

Breaking Captcha

Using deep learning to break a Captcha system

Breaking reddit captcha with 96% accuracy

I’m not a human: Breaking the Google reCAPTCHA

Neural Net CAPTCHA Cracker

Recurrent neural networks for decoding CAPTCHAS

Reading irctc captchas with 95% accuracy using deep learning


I Am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs


Handwriting Recognition

High Performance Offline Handwritten Chinese Character Recognition Using GoogLeNet and Directional Feature Maps

Recognize your handwritten numbers


Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras

MNIST Handwritten Digit Classifier


LeNet – Convolutional Neural Network in Python

Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention

MLPaint: the Real-Time Handwritten Digit Recognizer

Training a Computer to Recognize Your Handwriting


Using TensorFlow to create your own handwriting recognition engine

Building a Deep Handwritten Digits Classifier using Microsoft Cognitive Toolkit

Hand Writing Recognition Using Convolutional Neural Networks

Design of a Very Compact CNN Classifier for Online Handwritten Chinese Character Recognition Using DropWeight and Global Pooling

Handwritten digit string recognition by combination of residual network and RNN-CTC


Plate Recognition

Reading Car License Plates Using Deep Convolutional Neural Networks and LSTMs

Number plate recognition with Tensorflow


Segmentation-free Vehicle License Plate Recognition using ConvNet-RNN

  • intro: International Workshop on Advanced Image Technology (IWAIT 2017), January 8-10, 2017, Penang, Malaysia. Proceedings of IWAIT 2017
  • arxiv: https://arxiv.org/abs/1701.06439

License Plate Detection and Recognition Using Deeply Learned Convolutional Neural Networks

Adversarial Generation of Training Examples for Vehicle License Plate Recognition


Towards End-to-End Car License Plates Detection and Recognition with Deep Neural Networks

Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline

High Accuracy Chinese Plate Recognition Framework

LPRNet: License Plate Recognition via Deep Neural Networks

  • intro: Intel IOTG Computer Vision Group
  • intro: works in real time with recognition accuracy up to 95% for Chinese license plates: 3 ms/plate on an NVIDIA® GeForce™ GTX 1080 and 1.3 ms/plate on an Intel® Core™ i7-6700K CPU.
  • arxiv: https://arxiv.org/abs/1806.10447

How many labeled license plates are needed?


Applying OCR Technology for Receipt Recognition

Hacking MNIST in 30 lines of Python

Optical Character Recognition Using One-Shot Learning, RNN, and TensorFlow


Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning



ocropy: Python-based tools for document analysis and OCR

Extracting text from an image using Ocropus

CLSTM : A small C++ implementation of LSTM networks, focused on OCR

OCR text recognition using tensorflow with attention

Digit Recognition via CNN: digital meter numbers detection

Attention-OCR: Visual Attention based OCR

umaru: An OCR-system based on torch using the technique of LSTM/GRU-RNN, CTC and referred to the works of rnnlib and clstm

Tesseract.js: Pure Javascript OCR for 62 Languages

DeepHCCR: Offline Handwritten Chinese Character Recognition based on GoogLeNet and AlexNet (With CaffeModel)

deep ocr: make a better chinese character recognition OCR than tesseract


Practical Deep OCR for scene text using CTPN + CRNN


Tensorflow-based CNN+LSTM trained with CTC-loss for OCR

Deep Learning for OCR


Scene Text Localization & Recognition Resources

awesome-ocr: A curated list of promising OCR resources