About Atalasoft DotImage OCR Module Add-On

Add OCR capabilities to your DotImage applications.

Atalasoft DotImage OCR is an optical character recognition module for Microsoft .NET developers that lets them add character recognition to their applications. Atalasoft's approach to OCR is to provide a generic, object-oriented interface that can support any OCR engine. This lets users of DotImage OCR change OCR engines with a single line of code, which is also convenient for testing and evaluating different engines. Atalasoft currently provides three OCR engine interfaces: GlyphreaderEngine, TesseractEngine, and RecostarEngine.
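
The single-line engine swap described above follows from having calling code depend only on a shared interface. The following is a minimal sketch of that pattern in Python, not the DotImage .NET API: the names OcrEngineSketch, FakeEngineA, FakeEngineB, and extract_text are hypothetical and exist only for illustration, and the fake engines return placeholder strings rather than performing recognition.

```python
from abc import ABC, abstractmethod


class OcrEngineSketch(ABC):
    """Hypothetical stand-in for a generic OCR engine interface (not the DotImage API)."""

    @abstractmethod
    def recognize(self, image: bytes) -> str:
        """Return the text recognized in the given image data."""


class FakeEngineA(OcrEngineSketch):
    """Illustrative engine; it only reports how many bytes it was given."""

    def recognize(self, image: bytes) -> str:
        return f"A:{len(image)} bytes"


class FakeEngineB(OcrEngineSketch):
    """A second illustrative engine with the same interface."""

    def recognize(self, image: bytes) -> str:
        return f"B:{len(image)} bytes"


def extract_text(engine: OcrEngineSketch, image: bytes) -> str:
    # The caller depends only on the shared interface, never on a concrete engine.
    return engine.recognize(image)


# Switching engines means changing only this line, e.g. to FakeEngineB().
engine = FakeEngineA()
print(extract_text(engine, b"\x00\x01\x02"))
```

Because extract_text never names a concrete engine, the same calling code can be pointed at each engine in turn when comparing results.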

OCR/ICR Transformation
OCR, or Optical Character Recognition, is the process of locating and identifying typed letters in an image. ICR, or Intelligent Character Recognition, is a similar process used to identify handwritten letters in an image. Atalasoft's toolkit allows OCR and ICR engines to be implemented by extending the base OcrEngine class; the Recognize() method starts the recognition process. Additionally, Atalasoft has partnerships for the following OCR and ICR engines and provides an OcrEngine subclass for each out of the box:

GlyphreaderEngine - Transym - A closed-source OCR engine that vectorizes glyphs, then determines all the possible letters each glyph could be.

  • Supports the European Character Set
  • Reports individual character position and size
  • Reports character confidence
  • Correctly recognizes rotated pages and reports the rotation angle
  • Has Auto-Rotate functionality, rotating documents to the correct orientation
  • Can automatically break merged characters, or merge broken characters
  • Can optionally reject low confidence characters
  • Can optionally reject low confidence lines
  • Can disable recognition of specific characters
  • Full Page color OCR can be generated when combined with the Searchable PDF Module
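
The confidence-related options in the list above can be pictured with a short sketch. This is an illustration in Python under stated assumptions, not the engine's actual behavior or API: the RecognizedChar type, the 0.0 to 1.0 confidence scale, the 0.6 threshold, the "~" placeholder, and the use of a mean as the line score are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class RecognizedChar:
    """Hypothetical per-character result: the character and its confidence."""
    text: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def reject_low_confidence(chars, threshold=0.6, placeholder="~"):
    """Replace every character scoring below the threshold with a placeholder."""
    return "".join(
        c.text if c.confidence >= threshold else placeholder for c in chars
    )


def line_confidence(chars):
    """Mean character confidence for a line; 0.0 for an empty line (an assumption)."""
    return sum(c.confidence for c in chars) / len(chars) if chars else 0.0


line = [
    RecognizedChar("c", 0.95),
    RecognizedChar("a", 0.40),  # falls below the threshold, so it is rejected
    RecognizedChar("t", 0.90),
]
print(reject_low_confidence(line))  # c~t
print(round(line_confidence(line), 2))  # 0.75
```

A line-level rejection option could then discard any line whose score falls below a second threshold, again an assumption about how such an option might be applied.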

TesseractEngine - Google - An intelligent learning open-source OCR engine with many extended language options. 

  • Integrated support for the languages Dutch, English, French, German, Italian, Portuguese, and Spanish
  • Atalasoft tests additional language add-on packs for: Chinese (Simplified), Chinese (Traditional), Danish, Finnish, Greek, Hebrew, Japanese, Korean, Norwegian, Russian, Swedish, and Turkish
  • Tesseract provides additional language add-on packs here: http://code.google.com/p/tesseract-ocr/downloads/list
  • Ability to determine character, word, and line size and location
  • Reports confidence of each recognized character
  • Output to Text or Searchable PDF
  • Royalty Free Desktop Licensing

RecostarEngine - Open Text - A fast closed-source engine that provides both OCR and ICR.

  • Supports the European Character Set
  • Reports individual character position and size
  • Reports character confidence
  • Correctly recognizes rotated pages and reports the rotation angle
  • Can automatically break merged characters, or merge broken characters
  • Can optionally reject low confidence characters
  • Can optionally reject low confidence lines
  • Can disable recognition of specific characters
  • Full Page color OCR can be generated when combined with the Searchable PDF Module

Additionally, Atalasoft's engines can be integrated with the PdfTranslator. This module automatically translates an image into a searchable PDF file. Simply call Translate() on any of Atalasoft's OcrEngine classes.