About LEADTOOLS OCR Module

Add OCR technology to your applications quickly and easily.

LEADTOOLS OCR Module includes preset confidence and accuracy levels, artificial intelligence, and built-in and user-defined lexicons that limit the type of text to recognize within a particular zone. You can verify or correct text during and after recognition. LEADTOOLS can help with your OCR development and cut your programming time in half.

Overview

With the LEADTOOLS OCR Module, a developer can add many zoned areas to the same page, and each zone can have its own options, such as the OCR engine, filter and much more. LEAD makes OCR development easier by presetting OCR properties that work for most images by default. To improve recognition results further, LEADTOOLS OCR supports custom dictionaries for words that may exist only in the documents being recognized. Other OCR Module features include support for over 100 languages, output document options such as margins and paragraph settings, and new output formats.

Three specialized OCR engines are supported:

MOR OCR Engine

  • Supports 114 languages
  • Supports up to 500 zones on one image
  • Supports Omnifont, Draftdot24 and OCR-A filling methods
  • Supports character training to achieve improved accuracy
  • Provides 3 page-level accuracy and speed trade-off settings: Accurate, Balanced and Fast
  • Provides Checking Subsystem-based correction

MTX (Mtext) OCR Engine

  • The fastest of the selectable OCR engines
  • Supports 12 languages
  • Supports up to 64 zones on one image
  • Supports Omnifont, Draftdot9 and Draftdot24 filling methods
  • Provides 2 page-level accuracy and speed trade-off settings: a combined Accurate & Balanced value and Fast
  • Provides Checking Subsystem-based correction

FireWorX OCR Engine

  • Optimized for speed
  • Supports 54 languages
  • Supports up to 2,500 zones on one image
  • Supports the Omnifont filling method
  • Supports character training to achieve improved accuracy
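The per-zone options described in the overview and the per-engine limits listed above can be pictured as a small data model. The following Python sketch is hypothetical and is not the LEADTOOLS API: the `Engine`, `Zone` and `Page` names are illustrative, and it assumes each engine's zone limit counts the zones assigned to that engine on one image. Only the zone limits and filling method names come from the lists above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Engine(Enum):
    # Illustrative identifiers for the three engines; not LEADTOOLS constants.
    MOR = "MOR"
    MTX = "MTX"
    FIREWORX = "FireWorX"


# Zone limits and filling methods as listed for each engine above.
MAX_ZONES = {Engine.MOR: 500, Engine.MTX: 64, Engine.FIREWORX: 2500}
FILL_METHODS = {
    Engine.MOR: {"Omnifont", "Draftdot24", "OCR-A"},
    Engine.MTX: {"Omnifont", "Draftdot9", "Draftdot24"},
    Engine.FIREWORX: {"Omnifont"},
}


@dataclass
class Zone:
    # Hypothetical zone: a rectangle in pixels plus its own OCR options.
    left: int
    top: int
    width: int
    height: int
    engine: Engine
    fill_method: str = "Omnifont"


@dataclass
class Page:
    zones: list = field(default_factory=list)

    def add_zone(self, zone: Zone) -> None:
        # Reject a filling method the chosen engine does not list.
        if zone.fill_method not in FILL_METHODS[zone.engine]:
            raise ValueError(f"{zone.engine.value} does not support {zone.fill_method}")
        # Assumption: the limit counts zones that use the same engine.
        used = sum(1 for z in self.zones if z.engine is zone.engine)
        if used >= MAX_ZONES[zone.engine]:
            raise ValueError(
                f"{zone.engine.value} allows at most {MAX_ZONES[zone.engine]} zones per image"
            )
        self.zones.append(zone)


# Two zones on the same page, each with its own engine and filling method.
page = Page()
page.add_zone(Zone(0, 0, 600, 80, Engine.MTX, "Draftdot9"))
page.add_zone(Zone(0, 100, 600, 400, Engine.MOR, "OCR-A"))
```

Validating options when a zone is added, rather than at recognition time, surfaces an unsupported engine and filling method combination early.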

Output formats supported:

  • Adobe PDF edited
  • Open eBook 1.0
  • XML
  • 2G Type 2
  • 2G Type 3
  • Over 40 output formats supported

Additional Features:

  • Recognize and export text, choosing from a variety of text, word processing, database, or spreadsheet file formats
  • Select the language of documents to be recognized, for example English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish or Swedish
  • Documents with multiple languages can be recognized
  • Set accuracy thresholds prior to recognition to control the accuracy of recognition
  • Learn, save and load character recognition data for similar documents. The software learns as a result of normal recognition, and acquires additional information by using the OCR's text verification system
  • Recognize text from 5 to 72 points in virtually any typeface
  • Increase recognition accuracy with built-in lexical classes and user defined lexicons
  • Verify or correct text during the recognition process based on confidence levels set prior to recognition. If the confidence of a word or character falls within the set range, a dialog box can be displayed showing the original image and the preliminary recognition results, where the user can make any necessary corrections to the recognized text
  • Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly
  • Recognize all pages of a document at once, while managing pages that are at different stages of processing (segmentation vs. recognition)
  • Recognize one or more document pages and export the recognized text to over 40 different formats, including MS Word, PDF*, MS Excel, Dbase, WordPerfect and ScanSoft's mark-up language format (XDoc/XDoc Lite)
  • Automatically segment the page to correctly recognize text on pages with complex or irregular layouts, including tables, reverse video and line art
  • Process both text and graphics. The recognition software's ability to distinguish graphics from text provides the basis for creating a compound document processing system
  • Process documents in two-page mode for open books and magazines
  • Process images from newspapers with special image processing designed for newsprint
  • And more...
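The confidence-based verification described in the list above can be sketched in a few lines. This is a hypothetical Python illustration, not LEADTOOLS code: `Word`, its 0-100 `confidence` field and the `verify` callback (standing in for the verification dialog) are assumptions. Words whose confidence falls inside the range set before recognition are passed to the callback for correction; all other words are kept as recognized.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Word:
    text: str
    confidence: int  # assumed 0-100 score from the recognizer


def apply_verification(words: List[Word], low: int, high: int,
                       verify: Callable[[Word], str]) -> List[str]:
    """Return final text, sending words with low <= confidence <= high to verify()."""
    result = []
    for word in words:
        if low <= word.confidence <= high:
            # In a real application this step would show the original image
            # region and the preliminary text so the user can correct it.
            result.append(verify(word))
        else:
            result.append(word.text)
    return result


words = [Word("LEAD", 95), Word("tooIs", 60), Word("0CR", 40)]
# A stand-in "user" who fixes a capital I misread for a lowercase l.
fixed = apply_verification(words, 50, 80, lambda w: w.text.replace("I", "l"))
```

Here only "tooIs" (confidence 60) falls within the 50 to 80 range and is corrected to "tools"; "0CR" at 40 lies outside the range, so under this sketch it is left unchanged.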

LEADTOOLS provides an interface to verify or correct text during recognition. LEAD's OCR verification dialog ties the text being edited directly to the image, providing a visual reference to the original bitmap data. The OCR engine can perform automatic area segmentation, creating multi-layered zones and recognizing areas such as tables, rules, images and text. Alternatively, you can designate zones manually: up to 2,500 on one image with the FireWorX engine, 500 with MOR and 64 with MTX.

Fax, dot-matrix and halftone images can be preprocessed to improve recognition results. Supported languages include English and major European languages such as Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish and Swedish. Recognized text can be exported to more than 40 different formats, including MS Word, PDF*, MS Excel, Dbase, WordPerfect and ScanSoft's mark-up language format (XDoc/XDoc Lite). Fast OCR processing makes the module well suited to form recognition and processing applications.