About Aspose.OCR for .NET

Add OCR/OMR functionality to your .NET applications.

Aspose.OCR for .NET is an optical character recognition API for adding OCR functionality to applications. The API is extensible, compact, and easy to use, and it provides a simple set of classes for controlling character recognition. It supports commonly used image formats and can read characters and fonts from images, detect bold and italic styles, apply noise removal filters, and scan either the whole image or any part of it.

Supported File Formats

Images

  • JPEG
  • PNG
  • TIFF
  • BMP
  • GIF
  • Scanned PDF

Batch OCR

  • Multi-page PDF
  • DjVu
  • ZIP
  • Folder

Recognition results

  • Text
  • PDF
  • Microsoft Word
  • Microsoft Excel
  • HTML
  • RTF
  • ePub
  • JSON
  • XML
  • CSV

Advanced .NET OCR API Features

  • Photo OCR - Extract text from smartphone photos with scan-level accuracy.
  • Searchable PDF - Convert any scan into a fully searchable and indexable document.
  • URL recognition - Recognize an image from a URL without downloading it locally.
  • Bulk recognition - Read all images from multi-page documents, folders and archives.
  • Any font and style - Identify and recognize text in all popular typefaces and styles.
  • Fine-tune recognition - Adjust every OCR parameter for best recognition results.
  • Spell checker - Improve results by automatically correcting misspelled words.
  • Mathematical formula detection - Accurately detect and recognize complex mathematical expressions.
  • Text search and image-to-image comparison - Search for text or patterns in images, or compare the recognized text from two images.
  • AI-powered correction - Fix misrecognized words and grammar using transformer-based LLMs, with no custom training required.
  • Semantic postprocessing - Go beyond characters: refine noisy OCR output with LLMs for improved content quality and language normalization.
  • Plug-in LLM pipelines - Connect external language models to correct OCR recognition mistakes and restore incomplete or fragmented text.
  • 140+ recognition languages - Aspose's .NET OCR library is a universal solution for document processing, data extraction, and content digitization on a global scale. You can recognize documents written in mixed languages, such as Chinese/English, Arabic/French or Cyrillic/English. The following languages are supported:
    • Extended Latin: English, Spanish, French, Indonesian, Portuguese, German, Vietnamese, Turkish, Italian, Polish, and 80+ more.
    • Cyrillic alphabet: Russian, Ukrainian, Kazakh, and Bulgarian, including mixed Cyrillic/English texts.
    • Arabic, Persian, and Urdu, including texts mixed with English.
    • Chinese, Korean, Japanese, and languages written in Devanagari and Dravidian scripts, including Hindi, Marathi, Tamil, and others.