PDFlib TET adds performance enhancements

Version 4.0 supports Unicode post-processing plus right-to-left and bidirectional text extraction.

Text extracted from a PDF with PDFlib TET.

PDFlib TET (Text Extraction Toolkit) is software for reliably extracting text from any PDF file. It is available both as a library/component and as a command-line tool. PDFlib TET makes the text contents of a PDF available as Unicode strings or structured XML, together with detailed glyph and font information. With PDFlib TET you can retrieve the Unicode values of the text in a PDF document as well as its position on the page.

The following products are available:

Updates in V4.0

New features in PDFlib TET 4.0:

  • Performance enhancements: faster for many classes of documents
  • Higher speed and smaller memory consumption for very large documents up to hundreds of thousands of pages
  • Extract right-to-left and bidirectional text for Arabic, Hebrew, etc.
  • Unicode post-processing:
      • Foldings preserve, remove or replace characters
      • Decompositions replace a character with an equivalent sequence, e.g. replace narrow or vertical Japanese characters with their standard counterparts.
      • Text can be converted to all four Unicode normalization forms, e.g. emit NFC form to meet the requirements for Web text or a database.
  • Improved shadow removal, word boundary detection, and dehyphenation
  • Improved superscript and subscript detection
  • Workarounds for non-conforming PDF documents to enhance robustness
  • Enhanced repair mode for extracting text from damaged PDF documents
  • More information in TET's XML output (TETML), e.g. dehyphenation, dropcap, shadow, and super/subscript
  • Improved C++ and Perl language bindings
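TET's own Unicode post-processing is configured through TET options that are not shown here. As a generic, non-authoritative illustration of what the normalization forms, decompositions and foldings listed above do to extracted text, the following sketch uses only Python's standard `unicodedata` module. The helper names `to_all_forms` and `fold` are hypothetical and are not part of the TET API.

```python
from __future__ import annotations

import unicodedata

# Generic illustration with Python's standard library; this is NOT the TET API.

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def to_all_forms(text: str) -> dict[str, str]:
    """Return `text` converted to each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def fold(text: str, remove: set[str], replace: dict[str, str]) -> str:
    """Hypothetical folding: drop characters in `remove`, map characters found
    in `replace`, and preserve every other character unchanged."""
    out = []
    for ch in text:
        if ch in remove:
            continue  # remove the character entirely
        out.append(replace.get(ch, ch))  # replace if mapped, else preserve
    return "".join(out)


if __name__ == "__main__":
    # "e" + COMBINING ACUTE ACCENT composes to the single character U+00E9 under NFC.
    print(to_all_forms("e\u0301")["NFC"] == "\u00e9")  # True
    # HALFWIDTH KATAKANA LETTER KA (U+FF76) becomes KATAKANA LETTER KA (U+30AB) under NFKC.
    print(to_all_forms("\uff76")["NFKC"] == "\u30ab")  # True
    # The vertical presentation form of the em dash (U+FE31) becomes U+2014 under NFKC.
    print(to_all_forms("\ufe31")["NFKC"] == "\u2014")  # True
    # Remove SOFT HYPHEN, replace NO-BREAK SPACE with an ordinary space.
    print(fold("a\u00adb\u00a0c", {"\u00ad"}, {"\u00a0": " "}))  # ab c
```

NFC leaves the halfwidth katakana unchanged, because only the compatibility forms (NFKC, NFKD) apply the narrow and vertical decompositions; this is why the choice of normalization form matters when extracted Japanese text is stored in a database or published on the Web.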

New features in PDFlib TET PDF IFilter 4.0:

  • Takes advantage of the improved TET 4.0 kernel
  • Automatic language detection for improved search results (find word stems, partial matches, etc.)
  • Support for SharePoint 2010

About PDFlib

Munich-based PDFlib GmbH, founded in 2000, develops and sells leading-edge components for server-centric generation and processing of PDF documents. PDFlib customers use the software for automated, high-volume generation and processing of PDF documents in business and prepress workflows or in online billing systems. The components produced by PDFlib GmbH are available for all common environments (operating systems and programming languages), as commercial versions and in part as open source. PDFlib GmbH sells worldwide, with its main markets in North America, Germany and Japan.

Products: PDFlib TET | PDFlib TET PDF IFilter

Publisher: PDFlib

Category: PDF

Architecture: 32 Bit | 64 Bit | ActiveX Components | ActiveX DLL | Components | DLL | FreeBSD | JavaBean | Java Components | Linux Dev Tools | Kernel | .NET Class | .NET Components | Dev Tools & IT Utilities | Unix Dev Tools | Windows Dev Tools | Windows 2000 | Windows 7 | Windows Server 2003 | Windows Server 2008 | Windows Vista | Windows XP

Platforms: C++Builder | Embarcadero / CodeGear | Delphi | Eclipse | JBuilder | Microsoft | Visual Basic | Visual Basic 2005 | Visual Basic .NET | Visual C++ | Visual C++ 2005 | Visual C++ .NET | Visual C# 2005 | Visual C# .NET | Visual Studio | Visual Studio 2005 | Visual Studio .NET

Type: Feature Releases