Version 4.0 supports Unicode post-processing plus right-to-left and bidirectional text extraction.
PDFlib TET (Text Extraction Toolkit) is software for reliably extracting text from any PDF file. It is available as a library/component and as a command-line tool. PDFlib TET delivers the text contents of a PDF as Unicode strings or structured XML, together with detailed glyph and font information. With PDFlib TET you can retrieve the Unicode values for the text in a PDF document as well as the position of that text on the page.
The following products are available:
New features in PDFlib TET 4.0:
New features in PDFlib TET PDF IFilter 4.0:
Munich-based PDFlib GmbH, founded in 2000, develops and sells leading-edge components for server-centric generation and processing of PDF documents. PDFlib customers use the software for automated, high-volume generation and processing of PDF documents in business and prepress workflows and in online billing systems. The components produced by PDFlib GmbH are available for all common environments (operating systems and programming languages) as commercial versions and, in part, as open source. PDFlib GmbH sells worldwide, with its main markets in North America, Germany and Japan.
Published in Development Tool News & Software Component News, August 19, 2010