PDFlib TET updated

March 12, 2012 - 10:23
Patch Release

PDFlib TET (Text Extraction Toolkit) is software for reliably extracting text from PDF files. It is available as a component or as a command-line tool. It delivers the text content of PDFs as Unicode strings or structured XML, plus detailed glyph and font information. With PDFlib TET you can retrieve the Unicode values for the text in a PDF document, as well as the position of the text on the page.

Updates in V4.1

  • Reduced memory requirements for very large documents
  • Performance improvements
  • Support for PDF documents encrypted with Acrobat X
  • Bug fixes and robustness improvements
  • Improved heuristics for processing malformed PDF input
  • Additional PDF details available in TETML output
  • pCOS interface 8 with new pseudo objects, e.g. for detecting transparency
  • Improved handling of encrypted file attachments

About PDFlib

Munich-based PDFlib GmbH, founded in 2000, develops and sells leading-edge components for server-centric generation and processing of PDF documents. PDFlib customers use the software for automated, high-volume generation and processing of PDF documents in business and prepress workflows and in online billing systems. PDFlib GmbH sells worldwide, with main markets in North America, Germany and Japan.