
PDFlib TET

by PDFlib - Product Type: Component / Application / .NET Class / ActiveX DLL / DLL / JavaBean

Text extraction toolkit. PDFlib TET (Text Extraction Toolkit) reliably extracts text, images and metadata from PDF files. It is available as a library/component and as a command-line tool. PDFlib TET makes the text content of a PDF available as Unicode strings or structured XML, along with detailed glyph and font information. With PDFlib TET you can retrieve the Unicode values for the text in a PDF document, as well as the text's position on the page.

From $401.94

Our regular prices are shown below.


PDFlib TET 4.3 Windows Desktop Systems

  • $401.94: 1 User License for Windows XP/Vista/7/8 on x86/x64, delivered via download
  • $361.75 per license: 1 User License for Windows XP/Vista/7/8 on x86/x64 when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $341.65 per license: 1 User License for Windows XP/Vista/7/8 on x86/x64 when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET 4.3 Windows Desktop Systems with Annual Support

  • $482.33: 1 User License for Windows XP/Vista/7/8 on x86/x64, delivered via download
  • $434.10 per license: 1 User License for Windows XP/Vista/7/8 on x86/x64 when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $409.98 per license: 1 User License for Windows XP/Vista/7/8 on x86/x64 when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET Windows Desktop Systems Annual Support Renewal

  • $80.39: 1 User License for Windows XP/Vista/7/8 on x86/x64, delivered via download
  • $72.35 per license: 1 User License for Windows XP/Vista/7/8 on x86/x64 when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $68.33 per license: 1 User License for Windows XP/Vista/7/8 on x86/x64 when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET 4.3 Mac OS X Desktop Systems

  • $401.94: 1 User License for Apple Mac OS X PPC/Intel, delivered via download
  • $361.75 per license: 1 User License for Apple Mac OS X PPC/Intel when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $341.65 per license: 1 User License for Apple Mac OS X PPC/Intel when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET 4.3 Mac OS X Desktop Systems with Annual Support

  • $482.33: 1 User License for Apple Mac OS X PPC/Intel, delivered via download
  • $434.10 per license: 1 User License for Apple Mac OS X PPC/Intel when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $409.98 per license: 1 User License for Apple Mac OS X PPC/Intel when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET Mac OS X Desktop Systems Annual Support Renewal

  • $80.39: 1 User License for Apple Mac OS X PPC/Intel, delivered via download
  • $72.35 per license: 1 User License for Apple Mac OS X PPC/Intel when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $68.33 per license: 1 User License for Apple Mac OS X PPC/Intel when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET 4.3 Windows Server Systems

  • $1,082.16: 1 Server License for Windows Server x86/x64, delivered via download
  • $973.94 per license: 1 Server License for Windows Server x86/x64 when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $919.83 per license: 1 Server License for Windows Server x86/x64 when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET 4.3 Windows Server Systems with Annual Support

  • $1,298.59: 1 Server License for Windows Server x86/x64, delivered via download
  • $1,168.73 per license: 1 Server License for Windows Server x86/x64 when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $1,103.80 per license: 1 Server License for Windows Server x86/x64 when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET Windows Server Systems Annual Support Renewal

  • $216.43: 1 Server License for Windows Server x86/x64, delivered via download
  • $194.79 per license: 1 Server License for Windows Server x86/x64 when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $183.97 per license: 1 Server License for Windows Server x86/x64 when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET 4.3 Mac OS X Server Systems

  • $1,082.16: 1 Server License for Apple Mac OS X Server PPC/Intel, delivered via download
  • $973.94 per license: 1 Server License for Apple Mac OS X Server PPC/Intel when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $919.83 per license: 1 Server License for Apple Mac OS X Server PPC/Intel when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET 4.3 Mac OS X Server Systems with Annual Support

  • $1,298.59: 1 Server License for Apple Mac OS X Server PPC/Intel, delivered via download
  • $1,168.73 per license: 1 Server License for Apple Mac OS X Server PPC/Intel when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $1,103.80 per license: 1 Server License for Apple Mac OS X Server PPC/Intel when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET Mac OS X Server Systems Annual Support Renewal

  • $216.43: 1 Server License for Apple Mac OS X Server PPC/Intel, delivered via download
  • $194.79 per license: 1 Server License for Apple Mac OS X Server PPC/Intel when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $183.97 per license: 1 Server License for Apple Mac OS X Server PPC/Intel when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET 4.3 Linux Server Systems

  • $1,082.16: 1 Server License for Linux x86/Intel 64, delivered via download
  • $973.94 per license: 1 Server License for Linux x86/Intel 64 when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $919.83 per license: 1 Server License for Linux x86/Intel 64 when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET 4.3 Linux Server Systems with Annual Support

  • $1,298.59: 1 Server License for Linux x86/Intel 64, delivered via download
  • $1,168.73 per license: 1 Server License for Linux x86/Intel 64 when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $1,103.80 per license: 1 Server License for Linux x86/Intel 64 when buying 10 or more licenses (minimum quantity 10), delivered via download

PDFlib TET Linux Server Systems Annual Support Renewal

  • $216.43: 1 Server License for Linux x86/Intel 64, delivered via download
  • $194.79 per license: 1 Server License for Linux x86/Intel 64 when buying 5-9 licenses (minimum quantity 5), delivered via download
  • $183.97 per license: 1 Server License for Linux x86/Intel 64 when buying 10 or more licenses (minimum quantity 10), delivered via download

Our prices include ComponentSource technical support and, for most downloadable products, an online backup and a free upgrade to a new version if one is released within 30 days of your purchase. All sales are made on our standard Terms and Conditions and are subject to our Return Policy. Please contact us if you require any licensing option not listed, including volume licensing and previous versions.


What's new in TET 4.3:

  • Support for resolution information in generated TIFF images
  • Workarounds for various kinds of malformed PDF
  • Enhanced robustness against non-conforming input
  • Subtle enhancements in Unicode mapping including the following:
    • Preserve Japanese CUS values for gaiji
    • Added Thai mapping for Microsoft CUS values
  • Improved word boundary detection for certain glyph spacing
  • Refreshed builds for iOS, Android, Windows Embedded Compact and Embedded Linux
  • Minor improvements in the language bindings
  • Other bug fixes and enhancements

What's new in TET 4.2:

  • Enhanced repair mode for damaged PDF and improved robustness against various kinds of malformed data
  • Improved word boundary detection for ideographic CJK text and implemented the page option »ideographic«
  • Implemented the new page option keyword »docstyle=cad«
  • Extract images in JBIG2 format
  • Improved image merging to cover more flavors of PDF images
  • Made image merging more robust against malformed PDF images
  • Improved the ordering of placed images in TETML
  • Optionally omit ICC profiles from extracted images
  • Optionally use LZW compression for extracted TIFF images as an alternative to Flate (also known as »Adobe Flate«) compression


What's new in TET 4.1:

  • Reduced memory requirements for very large documents
  • Performance improvements
  • Support for PDF documents encrypted with Acrobat X
  • General Unicode and codepage conversion function
  • Bug fixes and robustness improvements
  • Improved heuristics for processing malformed PDF input
  • Additional PDF details available in TETML output
  • pCOS interface 8 with new pseudo objects, e.g. for detecting transparency
  • Improved handling of encrypted file attachments


Connectors, language bindings and platforms:

  • TET connector for the Apache Tika toolkit
  • New language bindings for Objective-C and Ruby
  • Object-oriented interface for Python
  • Updates for language bindings, connectors and platform support
  • Support for iOS, Android and (soon) Windows Embedded Compact/CE


Additional news in TET PDF IFilter 4.1:

  • New configuration options for controlling the indexing process
  • Improved automatic language detection
  • Gracefully handle non-PDF file attachments


New features in PDFlib TET 4.0:

  • Performance enhancements: faster for many classes of documents
  • Higher speed and smaller memory consumption for very large documents up to hundreds of thousands of pages
  • Extract right-to-left and bidirectional text for Arabic, Hebrew, etc.
  • Unicode postprocessing:
    • Foldings preserve, remove or replace characters
    • Decompositions replace a character with an equivalent sequence, e.g. replace narrow or vertical Japanese characters with their standard counterparts
    • Text can be converted to all four Unicode normalization forms, e.g. emit NFC form to meet the requirements for Web text or a database
  • Improved shadow removal, word boundary detection, and dehyphenation
  • Improved super- and subscript detection
  • Workarounds for non-conforming PDF documents to enhance robustness
  • Enhanced repair mode for successfully extracting text from damaged PDF
  • More information in TET's XML output (TETML), e.g. dehyphenation, dropcap, shadow, and super/subscript
  • Improved C++ and Perl language bindings
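
TET applies this Unicode postprocessing internally. To show what the four normalization forms actually do, here is a small Python sketch using the standard-library unicodedata module. It does not call TET, and the sample characters are assumptions chosen for illustration.

```python
import unicodedata

# The four normalization forms defined by Unicode (UAX #15).
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_all(text: str) -> dict:
    """Return `text` converted to each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def codepoints(text: str) -> str:
    """Format a string as U+XXXX code points for readable output."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


# Illustrative inputs (assumed examples, not TET output):
samples = [
    "\ufb01",   # LATIN SMALL LIGATURE FI
    "\uff76",   # HALFWIDTH KATAKANA LETTER KA
    "e\u0301",  # "e" followed by COMBINING ACUTE ACCENT
]

for s in samples:
    for form, result in normalize_all(s).items():
        print(f"{codepoints(s):>14} -> {form}: {codepoints(result)}")
```

In the output, NFKC and NFKD fold the ligature into the two letters f and i and map the half-width katakana to its standard form U+30AB, while NFC composes e plus the combining accent into the single character U+00E9.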

In addition to low-level text retrieval, TET contains advanced content analysis algorithms for determining word boundaries and removing duplicate text (such as shadows and artificial bold). Using the auxiliary pCOS interface, you can retrieve arbitrary objects from the PDF, such as metadata and hypertext.

With PDFlib TET you can:

  • Extract text from PDF, e.g. to store it in a database
  • Implement a search engine for processing PDF
  • Convert the text content of PDF pages to XML for processing with other tools
  • Process PDFs based on their contents

Supported PDF Input
PDFlib TET supports all relevant flavors of PDF input:

  • PDF 1.0 up to PDF 1.7 extension level 8 and PDF 2.0, corresponding to Acrobat 1-XI
  • All font and encoding types: base 14 fonts, TrueType, PostScript, OpenType, CID fonts
  • Encrypted PDF with 40- and 128-bit encryption (appropriate permission settings or password required)

Unicode
Although text in PDF is usually not encoded in Unicode, PDFlib TET will normalize the text from a PDF document to Unicode:

  • TET converts all text content to Unicode. In C the text is returned in UTF-8 or UTF-16 format; all other language bindings return native Unicode strings
  • Ligatures and other multi-character glyphs will be decomposed into a sequence of their constituent Unicode characters
  • Vendor-specific Unicode assignments (Private Use Area, PUA) are identified, and mapped to characters in the common Unicode area if possible
  • Glyphs without appropriate Unicode mappings are identified as such, and are mapped to a configurable replacement character
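
TET performs this mapping internally. As a rough illustration of the PUA handling, the replacement character, and the UTF-8/UTF-16 output forms, here is a hypothetical Python sketch; it is not TET's API or algorithm. Glyphs are modeled as (name, code point) pairs, PUA code points are translated through an assumed lookup table, and unmapped glyphs become a configurable replacement character (U+FFFD by default in this sketch).

```python
# Hypothetical illustration, NOT the TET API: each glyph is modeled as a
# (glyph_name, codepoint) pair, where codepoint is None if no Unicode
# mapping could be determined.

# Private Use Area ranges: the BMP PUA plus supplementary planes 15 and 16.
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))


def is_pua(cp: int) -> bool:
    """True if the code point lies in one of the Private Use Areas."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def glyphs_to_text(glyphs, pua_map, replacement="\ufffd"):
    """Convert glyphs to a Unicode string.

    - Unmapped glyphs (codepoint None) become `replacement`.
    - PUA code points are translated via `pua_map` when an entry exists,
      otherwise they are passed through unchanged.
    """
    out = []
    for _name, cp in glyphs:
        if cp is None:
            out.append(replacement)
        elif is_pua(cp):
            out.append(pua_map.get(cp, chr(cp)))
        else:
            out.append(chr(cp))
    return "".join(out)


# Hypothetical vendor-specific assignment: PUA U+F761 used for a small-cap "a".
pua_map = {0xF761: "a"}
glyphs = [("H", 0x48), ("i", 0x69), ("a.sc", 0xF761), ("g42", None)]

text = glyphs_to_text(glyphs, pua_map)
print(repr(text))
# The same string in UTF-8 and UTF-16 (little-endian picked arbitrarily here):
print(text.encode("utf-8"))
print(text.encode("utf-16-le"))
```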

Full CJK Support
TET includes full support for extracting Chinese, Japanese, and Korean text. All predefined CJK CMaps (encodings) are recognized; horizontal and vertical writing modes are supported.

Content Analysis and Word Identification
TET can be used to retrieve low-level glyph information, but also includes advanced algorithms for content analysis:

  • Detect word boundaries to retrieve words instead of characters
  • Recombine the parts of hyphenated words
  • Remove duplicate instances of text, e.g. shadow and artificial bold text
  • Recombine paragraphs into reading order
  • Reorder text which is scattered over the page
  • Reconstruct lines of text
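
TET's actual content analysis is considerably more sophisticated, but the basic idea behind word boundary detection and dehyphenation can be sketched in Python: glyphs on a line are split into words wherever the horizontal gap exceeds a fraction of the font size, and a line-final word ending in a hyphen is joined with the first word of the next line. The Glyph class, the 0.15 threshold, and the helper functions are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph, in points
    width: float  # advance width, in points
    size: float   # font size, in points


def glyphs_to_words(glyphs, gap_factor=0.15):
    """Group the glyphs of one text line into words.

    A new word starts whenever the gap between the end of one glyph and the
    start of the next exceeds gap_factor * font size. The 0.15 default is a
    hypothetical threshold chosen for this sketch.
    """
    words, current, prev = [], "", None
    for g in glyphs:
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > gap_factor * g.size:
                words.append(current)
                current = ""
        current += g.char
        prev = g
    if current:
        words.append(current)
    return words


def dehyphenate(lines):
    """Join a line-final word ending in '-' with the first word of the next line.

    Naive: a genuine compound such as "well-" + "known" would also be joined
    into "wellknown"; real dehyphenation needs more context.
    """
    result = []
    for words in lines:
        words = list(words)  # copy so the caller's lists are not modified
        if result and result[-1] and result[-1][-1].endswith("-") and words:
            result[-1][-1] = result[-1][-1][:-1] + words.pop(0)
        result.append(words)
    return result


def make_line(text, width=5.0, size=10.0, space_gap=3.0):
    """Build synthetic glyphs; each space becomes a gap of `space_gap` points."""
    glyphs, x = [], 0.0
    for ch in text:
        if ch == " ":
            x += space_gap
            continue
        glyphs.append(Glyph(ch, x, width, size))
        x += width
    return glyphs


lines = [glyphs_to_words(make_line(t)) for t in ("Text extrac-", "tion works")]
print(lines)               # words per line
print(dehyphenate(lines))  # after rejoining the hyphenated word
```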

Geometry
TET provides precise metrics for the text, such as its position on the page, glyph widths, and text direction. Specific areas on the page can be included in or excluded from text extraction, e.g. to ignore headers, footers, or margins.
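
The include/exclude idea can be pictured as a simple point-in-rectangle filter. This hypothetical Python sketch does not reproduce TET's own option syntax; it models glyphs as (char, x, y) tuples in PDF coordinates (origin at the lower left, y pointing up) and drops everything in assumed header and footer bands of a roughly A4-sized page (595 x 842 points).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in PDF user space (origin at lower left, y up)."""
    llx: float
    lly: float
    urx: float
    ury: float

    def contains(self, x: float, y: float) -> bool:
        return self.llx <= x <= self.urx and self.lly <= y <= self.ury


def filter_by_area(glyphs, include=None, exclude=()):
    """Keep (char, x, y) glyphs inside `include` (whole page if None)
    and outside every rectangle in `exclude`."""
    kept = []
    for char, x, y in glyphs:
        if include is not None and not include.contains(x, y):
            continue
        if any(r.contains(x, y) for r in exclude):
            continue
        kept.append((char, x, y))
    return kept


# Header and footer bands are assumed values for a 595 x 842 pt page.
header = Rect(0, 792, 595, 842)
footer = Rect(0, 0, 595, 50)
glyphs = [("H", 100, 810), ("B", 100, 400), ("F", 100, 20)]
print(filter_by_area(glyphs, exclude=(header, footer)))
```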

pCOS Interface for Simple Access to PDF Objects
TET includes the pCOS (PDFlib Comprehensive Object System) interface for retrieving arbitrary PDF objects. With pCOS you can retrieve PDF metadata, hypertext, or any other information outside the actual page descriptions with a simple query interface without the need for low-level programming.
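
To give an idea of what a path-based query interface looks like, the following Python sketch resolves slash-separated paths against a toy object tree built from nested dicts and lists. It is a hypothetical model, not pCOS: the path syntax (including the Kids[1] indexing) is invented for this example, and a real PDF page tree can be nested more deeply than shown.

```python
import re

# Matches a path segment with an array index, e.g. "Kids[1]".
_INDEXED = re.compile(r"(\w+)\[(\d+)\]")


def get_path(tree, path, default=None):
    """Resolve a slash-separated path such as "/Info/Title" or
    "/Root/Pages/Kids[1]/MediaBox" in nested dicts and lists.
    Returns `default` if any step along the path is missing."""
    node = tree
    try:
        for part in path.strip("/").split("/"):
            m = _INDEXED.fullmatch(part)
            if m:
                node = node[m.group(1)][int(m.group(2))]
            else:
                node = node[part]
    except (KeyError, IndexError, TypeError):
        return default
    return node


# A drastically simplified, hypothetical stand-in for a parsed PDF.
doc = {
    "Info": {"Title": "Annual Report", "Producer": "Example Producer"},
    "Root": {
        "Pages": {
            "Count": 2,
            "Kids": [
                {"MediaBox": [0, 0, 595, 842]},
                {"MediaBox": [0, 0, 612, 792]},
            ],
        }
    },
}

print(get_path(doc, "/Info/Title"))
print(get_path(doc, "/Root/Pages/Kids[1]/MediaBox"))
print(get_path(doc, "/Info/Author", default="(none)"))
```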

Programming and Performance
TET has been developed with portability, performance, and robustness in mind. TET is thread-safe for deployment in multi-threaded server applications. The core library is written in highly optimized C code for maximum performance and minimum overhead. Language bindings are available for C, C++, COM, Java, .NET, Objective-C, Perl, Python, and Ruby.

TET Command-Line Tool and TET Library
TET is available as a programming library (component) for various development environments, and as a command-line tool for batch operations. Both offer the same base functionality but suit different deployment tasks. Here are some guidelines for choosing between the two TET flavors:

  • The TET programming library can be used for integration into your desktop or server application. Examples for using the library with all supported language bindings are included in the TET package
  • The TET command-line tool is suited for batch processing PDF documents. It doesn’t require any programming, but offers command-line options which can be used to integrate it into complex workflows. The TET command-line tool can be used to convert PDF page content to an XML document with Unicode text, with or without character metrics

TET Plugin
PDFlib TET Plugin is a free plugin for extracting text from PDF documents. The TET Plugin provides easy access to the PDFlib Text Extraction Toolkit (TET). Although the TET Plugin runs as an Acrobat plugin, the underlying text extraction does not use Acrobat functions but is based entirely on TET. The TET Plugin is provided as a technology study to demonstrate the power of PDFlib TET.


Support

Annual Support is available for purchase with new product licenses or as a renewal of an existing support contract. The support contract runs for 12 months and includes:

  • Technical support with short response times
  • All minor (maintenance) and major (functional) updates
  • Early availability of bug fixes



