PDFlib TET PDF IFilter
Extracts text and metadata from PDF documents.
Published by PDFlib
Distributed by ComponentSource since 2003
Prices from: $155.52 Version: 5.4 Updated: Dec 20, 2022
TET PDF IFilter extracts text and metadata from PDF documents and makes them available to search and retrieval software on Windows. This allows PDF documents to be searched on the local desktop, a corporate server, or the Web. TET PDF IFilter is based on the patented PDFlib Text Extraction Toolkit (TET), a developer product for reliably extracting text from PDF documents.
TET PDF IFilter is a robust implementation of Microsoft’s IFilter indexing interface. It works with any search and retrieval product that supports the IFilter interface, e.g. SharePoint and SQL Server. Such products use format-specific filter programs, called IFilters, for particular file formats, e.g. HTML. TET PDF IFilter is such a program, aimed at PDF documents. The user interface for searching the documents may be Windows Explorer, a Web or database frontend, a query script, or a custom application. As an alternative to interactive searches, queries can also be submitted programmatically without any user interface.
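As an illustration of a programmatic query, here is a minimal Python sketch that targets the Windows Search index. It is not taken from the PDFlib documentation. It assumes that Windows Search has already indexed the PDF files through whichever IFilter is registered for .pdf, and that the pywin32 package is installed. The helper names `build_pdf_query` and `run_query` are hypothetical.

```python
from __future__ import annotations


def build_pdf_query(phrase: str, max_rows: int = 20) -> str:
    """Build a Windows Search SQL query for PDF files containing a phrase."""
    if max_rows < 1:
        raise ValueError("max_rows must be positive")
    # Double quotes delimit the phrase and single quotes delimit the SQL
    # string literal, so drop the former and double the latter.
    safe = phrase.replace('"', " ").replace("'", "''").strip()
    if not safe:
        raise ValueError("phrase must not be empty")
    return (
        f"SELECT TOP {int(max_rows)} System.ItemPathDisplay "
        "FROM SystemIndex "
        f"WHERE CONTAINS('\"{safe}\"') "
        "AND System.FileExtension = '.pdf'"
    )


def run_query(sql: str) -> list[str]:
    """Run the query through ADO. Requires Windows and pywin32 (assumption)."""
    import win32com.client  # imported lazily so the builder works anywhere

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';")
    try:
        rs = win32com.client.Dispatch("ADODB.Recordset")
        rs.Open(sql, conn)
        paths = []
        while not rs.EOF:
            paths.append(rs.Fields.Item("System.ItemPathDisplay").Value)
            rs.MoveNext()
        rs.Close()
        return paths
    finally:
        conn.Close()
```

On a Windows machine, `run_query(build_pdf_query("annual report"))` would return the paths of indexed PDF files that contain the phrase. Other IFilter hosts such as SharePoint or SQL Server have their own query interfaces, which this sketch does not cover.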
Based on patented TET technology
PDFlib TET, the basis of TET PDF IFilter, was first released in 2002, and has been used by customers worldwide in server and desktop environments. As an alternative to extracting PDF page contents and metadata as raw text, TET can supply the document contents in XML format. TET is also available as a free plugin for Adobe Acrobat; this plugin allows interactive testing and evaluation of TET’s superior text extraction.
Unique advantages
TET PDF IFilter offers the following advantages:
Enterprise PDF search
TET PDF IFilter is available in fully thread-safe native 32- and 64-bit versions. You can implement enterprise PDF search solutions with TET PDF IFilter and the following products:
Desktop PDF search
TET PDF IFilter can also be used to implement desktop PDF search, e.g. with the following products:
Accepted PDF input
TET PDF IFilter supports all relevant flavors of PDF input:
XMP Document Metadata and Document Info Entries
The advanced metadata implementation in TET PDF IFilter supports the Windows property system for metadata. It indexes XMP metadata (Adobe’s rich XML-based metadata description language) as well as standard or custom document info entries. Metadata indexing can be configured on several levels:
XMP Image Metadata
In addition to document metadata, TET PDF IFilter also supports XMP metadata attached to individual images. In modern workflows metadata travels with the image, e.g. from the digital camera to Photoshop editing up to page layout creation and PDF production. TET PDF IFilter picks up XMP image metadata and makes it available for searches. For example, you can search for documents which contain images from a certain category, images created by a specific photographer, etc.
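To make the photographer and category searches more concrete, the following Python sketch shows what such values look like inside an XMP packet and how they can be read with the standard library. It illustrates the XMP format only and is not how TET PDF IFilter works internally. The sample packet, the name "Jane Example", and the helper `xmp_values` are invented for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical XMP packet of the kind a camera or image editor might embed.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator>
        <rdf:Seq>
          <rdf:li>Jane Example</rdf:li>
        </rdf:Seq>
      </dc:creator>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>architecture</rdf:li>
          <rdf:li>night</rdf:li>
        </rdf:Bag>
      </dc:subject>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def xmp_values(packet: str, dc_property: str) -> list:
    """Return the list items of a Dublin Core property in an XMP packet."""
    root = ET.fromstring(packet)
    values = []
    # Array-valued XMP properties wrap their items in rdf:Seq or rdf:Bag,
    # so collect every rdf:li below the requested dc: element.
    for prop in root.iter(f"{{{DC_NS}}}{dc_property}"):
        for item in prop.iter(f"{{{RDF_NS}}}li"):
            values.append((item.text or "").strip())
    return values


creators = xmp_values(SAMPLE_XMP, "creator")
keywords = xmp_values(SAMPLE_XMP, "subject")
```

Here `dc:creator` carries the photographer and `dc:subject` carries keywords. These are the kinds of values a search for "images created by a specific photographer" would match against once they are indexed.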
Internationalization
TET PDF IFilter includes full support for extracting Chinese, Japanese, and Korean (CJK) text. All CJK encodings are recognized; horizontal and vertical writing modes are supported. Automatic detection of the locale ID (language and region identifier) of the text improves the results of Microsoft’s word breaking and stemming algorithms, which is especially important for East Asian text.
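For readers unfamiliar with locale IDs, the short Python sketch below shows how a Windows LCID packs a primary language and a sublanguage (region or script variant) into one number. It only decodes the identifier. How TET PDF IFilter detects the locale is not described here, and the helper name `split_lcid` is hypothetical.

```python
# A few well-known Windows locale IDs for reference.
KNOWN_LCIDS = {
    0x0409: "en-US",
    0x0411: "ja-JP",
    0x0412: "ko-KR",
    0x0804: "zh-CN",
    0x0404: "zh-TW",
}


def split_lcid(lcid: int) -> dict:
    """Split a Windows LCID into its sort ID, primary language, and sublanguage."""
    lang_id = lcid & 0xFFFF  # the low 16 bits hold the language identifier
    return {
        "sort_id": (lcid >> 16) & 0xF,       # bits 16-19
        "primary_language": lang_id & 0x3FF,  # low 10 bits, e.g. 0x11 = Japanese
        "sublanguage": lang_id >> 10,         # upper 6 bits, regional variant
    }


for lcid, name in KNOWN_LCIDS.items():
    parts = split_lcid(lcid)
    print(f"{name}: primary=0x{parts['primary_language']:02X}, sub={parts['sublanguage']}")
```

Note that zh-CN and zh-TW share the primary language 0x04 and differ only in the sublanguage. This is why a word breaker benefits from the full identifier rather than just "Chinese".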
PDF is more than just a bunch of pages
TET PDF IFilter treats PDF documents as containers which may contain much more information than only plain pages. TET PDF IFilter indexes all relevant items in PDF documents: