PDFlib TET PDF IFilter - DLL - V3.0 - Summary

by PDFlib - Product type: Component / .NET Class / ActiveX DLL / DLL

Summary

PDFlib TET PDF IFilter by PDFlib

Extract text and metadata from PDF documents.

TET PDF IFilter extracts text and metadata from PDF documents and makes them available to search and retrieval software on Windows. This allows PDF documents to be searched on the local desktop, a corporate server, or the Web. TET PDF IFilter is based on the patented PDFlib Text Extraction Toolkit (TET), a developer product for reliably extracting text from PDF documents.

TET PDF IFilter is a robust implementation of Microsoft’s IFilter indexing interface. It works with all search and retrieval products which support the IFilter interface, e.g. SharePoint and SQL Server. Such products use format-specific filter programs – called IFilters – for particular file formats, e.g. HTML. TET PDF IFilter is such a program, aimed at PDF documents. The user interface for searching the documents may be Windows Explorer, a Web or database frontend, a query script, or a custom application. As an alternative to interactive searches, queries can also be submitted programmatically without any user interface.
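As an illustration of a programmatic query, the following minimal sketch builds a Windows Search SQL statement restricted to PDF files. It assumes the index is maintained by Windows Search, which is only one of the possible IFilter hosts. The helper name `build_pdf_query` is hypothetical, and actually running the statement (for example through the `Search.CollatorDSO` OLE DB provider) is not shown.

```python
def build_pdf_query(term: str, max_rows: int = 20) -> str:
    """Build a Windows Search SQL query for indexed PDFs containing `term`.

    Hypothetical helper for illustration; it only assembles the query text.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be positive")
    # Double quotes delimit the phrase inside CONTAINS, so drop them from the
    # input; a single quote inside a SQL string literal is written twice.
    phrase = term.replace('"', " ").replace("'", "''").strip()
    if not phrase:
        raise ValueError("search term is empty")
    return (
        f"SELECT TOP {max_rows} System.ItemPathDisplay, System.Title "
        "FROM SYSTEMINDEX "
        f"WHERE CONTAINS('\"{phrase}\"') "
        "AND System.FileExtension = '.pdf'"
    )


if __name__ == "__main__":
    print(build_pdf_query("annual report", max_rows=5))
```

The resulting string can be passed to whatever database access layer the application already uses; other IFilter hosts such as SQL Server full-text search have their own query syntax.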

Based on patented TET technology

PDFlib TET, the basis of TET PDF IFilter, was first released in 2002 and has been used by customers worldwide in server and desktop environments. As an alternative to extracting PDF page contents and metadata as raw text, TET can supply the document contents in XML format. TET is also available as a free plugin for Adobe Acrobat; this plugin allows interactive testing and evaluation of TET’s superior text extraction.

Unique advantages

TET PDF IFilter offers the following advantages:

Indexes not only page content, but also metadata, bookmarks, PDF attachments, and PDF packages/portfolios

Extracts text even from PDFs where Acrobat fails

Indexes XMP image metadata

Performance: thread-safe, fast and robust, 32- and 64-bit

Lean stand-alone product without side effects

Automatic language/script detection

Actively supported by a dedicated team

Enterprise PDF search

TET PDF IFilter is available in fully thread-safe native 32- and 64-bit versions. You can implement enterprise PDF search solutions with TET PDF IFilter and the following products:

Microsoft Office SharePoint Server (MOSS)

Microsoft Search Server 2008 and the free Search Server 2008 Express

Microsoft SQL Server

Microsoft Exchange Server

TET PDF IFilter can be used with all other Microsoft and third-party products which support the IFilter interface.

Desktop PDF search

TET PDF IFilter can also be used to implement desktop PDF search, e.g. with the following products:

Windows Desktop Search (WDS): integrated in Windows Vista; also available as a free add-on for Windows XP

Windows Indexing Service

Accepted PDF input

TET PDF IFilter supports all relevant flavors of PDF input:

All PDF versions up to PDF 1.7 with Adobe Extension Level 3 (Acrobat 9)

Encrypted PDFs which do not require a password for opening the document

Damaged PDF documents will be repaired if possible

XMP Document Metadata and Document Info Entries

The advanced metadata implementation in TET PDF IFilter supports the Windows property system for metadata. It indexes XMP metadata (Adobe’s rich XML-based metadata description language) as well as standard or custom document info entries. Metadata indexing can be configured on several levels:

Document info entries, Dublin Core fields and other common XMP properties are mapped to equivalent Windows properties, e.g. Title, Subject, Author

TET PDF IFilter adds useful PDF-specific pseudo properties, e.g. page size, PDF/A conformance level, font names

All relevant predefined XMP properties can be searched, e.g. dc:rights, xmpRights:UsageTerms, xmp:CreatorTool

User-defined XMP properties can be searched, e.g. company-specific classification properties, PDF/A extension schemas

XMP metadata attached to individual images on PDF pages can be indexed, and image-related XMP properties can be used for searching

TET PDF IFilter optionally integrates metadata in the full text index. As a result, even full text search engines without metadata support (e.g. SQL Server) can search for metadata.
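The two ideas in the list above, mapping metadata fields to Windows properties and folding metadata into the full text, can be sketched roughly as follows. The crosswalk table is an illustrative assumption, not the product's actual configuration, and the function names `map_properties` and `metadata_as_text` are hypothetical.

```python
# Illustrative crosswalk only: the names on the right are canonical Windows
# property names, but this table is an assumption, not the product's mapping.
PROPERTY_MAP = {
    "Title": "System.Title",
    "dc:title": "System.Title",
    "Author": "System.Author",
    "dc:creator": "System.Author",
    "Subject": "System.Subject",
    "dc:description": "System.Subject",
    "Keywords": "System.Keywords",
    "pdf:Keywords": "System.Keywords",
}


def map_properties(metadata: dict[str, str]) -> dict[str, str]:
    """Map known fields to Windows property names; the first value found wins."""
    mapped: dict[str, str] = {}
    for key, value in metadata.items():
        target = PROPERTY_MAP.get(key)
        if target is not None and value:
            mapped.setdefault(target, value)
    return mapped


def metadata_as_text(metadata: dict[str, str]) -> str:
    """Flatten all non-empty fields into plain text for a full-text-only index."""
    return "\n".join(f"{key}: {value}" for key, value in metadata.items() if value)


if __name__ == "__main__":
    sample = {
        "Title": "Annual Report",
        "dc:creator": "Jane Doe",
        "xmp:CreatorTool": "ExampleWriter 1.0",
    }
    print(map_properties(sample))
    print(metadata_as_text(sample))
```

Flattening every field into text is what lets an engine without property support still match a query such as a creator name, at the cost of losing the distinction between a field value and ordinary page text.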

XMP Image Metadata

In addition to document metadata, TET PDF IFilter also supports XMP metadata attached to individual images. In modern workflows metadata travels with the image, e.g. from the digital camera to Photoshop editing up to page layout creation and PDF production. TET PDF IFilter picks up XMP image metadata and makes it available for searches. For example, you can search for documents which contain images from a certain category, images created by a specific photographer, etc.

Internationalization

TET PDF IFilter includes full support for extracting Chinese, Japanese, and Korean (CJK) text. All CJK encodings are recognized; horizontal and vertical writing modes are supported. Automatic detection of the locale ID (language and region identifier) of the text improves the results of Microsoft’s word breaking and stemming algorithms, which is especially important for East Asian text.
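To make the idea of a locale ID concrete, the sketch below guesses a locale from the Unicode ranges of the characters in a text and looks up the corresponding Windows LCID. This is a deliberately crude heuristic for illustration and is not the detection method used by TET PDF IFilter; the function name `guess_locale`, the `en-US` fallback, and the choice of `zh-CN` for Han-only text are assumptions.

```python
# Windows locale identifiers (LCIDs) for a few locales.
LCID = {"ja-JP": 0x0411, "ko-KR": 0x0412, "zh-CN": 0x0804, "en-US": 0x0409}


def guess_locale(text: str) -> str:
    """Guess a locale name from the scripts that occur in `text`."""
    has_han = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:  # Hiragana and Katakana
            return "ja-JP"
        if 0xAC00 <= cp <= 0xD7AF:  # Hangul Syllables
            return "ko-KR"
        if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs
            has_han = True
    # Han characters without kana or Hangul are treated as Chinese here.
    return "zh-CN" if has_han else "en-US"


if __name__ == "__main__":
    for sample in ("日本語のテキスト", "한국어 텍스트", "中文文本", "plain text"):
        locale = guess_locale(sample)
        print(f"{sample!r}: {locale} (LCID {LCID[locale]})")
```

A word breaker selected by the wrong locale would split East Asian text at the wrong places, which is why the locale ID matters more there than for space-delimited scripts.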

PDF is more than just a bunch of pages

TET PDF IFilter treats PDF documents as containers which may contain much more information than only plain pages. TET PDF IFilter indexes all relevant items in PDF documents:

Page contents

Text in bookmarks

Embedded PDFs are processed recursively so that the text in attached PDF documents can also be searched.

All documents in a PDF package are indexed. PDF packages are an Acrobat 8 feature for grouping multiple documents in a single PDF file (called portfolios in Acrobat 9).
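The recursive treatment of attachments and packages listed above can be pictured with a small traversal. The `PdfNode` class is a hypothetical stand-in for a parsed document, not a TET data structure, and the depth limit is an assumption added to keep the sketch safe against deeply nested input.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class PdfNode:
    """Hypothetical stand-in for a parsed PDF; not a TET data structure."""
    name: str
    page_text: list[str] = field(default_factory=list)
    bookmarks: list[str] = field(default_factory=list)
    attachments: list["PdfNode"] = field(default_factory=list)


def iter_text(doc: PdfNode, max_depth: int = 10) -> Iterator[tuple[str, str]]:
    """Yield (document name, text) for `doc` and, recursively, its attachments."""
    for text in doc.page_text + doc.bookmarks:
        yield doc.name, text
    if max_depth > 0:  # stop descending once the depth budget is used up
        for child in doc.attachments:
            yield from iter_text(child, max_depth - 1)


if __name__ == "__main__":
    portfolio = PdfNode(
        name="portfolio.pdf",
        page_text=["Cover sheet"],
        attachments=[
            PdfNode("contract.pdf", ["Terms and conditions"], ["Section 1"]),
            PdfNode("invoice.pdf", ["Amount due"]),
        ],
    )
    for name, text in iter_text(portfolio):
        print(f"{name}: {text}")
```

Treating the top-level file and every attachment as the same kind of node is what lets one routine cover plain documents, attachments, and packages alike.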

PartNumbers: PC-518424-161107 518424-161107 PC-518424-161110 518424-161110

PurchaseOptions: PDFlib TET PDF IFilter V3.0 Windows Desktop Systems 1 User License for Windows 2000/XP/Vista, PDFlib TET PDF IFilter V3.0 Windows Server Systems 1 Server License for Windows 2000/2003/2008

Resources: Read the PDFlib TET PDF IFilter DataSheet, Read the PDFlib TET PDF IFilter DataSheet (German), Read the PDFlib TET PDF IFilter Manual, Read the PDFlib TET PDF IFilter XMP Metadata Support in PDFlib products White Paper, Read the PDFlib TET PDF IFilter XMP Metadata Support in PDFlib products White Paper (German), Read the PDFlib TET PDF IFilter End User License Agreement

Operating System for Deployment: Windows Server 2008, Windows Vista, Windows XP, Windows Server 2003, Windows 2000

Architecture of Product: 32Bit, 64Bit

Product Type: Component

Component Type: .NET Class, ActiveX DLL, DLL

Compatible Containers: Microsoft Visual Studio 2005, Microsoft Visual Studio .NET 2003, Microsoft Visual Studio 6.0, Microsoft Visual Basic 2005, Microsoft Visual Basic .NET 2003, Microsoft Visual Basic 6.0, Microsoft Visual C++ 2005, Microsoft Visual C++ .NET 2003, Microsoft Visual C++ 6.0, Microsoft Visual C# 2005, Microsoft Visual C# .NET 2003, .NET Framework 2.0, .NET Framework 1.1

