By Qoppa Software - Product type: Component / Java Class
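You can find current information at the following link
in PDF
If you need to purchase a license for this specific version, please contact us to find out whether the product is available and at what price.
The other information shown on this page is for historical reference only and may have changed significantly.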
Extract text from PDF documents. jPDFText is a Java library that lets you process PDF documents to extract the textual content for archiving, storage, searching or indexing. With jPDFText you can load PDF documents from files, network drives, URLs or input streams. You can extract text or extract words as a vector of strings. jPDFText is built on top of Qoppa's proprietary PDF technology, so there is no need for any third-party software or drivers.
jPDFText is a Java library that integrates seamlessly into your application or applet to extract words from PDF documents. jPDFText provides the following functions:
jPDFText - Getting Started
The starting point for using jPDFText is the com.qoppa.pdfText.PDFText class. This class is used to load a PDF document and extract the text from it. The class provides three constructors to load PDF files from the file system, a URL or an InputStream. All constructors take an additional parameter, an object that implements IPasswordHandler, that will be queried if the PDF file requires a password to open. For PDF files that are not encrypted, this second parameter can be null.
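Since the constructors accept a file, a URL or an InputStream, an application that receives locations as plain strings may want a single code path that ends in a stream. The sketch below is one possible way to do that with standard Java only. PdfSource and open are hypothetical names that are not part of jPDFText, and the rule used to tell a URL from a path is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not part of jPDFText): opens a PDF location as a stream. */
class PdfSource {

    /**
     * Returns a buffered stream for a URL such as "https://..." or "file:///...",
     * or for a plain file-system path. The caller must close the stream.
     */
    static InputStream open(String location) throws IOException {
        // A scheme of two or more characters followed by "://" is treated as a URL;
        // requiring two characters keeps Windows drive letters ("C:\...") as paths.
        boolean looksLikeUrl = location.matches("[a-zA-Z][a-zA-Z0-9+.-]+://.*");
        InputStream raw = looksLikeUrl
                ? URI.create(location).toURL().openStream()
                : Files.newInputStream(Paths.get(location));
        return new BufferedInputStream(raw);
    }
}
```

The returned stream would then be handed to the InputStream constructor described above, with null as the second argument when the file is not encrypted.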
jPDFText - Extract Text
Once a PDFText object has been created, the host application simply needs to call the getText method to get the text from the loaded PDF document.
jPDFText - Extracting Text Page by Page
To extract the text page by page, use the getText method that takes a page number as a parameter. You can get the number of pages from the PDFText object through the getPageCount method.
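The page loop described above can be sketched as a small search routine that reports which pages mention a term. To keep the example self-contained, it is written against PagedTextSource, a stand-in interface that only mirrors the two method names from the text; it is not the Qoppa class, and the zero-based page index and the absence of checked exceptions are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Stand-in for the two PDFText methods named above; this is NOT the Qoppa class. */
interface PagedTextSource {
    int getPageCount();
    String getText(int pageIndex); // assumption: page indexes start at 0
}

class PageSearch {

    /** Returns the indexes of all pages whose text contains the term, ignoring case. */
    static List<Integer> pagesContaining(PagedTextSource doc, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int page = 0; page < doc.getPageCount(); page++) {
            String text = doc.getText(page);
            // Skip pages for which no text was returned.
            if (text != null && text.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(page);
            }
        }
        return hits;
    }
}
```

In a real application a thin adapter would forward getPageCount and getText to the loaded PDFText object.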
jPDFText - Extracting Words as a Vector of Strings
Once a PDFText object has been created, the host application simply needs to call the getWords method to get the list of words from the loaded PDF document.
jPDFText - Extracting Words Page by Page
To extract words page by page, use the getWords method that takes a page number as a parameter. You can get the number of pages from the PDFText object through the getPageCount method.
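Per-page word lists lend themselves to building a simple index from each word to the pages it appears on. The following is a minimal sketch of that idea, not the library's own indexing feature. PagedWordSource is a stand-in interface that mirrors the method names from the text; the List<String> return type (java.util.Vector implements List) and the zero-based page index are assumptions.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Stand-in for the word-extraction methods named above; this is NOT the Qoppa class. */
interface PagedWordSource {
    int getPageCount();
    List<String> getWords(int pageIndex); // assumption: zero-based; a Vector is a List
}

class WordIndex {

    /** Maps each lower-cased word to the sorted set of page indexes it occurs on. */
    static Map<String, SortedSet<Integer>> build(PagedWordSource doc) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int page = 0; page < doc.getPageCount(); page++) {
            for (String word : doc.getWords(page)) {
                String key = word.toLowerCase(Locale.ROOT);
                index.computeIfAbsent(key, k -> new TreeSet<>()).add(page);
            }
        }
        return index;
    }
}
```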
jPDFText - Getting Basic Information about the PDF Document (Title, Author, etc.)
To get basic information about the loaded PDF document, get the DocumentInfo object returned by PDFText.getDocumentInfo. From this object, you can get information about the document such as title, author, subject, keywords, etc.
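Metadata fields are often missing or blank in real PDF files, so code that displays them usually filters them first. The sketch below shows one way to collect only the populated fields. DocInfoSource is a stand-in interface, not the Qoppa DocumentInfo class; its getter names are assumptions based on the fields listed above, as is the idea that an absent field comes back as null or blank.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for the document information object; the getter names are assumptions. */
interface DocInfoSource {
    String getTitle();
    String getAuthor();
    String getSubject();
    String getKeywords();
}

class DocInfoSummary {

    /** Returns the populated metadata fields in a fixed display order. */
    static Map<String, String> nonEmptyFields(DocInfoSource info) {
        Map<String, String> fields = new LinkedHashMap<>();
        putIfPresent(fields, "Title", info.getTitle());
        putIfPresent(fields, "Author", info.getAuthor());
        putIfPresent(fields, "Subject", info.getSubject());
        putIfPresent(fields, "Keywords", info.getKeywords());
        return fields;
    }

    // Stores the trimmed value unless it is null or contains only whitespace.
    private static void putIfPresent(Map<String, String> fields, String name, String value) {
        if (value != null && !value.trim().isEmpty()) {
            fields.put(name, value.trim());
        }
    }
}
```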
jPDFText - Distribution and JAR Files
jPDFText is packaged in a single jar file, jPDFText.jar, which is installed with the evaluation sample. When distributing an application that contains jPDFText, the jPDFText.jar file needs to be distributed along with it and included in the class path when running the application.
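A missing jar usually surfaces only when the first PDF is opened, so an application may prefer to check the class path at startup. The following is a minimal sketch of such a check using standard Java reflection. ClassPathCheck is a hypothetical helper, not part of jPDFText; the class name to probe is the one given in the Getting Started section.

```java
/** Hypothetical startup check (not part of jPDFText). */
class ClassPathCheck {

    /** Returns true if the named class can be located without initializing it. */
    static boolean isOnClassPath(String className) {
        try {
            Class.forName(className, false, ClassPathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is absent or it was found but could not be linked.
            return false;
        }
    }
}
```

A call such as ClassPathCheck.isOnClassPath("com.qoppa.pdfText.PDFText") would then be expected to return false when jPDFText.jar has been left off the class path.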