Released: Dec 20, 2022
Updates in 5.4
Features
- Security and performance updates of third-party components.
- Enhanced all language bindings and updated them to the latest language versions, including Microsoft .NET 6/7, PHP 8.1/8.2, Perl 5.34/5.36 and Ruby 3.1.
- Added support for ARM64/x86_64 bindings on Apple macOS.
- Improved TIKA and MediaWiki connectors.
Fixes
- Minor bug fixes and improvements.
Released: Nov 19, 2021
Updates in 5.3 (maintenance release)
Features
- Added support for Microsoft Windows 11.
Released: May 4, 2021
Updates in 5.3
Features
- Optimized PDF resource handling to improve performance for documents with very large numbers of images, patterns or other resources.
- Security and performance updates of all third-party components.
- Hardened processing of damaged and illegal PDF documents by testing against the full Issue Tracker PDF corpus with tens of thousands of stress-test PDF files.
- Expanded platform and CPU support including macOS on ARM64 and Linux on ARM64.
- A timeout can be specified to limit processing time for large or...
Released: Jul 25, 2019
Updates in 5.2
Features
- Improved table detection with row and column span identification.
- Mark Artifacts (irrelevant text and images) in TETML and the API.
- Extract text and images from annotations and patterns.
- Support for inline images and images in soft masks.
- New language binding for .NET Core.
- Enhancements in all language bindings and updates for the latest language versions.
- Many bug fixes, improvements and workarounds for damaged PDF documents.
- Security updates for third-party libraries.
- Optionally retrieve...
Released: Jun 1, 2017
Updates in 5.1
Features
- Numbered and unnumbered lists are identified and expressed in TETML.
- Repair mode for damaged input documents with cross-reference streams.
- Improved workarounds for non-conforming input documents.
- Improved performance when the image, color, and vector engines are disabled, as well as for documents without layers.
- Reduced memory requirements.
- pCOS interface updated to version 11.
- Updated language bindings.
Released: Nov 9, 2015
Updates in 5
- Retrieve fill and stroke color of text.
- Improved page and table layout recognition.
- Support vertical font metrics for CJK text.
- Significantly enhanced merging of fragmented images.
- Extract image masks and soft masks.
- Merge and convert JPEG 2000-compressed images.
- Preserve named spot colors in extracted TIFF images.
- Honor layers and clipping paths.
- Check whether an area on the page is empty, e.g. before placing a stamp or barcode.
- TET's XML output called TETML includes text color and...
Released: Mar 10, 2015
Updates in 4.4
- Language binding support for PHP 5.6, Perl 5.20, Python 3.4, Ruby 2.1 and 2.2.
- Image extraction:
  - Correct color output for certain CMYK JPEG images.
  - Enhanced merging of fragmented images.
  - Enhanced filtering of small images.
- Reduced memory requirements.
- Enhanced dropcap detection.
- Workarounds for malformed fonts and PDFs.
- Improved results for text in right-to-left languages.
Released: Jun 10, 2014
Updates in 4.3
- Support for TIFF image resolution information.
- Workarounds for various kinds of malformed PDFs.
- Enhanced robustness against non-conforming input.
- Improved word boundary detection for certain glyph spacings.
Released: Mar 12, 2012
Updates in V4.1
- Reduced memory requirements for very large documents
- Performance improvements
- Support for PDF documents encrypted with Acrobat X
- Bug fixes and robustness improvements
- Improved heuristics for processing malformed PDF input
- Additional PDF details available in TETML output
- pCOS interface 8 with new pseudo objects, e.g. for detecting transparency
- Improved handling of encrypted file attachments
Released: Aug 19, 2010
Updates in V4.0
New features in PDFlib TET 4.0:
- Performance enhancements: faster for many classes of documents
- Higher speed and smaller memory consumption for very large documents up to hundreds of thousands of pages
- Extract right-to-left and bidirectional text for Arabic, Hebrew, etc.
- Unicode post-processing:
  - Foldings preserve, remove or replace characters
  - Decompositions replace a character with an equivalent sequence, e.g. replace narrow or vertical Japanese characters with their standard...
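The folding and decomposition ideas in the Unicode post-processing item can be illustrated outside TET with Python's standard `unicodedata` module. This is a minimal sketch under that assumption, not TET's own API or option syntax: `decompose` applies Unicode NFKC normalization, which maps halfwidth (narrow) and vertical presentation forms to their standard characters, and `fold` is a hypothetical helper that preserves, removes or replaces characters via `str.translate`.

```python
import unicodedata


def decompose(text: str) -> str:
    """Replace compatibility characters with their standard equivalents.

    NFKC applies Unicode compatibility decompositions (e.g. <narrow>,
    <vertical>) and then recomposes, so halfwidth katakana and vertical
    punctuation become their ordinary forms.
    """
    return unicodedata.normalize("NFKC", text)


def fold(text: str, mapping: dict) -> str:
    """Hypothetical folding helper (not a TET function): map each key
    character to a replacement string, or to None to remove it.
    Characters not in the mapping are preserved unchanged."""
    return text.translate(str.maketrans(mapping))


if __name__ == "__main__":
    # Halfwidth KA + halfwidth voiced sound mark -> standard GA (U+30AC)
    print(decompose("\uff76\uff9e"))  # ガ
    # Vertical presentation forms of "(", "," and ")"
    print(decompose("\ufe35\ufe10\ufe36"))  # (,)
    # Remove soft hyphens, replace no-break spaces with ordinary spaces
    print(fold("co\u00adop\u00a0data", {"\u00ad": None, "\u00a0": " "}))  # coop data
```

Note that NFKC is broader than a targeted decomposition: it also rewrites ligatures, superscripts and fullwidth Latin letters, so a real pipeline would probably restrict it to the character classes it actually wants to normalize.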