What's New in dtSearch 7.64?
Enhancements (dtSearch Engine)
- Added dtsSearchLanguageAnalyzerSynonyms flag to enable using a language analyzer to generate morphological variations on a search term at search time. When this flag is set, the language analyzer is called for each word or phrase in the search request. The flag dtsLaInputIsSearchTerm is passed to the language analyzer in dtsLaJob.flags, so the language analyzer knows why it is being called.
- Added dtssGetWordBreaker API function to provide direct access to the dtSearch Engine's internal word breaker using the language analyzer API. For sample code demonstrating how to use this API, see the WordBreak example in examples\vc8\WordBreak.
- Added more structural information to the output generated by conversion to the it_ContentAsXml file format.
- Added to COM interface: WordListBuilder.ListFieldValues, WordListBuilder.SetFilter, and IndexJob.EnumerableFields.
- Added dtsListIndexSkipNoiseWords flag for ListIndexJob to list words in an index without including any noise words.
- Added dtsoFfSkipDataSourceFields flag for Options.FieldFlags to prevent DocFields values from appearing in FileConverter output
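The search-time call sequence described for dtsSearchLanguageAnalyzerSynonyms can be pictured roughly as follows. This is only a sketch in Python, not the dtSearch API: the numeric flag values, the LaJob class, toy_analyzer, and expand_request are all hypothetical stand-ins, and the real constants and the dtsLaJob structure are defined in the dtSearch Engine headers.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical numeric values; the real constants come from the dtSearch headers.
DTS_SEARCH_LANGUAGE_ANALYZER_SYNONYMS = 0x1
DTS_LA_INPUT_IS_SEARCH_TERM = 0x1


@dataclass
class LaJob:
    """Stand-in for dtsLaJob: the text to analyze plus flags describing the call."""
    text: str
    flags: int = 0


def toy_analyzer(job: LaJob) -> List[str]:
    """Toy analyzer that returns naive English variants for search terms only."""
    if not job.flags & DTS_LA_INPUT_IS_SEARCH_TERM:
        return [job.text]  # not a search-time call: return the text unchanged
    return [job.text, job.text + "s", job.text + "ing"]


def expand_request(terms: List[str], search_flags: int,
                   analyzer: Callable[[LaJob], List[str]]) -> List[List[str]]:
    """Call the analyzer once per word or phrase when the search flag is set."""
    if not search_flags & DTS_SEARCH_LANGUAGE_ANALYZER_SYNONYMS:
        return [[t] for t in terms]
    return [analyzer(LaJob(text=t, flags=DTS_LA_INPUT_IS_SEARCH_TERM))
            for t in terms]


variants = expand_request(["walk", "fast walk"],
                          DTS_SEARCH_LANGUAGE_ANALYZER_SYNONYMS, toy_analyzer)
```

The point of the flag check inside the analyzer is that the same analyzer may be called for other reasons, so it needs the flag to know that its input is a search term.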
Fixes and minor enhancements
- Fixed incorrect display of CreationDate and ModDate properties in PDF files
- Fixed incorrect hit highlighting when the Unicode Filtering options at search time differ from the options used to index a file. To ensure consistent options, Unicode Filtering options are stored in the index when the index is created, in the index_a.ix file.
- Fixed error updating index when directory specified for temporary files is inaccessible.
- Fixed index merge bug causing "Inconsistent doc ids from target index" error during merge.
- Fixed two search report bugs causing incorrect hit highlighting.
- Improved formatting of documents converted from Ami Pro and Quattro Pro to HTML
- Added automatic detection of gb2312 and JIS encoding.
- Added automatic detection of XyWrite, XBase, WordStar 3.x, WordPerfect 4.2, and TAR files.
- Improved reporting of file types by FileConverter.DetectedTypeId, providing much more specific information about Microsoft Word versions and adding type detection for additional file formats
- Added support for text extraction from Adobe Framemaker MIF, XFA form templates in PDF files, and Visio XML files
- Fixed "Excessive nesting" error indexing OpenOffice document due to bug parsing table structure
- Fixed RTF file parser bug affecting handling of the \upr tag
- Other file parser bug fixes affecting Multimate, Lotus 1-2-3, PDF, Word, PowerPoint
What's New in dtSearch 7.63?
- Added IndexFileInfo.UserFields in .NET API to provide access to stored fields through the IIndexStatusHandler callback interface during indexing
- Added dtsnIndexDeletedFileRemoved, dtsnIndexListedFileRemoved, and dtsnIndexListedFileNotRemoved notifications to the indexing status callbacks to notify the calling application when files are removed from the index during indexing or when an attempt to remove a listed file fails
What's New in dtSearch 7.62?
- Regular Expression searching extended to support TR1 regular expressions
- Added new cmap files for PDF extraction
- Reduced memory use for searches that retrieve large numbers of documents with a relatively small MaxFilesToRetrieve value
What's New in dtSearch 7.61?
Added new user interface appearance options and updated toolbar icons
What's New in dtSearch 7.5?
New dtSearch Desktop with Spider 64-bit version: The new release includes a native 64-bit version of the dtSearch Engine for Win & .NET (for .NET 2.0/3.0) for developers to integrate into web-based and other applications. The 64-bit version provides full API access to dtSearch's terabyte indexer and search functionality, file format and database support (including SQL BLOB data).
International language enhancements: dtSearch products include international language support through Unicode, covering hundreds of international languages. The new version adds improved searching of Chinese, Japanese and Korean text presented without spaces between words. The new version also offers improved developer API integration with third-party international language morphological analyzers like those from Basis Technology
What's New in dtSearch 7.43?
- Fixed bug in PDF file parser affecting decoding of CID fonts in PDF files
- Fixed error extracting item from TAR file to hit-highlight after search
- Added detection of the following file types with missing or incorrect filename extensions: Microsoft Word 2003 XML files, Microsoft Excel 2003 XML files.
- Fixed error indexing using data source API under WebSphere
- Fixed extra spacing in output when HTML converted to UTF-8 text
What's New in dtSearch 7.40?
- Automatic recognition of dates, email addresses, and credit card numbers in text
- Support for Vista XMP metadata
- Support for PowerPoint 2007 (*.pptx). (The product line already supports Word 2007 (*.docx) and Excel 2007 (*.xlsx))
- Support for Vista XML Paper Specification (*.xps) documents
- A new IndexCache object in the .NET 2.0 API, and dtsIndexCache object in the C++ API of the dtSearch Engine. The new objects enable much faster searching when a series of searches must be done against a small number of indexes
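Why an index cache helps with a series of searches against a few indexes can be sketched as follows. This is not the IndexCache or dtsIndexCache implementation, whose internals are not described here. It is a generic least-recently-used cache of open handles in Python, where IndexHandleCache, open_index, and max_open are hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class IndexHandleCache:
    """Keeps up to max_open index handles open, evicting the least recently used."""

    def __init__(self, open_index, max_open=4):
        self._open_index = open_index    # hypothetical callable: path -> handle
        self._max_open = max_open
        self._handles = OrderedDict()
        self.opens = 0                   # number of real (expensive) opens

    def get(self, path):
        if path in self._handles:
            self._handles.move_to_end(path)    # mark as most recently used
            return self._handles[path]
        handle = self._open_index(path)
        self.opens += 1
        self._handles[path] = handle
        if len(self._handles) > self._max_open:
            self._handles.popitem(last=False)  # drop the least recently used
        return handle


# Five searches against two indexes need only two real opens.
cache = IndexHandleCache(lambda path: {"path": path}, max_open=2)
for index_path in ["a", "b", "a", "a", "b"]:
    cache.get(index_path)
```

When the number of distinct indexes stays at or below the cache size, every search after the first against a given index reuses an already open handle.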
What's New in dtSearch 7.30?
Enhancements (All products)
- Added preliminary support for Word 2007 (*.docx) and Excel 2007 (*.xlsx) based on the current Office 2007 beta and available documentation.
- Added support for JPG and TIFF metadata, including EXIF and IPTC fields.
- The Unicode filtering file parser can handle individual documents larger than 2 GB, and the extext.exe utility now supports files larger than 2 GB
- Improved handling of partially inaccessible email files. In previous versions, if an email had encrypted or corrupt data (for example, an encrypted attachment), the whole email was reported as encrypted or corrupt. In this version, the readable portion of the message is indexed and the unreadable portion is separately reported as a partially encrypted or partially unreadable file. This change applies to Outlook messages, TNEF files, .eml files, MBOX archives, and .msg files.
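The handling of partially inaccessible email files described above can be sketched roughly as follows, assuming a message is a list of parts that are either readable or not. The Part class, index_message, and the "ok", "encrypted", and "unreadable" labels are hypothetical. Only the "partially encrypted" and "partially unreadable" wording comes from the release note.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Part:
    name: str
    text: Optional[str]        # None means the part could not be read
    encrypted: bool = False


def index_message(parts: List[Part]) -> Tuple[List[str], str]:
    """Return (text chunks to index, status) for one message."""
    indexed = [p.text for p in parts if p.text is not None]
    unreadable = [p for p in parts if p.text is None]
    if not unreadable:
        status = "ok"
    elif not indexed:
        # Nothing readable at all: the whole message is reported.
        status = "encrypted" if all(p.encrypted for p in unreadable) else "unreadable"
    elif any(p.encrypted for p in unreadable):
        status = "partially encrypted"
    else:
        status = "partially unreadable"
    return indexed, status


message = [
    Part("body", "Quarterly report attached"),
    Part("report.zip", None, encrypted=True),
]
chunks, status = index_message(message)
```

Under the older behavior, the message above would have been reported as encrypted and its body would not have been searchable.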
Enhancements (dtSearch Engine)
- Beta x64 (64-bit) versions of the dtSearch Indexer and dtSearch Engine (dtIndexer64.exe, dtengine64.dll, and dtSearchNetApi2.dll). The index format and APIs (C++, COM, and .NET) are identical to the 32-bit version. The 64-bit components are in a separate download file (dtSearch64_730.exe) with the same installation password as the dtSearch Engine SDK.
- Added alternative PDF highlighting mechanism for client-based applications (see "Highlighting Hits in PDF files" in the API Overviews section for details)
- Added ListIndexJob object to the .NET 2.0 API to list files, words, or fields in an index (see dtSearchNetApi2.chm for API reference)
- Added dtsListIndexIncludeDocId flag for dtsListIndexJob and ListIndexJob to provide a quick way to list all documents in an index and the doc id for each document
- C++ API Changes to support 64-bit file sizes in dtsInputStream (added size64 and seek64), dtsInputStreamReader, dtsFileInfo (added size64), dtsSearchResultsItem (added size64). These changes preserve binary compatibility for the dtSearch Engine DLL, but some C++ code may trigger new warnings when compiled because of 64-bit values returned.
- Added dtsIndexKeepExistingDocIds flag to specify that, when compressing an index, the indexer should not remap document ids, so document ids will be unmodified in the index once compression is done.
- Fixes and minor enhancements
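The reason for the 64-bit size fields listed above (size64, seek64) can be seen with a small arithmetic sketch. It assumes the older size fields were signed 32-bit values, which the release note does not state explicitly. The helper names as_int32 and as_int64 are hypothetical, and Python's ctypes is used only to mimic fixed-width C integers.

```python
import ctypes

TWO_GB = 2 ** 31           # 2,147,483,648 bytes
file_size = 3 * 2 ** 30    # a 3 GB file: 3,221,225,472 bytes


def as_int32(value: int) -> int:
    """The value as it would appear in a signed 32-bit field (wraps on overflow)."""
    return ctypes.c_int32(value).value


def as_int64(value: int) -> int:
    """The value as it would appear in a signed 64-bit field."""
    return ctypes.c_int64(value).value


wrapped = as_int32(file_size)   # negative: the size no longer fits
exact = as_int64(file_size)     # unchanged
```

A signed 32-bit field tops out at 2,147,483,647, one byte short of 2 GB, so any larger size wraps to a meaningless value. This is also why code that assigns the new 64-bit values to 32-bit variables may trigger compiler warnings.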
What's New in dtSearch 7.20?
- New file parsers for OpenOffice documents, spreadsheets, and presentations (*.sxw, *.sxc, *.odt, *.ods, etc.), covering OpenOffice version 1 and OpenOffice version 2 (the "Open Document Format for Office Applications")
- New file parsers for the Microsoft Office XML formats (Microsoft Word 2003 XML and Microsoft Excel 2003 XML)
- Added "Open containing folder" to the right-click menu for retrieved items
- Improved reporting of errors that occur when copying files in Edit > Copy File(s)
- dtindexer.exe: added /caf and /cat command-line options to cache text (/cat) or cache original files (/cad) when creating indexes from the command line, and /recog to recognize an index.
- Added Help > Check For Updates feature to automatically download new versions
The new release includes major enhancements to the dtSearch product line's display of MS Word, Excel and PowerPoint documents. The new release also includes enhancements for indexing and searching Outlook message stores. Finally, the new release includes an additional feature for forensics usage.
dtSearch Desktop with Spider Main Features:
- Instantly search terabytes of text from your desktop
- 25+ fielded & full-text search options (supports hundreds of international languages)
- File parsers/converters highlight hits in popular file types
- Spiders static & dynamic web data; hit-highlighted WYSIWYG displays
dtSearch Desktop with Spider offers instant indexed (and slower unindexed) searching of large document collections. Proprietary indexing and searching algorithms maintain a fast rate of indexing and virtually instantaneous searching over very large document collections. dtSearch Desktop with Spider includes over two dozen text search options that can work alone or in combination. dtSearch supports Microsoft Access, Excel (*.xls, *.xlsb, *.xlsx), Word (*.doc, *.docx, *.rtf), and PowerPoint (*.ppt, *.pptx) files created by Office 2010.
dtSearch Desktop with Spider search features include: fuzziness adjustable from 0 to 10, synonym/concept/thesaurus, boolean, phrase, wildcard, proximity, stemming, numeric range, natural language relevancy-ranked by hit density and rarity, variable term weighting, indexed and unindexed searching, and more.
dtSearch Desktop with Spider automatically recognizes a wide variety of document types, including word processor, database, spreadsheet, ZIP, XML and more. It highlights hits in HTML and PDF while keeping all embedded links and images intact, and its built-in file converters convert other popular file types to HTML for display with highlighted hits.
dtSearch Desktop with Spider Main Features
Fast, precision searching
- Over two dozen text search options
- Most indexed searches take less than a second, even through very large databases
- Also has unindexed searching
- Automatically recognizes word processor, database, spreadsheet, email, PDF, ZIP, HTML, XML, Unicode files & more
- FindPlus distributed searching extends the reach of a single search request to remote enterprise servers
- Point and click setup
- Highlights hits in HTML and PDF while keeping embedded links and images intact
- Converts other file types to HTML for display with highlighted hits
The new release of dtSearch Desktop with Spider adds FindPlus distributed searching, a Web spider, enhanced XML support and Unicode support, to improve access to information throughout an organization. The new release also offers API enhancements, expanding the utility of the dtSearch developer components with a wide variety of programming languages. FindPlus distributed searching is an integrated feature of dtSearch Desktop, Web and Network that allows a single search request to span everything from local drives to remote servers. Operating through a single user interface, FindPlus enables indexed searching of files and other data throughout an organization, without the need to collect the data in a monolithic repository. Because FindPlus uses an XML-based protocol for exchanging and aggregating search information, developers using the dtSearch Engine can also incorporate this capability into their own applications.
In addition, enhanced XML support provides a way to combine data from any source, while retaining the ability to search on field and table information. XML is increasingly becoming a universal data format. However, other search engines do not fully incorporate the hierarchical structures in XML data, effectively reducing XML to "flat" text. In contrast, dtSearch can perform indexed searches using the full range of dtSearch features across an entire XML database, or limited to a specific combination of fields or sub-fields, with no sacrifice in speed.
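The difference between treating XML as "flat" text and searching within a specific field or sub-field can be illustrated with a small Python sketch using the standard library's ElementTree. It is a plain linear scan over made-up sample data, not an indexed search and not dtSearch's search syntax. The sample document and the function names flat_matches and field_matches are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<library>
  <book><title>Search Engines</title><author><last>Smith</last></author></book>
  <book><title>Smith's Guide to Indexing</title><author><last>Jones</last></author></book>
</library>
"""


def flat_matches(root, word):
    """Titles of books containing word anywhere, ignoring which field holds it."""
    word = word.lower()
    return [book.findtext("title") for book in root.iter("book")
            if word in " ".join(book.itertext()).lower()]


def field_matches(root, field_path, word):
    """Titles of books containing word only inside the given field or sub-field."""
    word = word.lower()
    return [book.findtext("title") for book in root.iter("book")
            if any(word in (elem.text or "").lower()
                   for elem in book.findall(field_path))]


root = ET.fromstring(SAMPLE)
anywhere = flat_matches(root, "smith")                   # matches both books
by_author = field_matches(root, "author/last", "smith")  # matches one book
```

The flat search cannot tell a book written by Smith from a book with "Smith" in its title. Keeping the hierarchy lets the query name the sub-field it cares about.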
dtSearch Desktop with Spider new features include:
- A web indexing spider, providing a way to use other web sites as instantly searchable resources.
- Unicode support, enabling indexing and searching of text data in nearly any language
dtSearch Desktop with Spider developer features include:
- Java support through a JNI interface
- More sample code in C++, Visual C++, Visual Basic and Delphi
- Improved multithreaded operation for use with ASP and .NET
- More sample source code to dtSearch Web, for both ASP and ISAPI-based versions
- Improved indexing and searching of ActiveX and other data sources (such as SQL databases), with hit highlighted search results display
- Search results serialization as an XML or URL-encoded stream
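As a rough picture of what serializing a search result as a URL-encoded stream involves, the following Python sketch round-trips one result item through the standard library's urllib.parse. The field names (filename, hits, score) and the helper functions are hypothetical, and the actual stream layout produced by the dtSearch Engine is not described here.

```python
from urllib.parse import parse_qsl, urlencode


def serialize_item(item: dict) -> str:
    """Encode one search result item as a URL-encoded string."""
    return urlencode(item)


def deserialize_item(stream: str) -> dict:
    """Decode a URL-encoded string back into a dictionary of string fields."""
    return dict(parse_qsl(stream))


# Hypothetical result item; all values are kept as strings.
item = {"filename": "c:\\docs\\annual report.doc", "hits": "12", "score": "87"}
encoded = serialize_item(item)
decoded = deserialize_item(encoded)
```

A URL-encoded form is convenient when results must travel in a query string or form post, while an XML form is easier to extend with nested data.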