dtSearch excludes noise words

Version 7.64 adds dtsListIndexSkipNoiseWords to list words in an index without including any noise words.
May 20, 2010 - 16:52

The dtSearch product line can instantly search terabytes of text across a desktop, network, Internet or Intranet site. dtSearch products also serve as tools for publishing large document collections, with instant text searching, to Web sites or portable media. Developers can embed dtSearch’s instant searching and file format support into their own applications.

Updates in V7.64

Enhancements (dtSearch Engine)

  • Added dtsSearchLanguageAnalyzerSynonyms flag to enable using a language analyzer to generate morphological variations on a search term at search time. When this flag is set, the language analyzer is called for each word or phrase in the search request. The flag dtsLaInputIsSearchTerm is passed to the language analyzer in dtsLaJob.flags, so the language analyzer knows why it is being called.
  • Added dtssGetWordBreaker API function to provide direct access to the dtSearch Engine's internal word breaker using the language analyzer API. For sample code demonstrating how to use this API, see the WordBreak example in examples\vc8\WordBreak.
  • Added more structural information to the output generated by conversion to the it_ContentAsXml file format.
  • Added to the COM interface: WordListBuilder.ListFieldValues, WordListBuilder.SetFilter, and IndexJob.EnumerableFields.
  • Added dtsListIndexSkipNoiseWords flag for ListIndexJob to list words in an index without including any noise words.
  • Added dtsoFfSkipDataSourceFields flag for Options.FieldFlags to prevent DocFields values from appearing in FileConverter output.
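
As a rough illustration of what a flag like dtsListIndexSkipNoiseWords does conceptually, the following Python sketch lists the words in a toy index with and without noise words. It does not call the dtSearch Engine: the in-memory index, the NOISE_WORDS set, and the function names build_word_counts and list_index_words are all assumptions made for this example, and dtSearch's actual noise word list and index format are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical noise-word list for illustration only; not dtSearch's own list.
NOISE_WORDS = {"a", "an", "and", "of", "the", "to"}


def build_word_counts(documents):
    """Toy stand-in for an index: map each lowercased word to its total count."""
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def list_index_words(counts, skip_noise_words=False):
    """Return the indexed words in sorted order, optionally omitting noise words."""
    words = sorted(counts)
    if skip_noise_words:
        words = [w for w in words if w not in NOISE_WORDS]
    return words


docs = ["The quick brown fox", "A fox and the hound"]
index = build_word_counts(docs)

all_words = list_index_words(index)
content_words = list_index_words(index, skip_noise_words=True)

print(all_words)
print(content_words)
```

In this sketch the filtering happens when the word list is produced rather than when the index is built, which mirrors the idea of a list-time flag: the same index can be listed either way without reindexing.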

Fixes and minor enhancements

  • Fixed incorrect display of CreationDate and ModDate properties in PDF files.
  • Fixed incorrect hit highlighting when the Unicode Filtering options at search time differ from the options used to index a file. To ensure consistent options, Unicode Filtering options are stored in the index when the index is created, in the index_a.ix file.
  • Fixed error updating index when directory specified for temporary files is inaccessible.
  • Fixed index merge bug causing "Inconsistent doc ids from target index" error during merge.
  • Fixed two search report bugs causing incorrect hit highlighting.
  • Improved formatting of documents converted from Ami Pro and Quattro Pro to HTML.
  • Added automatic detection of gb2312 and JIS encoding.
  • Added automatic detection of XyWrite, XBase, WordStar 3.x, WordPerfect 4.2, and TAR files.
  • Improved reporting of file types by FileConverter.DetectedTypeId, providing much more specific information about Microsoft Word versions and adding type detection for additional file formats.
  • Added support for text extraction from Adobe FrameMaker MIF, XFA form templates in PDF files, and Visio XML files.
  • Fixed "Excessive nesting" error when indexing an OpenOffice document, caused by a bug in parsing table structure.
  • Fixed RTF file parser bug affecting handling of the \upr tag.
  • Other file parser bug fixes affecting Multimate, Lotus 1-2-3, PDF, Word, and PowerPoint.

About dtSearch Corp.

A leading supplier of text retrieval software, dtSearch Corp. develops, manufactures and sells the dtSearch text retrieval product line. dtSearch products have been the smart choice for text retrieval since 1991. The product line is known for its "industrial-strength" (PC Magazine) ability to instantly search terabytes of text, and includes end-user, enterprise and developer text retrieval products. It also includes publishing capabilities, for publishing large document collections to Web sites or CD/DVD, and Spidering capabilities, for remote site and distributed searching access. dtSearch products have received multiple awards and hundreds of excellent press reviews. Fortune 500 companies and others with some of the most demanding document search needs in the world rely on dtSearch. Four out of five of Fortune Magazine’s most profitable companies have dtSearch developer or multi-user licenses. Typical corporate uses of dtSearch products include general information retrieval, Internet/Intranet site searching and access to technical documentation.

A document and search results in the dtSearch boolean search sample.

dtSearch Desktop with Spider

Instantly search your desktop, as well as selected Spidered Web sites.
