
 

Software For Full Text Information Retrieval:

Usage


Home | Installation | Usage | FAQs | Utilities | Architecture | QA | Mailing List | License | Author | Download | Thanks




Rel is a program that determines the relevance of text documents to a set of keywords expressed in boolean infix notation. The names of the relevant files are printed to the standard output, in order of relevance. The supported boolean operators are logical or, logical and, and logical not, represented by the symbols "|", "&", and "!", respectively; left and right parentheses, "(" and ")", are used as grouping operators. The paths can be files and/or directories; if a path is a directory, the program recursively descends into it, searching all files and directories it contains.
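The recursive descent over path arguments can be sketched with the POSIX nftw(3) function, which walks a directory tree and calls a function for each entry. This is a minimal illustration under stated assumptions, not rel's own code: walk_paths and visit are hypothetical names, symbolic links are not followed (FTW_PHYS), and only readable regular files are counted, with the actual relevance scoring left as a placeholder.

```c
#define _XOPEN_SOURCE 700   /* exposes nftw(3) and FTW_PHYS */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>

static long file_count;     /* nftw(3) has no user-data argument */

/* Called by nftw(3) for every entry at or under the starting path. */
static int visit(const char *path, const struct stat *sb, int type,
                 struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;
    if (type == FTW_F) {                    /* regular files only */
        FILE *fp = fopen(path, "r");
        if (fp != NULL) {
            /* hypothetical: a real search would read and score fp here */
            file_count++;
            fclose(fp);
        }
    }
    return 0;                               /* 0 = continue the walk */
}

/* Returns how many readable regular files were found at or under path
   (a plain file counts as one), or -1 if the walk failed. */
long walk_paths(const char *path)
{
    assert(path != NULL);
    file_count = 0;
    /* FTW_PHYS: do not follow symbolic links (an assumption about rel) */
    if (nftw(path, visit, 16, FTW_PHYS) != 0)
        return -1;
    return file_count;
}
```

A single file argument is handled by the same call, since nftw(3) simply visits it once.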

For example, the command:


  rel "(directory & listing)" /usr/share/man/cat1

        

(i.e., rank, in order of relevance, all files in the catman directory that contain both of the words "directory" and "listing") will list a few tens of files out of the hundreds of catman files, with "ls.1" among the most relevant. In other words, to find the command that lists directories on a Unix system, the "literature search" was reduced, on average, by about 98%, a considerable expediency compared with browsing through the files in the directory. Although this example is elementary, a similar expediency can be demonstrated when searching for documents in email repositories and text archives.

The rel home page is at http://www.johncon.com/nformatix/.

Additional applications include information robots (i.e., "mailbots" or "infobots"), where the disposition (i.e., delivery, filing, or viewing) of text documents can be determined dynamically, based on the relevance of each document to a set of criteria framed in boolean infix notation. In other words, the program can be used to order, or rank, text documents based on a "context" specified in a general mathematical language, similar to that used in calculators.

The words in the query are case-insensitive; either upper or lower case can be used.

Associativity of operators is left to right, and the precedence of operators is the same as in C:

  precedence   operator
  high         ! = not
  middle       & = and
  lowest       | = or
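A recursive-descent evaluator is one conventional way to implement these precedence and associativity rules: one function per precedence level, each looping left to right. The sketch below is hypothetical (query_matches and its helpers are not rel's API) and assumes a keyword is satisfied when it occurs anywhere in the document text, compared case-insensitively. It also honors the "\" escape for operator symbols, and it returns 1 (match), 0 (no match), or -1 (syntax error).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* p: current query position; text: the document; error: syntax-error flag */
typedef struct { const char *p, *text; int error; } parser;

/* Assumption: a keyword matches if it occurs anywhere in text, ignoring case. */
static int contains_nocase(const char *text, const char *word)
{
    size_t n = strlen(word);
    for (; *text != '\0'; text++) {
        size_t i = 0;
        while (i < n && tolower((unsigned char)text[i]) ==
                        tolower((unsigned char)word[i]))
            i++;    /* a '\0' in text never equals a character of word */
        if (i == n)
            return 1;
    }
    return 0;
}

/* Skips whitespace and returns the next query character. */
static char peek(parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
    return *ps->p;
}

static int parse_or(parser *ps);

/* primary := '!' primary | '(' or ')' | keyword   (highest precedence) */
static int parse_primary(parser *ps)
{
    char word[256];
    size_t n = 0;

    if (peek(ps) == '!') {
        ps->p++;
        return !parse_primary(ps);
    }
    if (peek(ps) == '(') {
        ps->p++;
        int v = parse_or(ps);
        if (peek(ps) == ')')
            ps->p++;
        else
            ps->error = 1;          /* unbalanced parenthesis */
        return v;
    }
    /* A keyword ends at whitespace or an operator; "\" escapes the next char. */
    while (*ps->p != '\0' && !isspace((unsigned char)*ps->p) &&
           strchr("|&!()", *ps->p) == NULL) {
        if (*ps->p == '\\' && ps->p[1] != '\0')
            ps->p++;
        if (n + 1 < sizeof word)    /* silently truncates very long keywords */
            word[n++] = *ps->p;
        ps->p++;
    }
    word[n] = '\0';
    if (n == 0)
        ps->error = 1;              /* missing operand */
    return n > 0 && contains_nocase(ps->text, word);
}

/* and := primary { '&' primary }   (middle precedence, left to right) */
static int parse_and(parser *ps)
{
    int v = parse_primary(ps);
    while (peek(ps) == '&') {
        ps->p++;
        int r = parse_primary(ps);  /* always parse: no short circuit */
        v = v && r;
    }
    return v;
}

/* or := and { '|' and }   (lowest precedence, left to right) */
static int parse_or(parser *ps)
{
    int v = parse_and(ps);
    while (peek(ps) == '|') {
        ps->p++;
        int r = parse_and(ps);
        v = v || r;
    }
    return v;
}

/* Returns 1 if text satisfies query, 0 if not, -1 on a syntax error. */
int query_matches(const char *query, const char *text)
{
    assert(query != NULL && text != NULL);
    parser ps = { query, text, 0 };
    int v = parse_or(&ps);
    return (ps.error || peek(&ps) != '\0') ? -1 : v;
}
```

With this grammar, query_matches("a | b & c", "a") returns 1, because & binds tighter than | and the query is read as a | (b & c). Rel itself goes further and ranks the matching files by relevance, which this sketch does not attempt.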

The operator symbols can be escaped with the "\" character to include the symbol itself in a search pattern. The "escape space" character sequence (a backslash followed by a space) represents one or more whitespace characters in a search pattern; each instance matches one or more consecutive whitespace characters (as defined by isspace(3) in ctype.h and/or locale.h), which allows phrases to be searched for. This "many to one" whitespace translation is applied to both the keyword arguments and the text document(s). Multiple consecutive "escape space" sequences should not be used in keyword search phrases, and a single instance is appropriate only when a consecutive sequence of keywords must be specified; the logical and operator is the preferred construct when searching for documents that contain set(s) of keywords.
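The "many to one" translation can be illustrated by normalizing the phrase and the text the same way before comparing them. This is a minimal sketch, not rel's implementation: phrase_matches, normalize, and normalized_copy are hypothetical names, matching is assumed to be a plain substring search with strstr(3), lowercasing stands in for case insensitivity, and any "escape space" in the phrase is assumed to have already been turned into an ordinary space.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Lowercases s and collapses each run of whitespace (per isspace(3)) into a
   single space, in place; out never passes in, so this is safe. */
static void normalize(char *s)
{
    char *out = s;
    int in_space = 0;

    for (const char *in = s; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (isspace(c)) {
            if (!in_space)
                *out++ = ' ';
            in_space = 1;
        } else {
            *out++ = (char)tolower(c);
            in_space = 0;
        }
    }
    *out = '\0';
}

/* Returns a normalized heap copy of s, or NULL if allocation fails. */
static char *normalized_copy(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
        normalize(copy);
    }
    return copy;
}

/* Returns 1 if phrase occurs in text after both are normalized, 0 if not,
   or -1 if memory could not be allocated. */
int phrase_matches(const char *phrase, const char *text)
{
    assert(phrase != NULL && text != NULL);
    char *p = normalized_copy(phrase);
    char *t = normalized_copy(text);
    int result = -1;

    if (p != NULL && t != NULL)
        result = strstr(t, p) != NULL;
    free(p);
    free(t);
    return result;
}
```

Because both sides pass through the same normalize step, a phrase typed with one space still matches text where the words are separated by a tab and a line break.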

Hyphenation issues are addressed by deleting each hyphen, together with any whitespace characters (as defined by isspace(3)) that immediately follow it, in both the keyword arguments and the text document(s).

Backspace character issues are addressed by overwriting the character before a backspace with the character after it, so that in a run of consecutive backspace/character combinations the last character is the one retained. This is specifically for catman pages, which use underscore/backspace/character combinations for underlining, in addition to backspace/character combinations for bold (overstrike) representation. Note that for this process to succeed, a single underscore (used for underlining) must precede a single character in the sequence.


THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.

So there.

Copyright © 1994-2011, John Conover, All Rights Reserved.


Comments and/or bug reports should be addressed to:

john@email.johncon.com

http://www.johncon.com/
http://www.johncon.com/ntropix/
http://www.johncon.com/ndustrix/
http://www.johncon.com/nformatix/
http://www.johncon.com/ndex/





Last modified: Tue Mar 1 18:17:17 PST 2011 $Id: usage.html,v 1.0 2011/03/02 02:19:56 conover Exp $