HSEARCHTEXT(1)                                                  HSEARCHTEXT(1)

NAME
       hsearchtext - hash search an unordered text file of character strings

SYNOPSIS
       hsearchtext [-A] [-h number] [-I] [-N] [-P] [-r m|n] [-S] [-v]
              filename [string(s)]

DESCRIPTION
       Hsearchtext is for hash searching an unordered text file of
       character strings. The program requires mmap(2) to map the
       database file into the Unix VM system. The database file name is a
       required command line argument. The database is a standard Unix
       text file, one string per line.

       The database mechanism is conservative with machine resources,
       requiring about 17.5 microseconds of machine time to look up a
       word in the Unix system dictionary (2.5 MB, a quarter of a million
       words, on a single, lightly loaded 466 MHz Pentium running Linux
       2.2; measured by using the time(1) command to look up every word
       in the dictionary and dividing by the number of words).

       The program optionally eliminates duplicate records (i.e., records
       that are lexically equal), removes null records (i.e., "^$"),
       converts all characters to lowercase, and parses records that
       contain whitespace, leaving only the last token as the record.

       The program can be used to hash queries. The strings to be
       searched for may be supplied as additional optional command line
       arguments, or redirected to the program via stdin for
       compatibility with procmail(1) and other e-mail scripting agents.
       A suitable procmail(1) recipe example might be:

           :0 wfh
           * ? something | hsearchtext reject.db
           | formail -A "X-Notice: Word in reject.db database"

       which could be, if necessary, overridden on a case-by-case basis
       with the example recipe:

           :0 wfh
           * ^X-Notice: +Word +in +reject.db +database
           * ? something | hsearchtext accept.db
           | formail -I "X-Notice: Word in reject.db database"

       or a similar construct, where the databases contain e-mail
       addresses, domain names, etc.
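
The record handling described above (dropping null records, lowercasing, keeping only the last whitespace-separated token, and eliminating duplicates) can be illustrated with a minimal Python sketch. It is not the program's C source: the function names normalize, load_records, and any_found are hypothetical, a Python set stands in for the program's hash table, and normalizing query strings the same way as database records is an assumption that this manual does not state.

```python
def normalize(line, lowercase=True):
    """Reduce one line to its record: the last whitespace-separated token.

    Returns None for a null record (an empty or whitespace-only line).
    """
    tokens = line.split()
    if not tokens:
        return None
    record = tokens[-1]
    return record.lower() if lowercase else record


def load_records(lines, lowercase=True):
    """Build the lookup table; a set discards lexically equal duplicates."""
    records = set()
    for line in lines:
        record = normalize(line, lowercase)
        if record is not None:
            records.add(record)
    return records


def any_found(records, queries, lowercase=True):
    """True if at least one query string is present in the table.

    Assumption: queries are normalized like database records.
    """
    wanted = (normalize(q, lowercase) for q in queries)
    return any(w in records for w in wanted if w is not None)


# Hypothetical database contents, one string per line.
db = load_records([
    "Spam.example\n",
    "\n",                    # null record, dropped
    "x  SPAM.example\n",     # last token kept, duplicate after lowercasing
    "bad@host.example\n",
])
print(sorted(db))                           # ['bad@host.example', 'spam.example']
print(any_found(db, ["BAD@host.example"]))  # True
```

Each lookup in the sketch is a single hash probe, which is the property that makes this approach suitable for an unordered file.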

       Since the database file is memory mapped read-only, using mmap(2),
       and the database file is closed immediately after the mmap call,
       the unstructured/unordered database file can be appended to from
       the output of the hsearchtext(1) program; for example, constructs
       like:

           hsearchtext -P example.db "this" "and" "that" >> example.db

       are permitted (which, for example, would add the words "this",
       "and", and "that" to the unstructured/unordered database file,
       example.db, but only if the words were not already in the file).

       Additionally, the database file is not required to exist or to
       satisfy the requirements of mmap(2). Specifically, the file may be
       missing, or may have a size of zero.

       The program contains fewer than 700 lines of declarations and
       statements, all of which are documented with inline comments. The
       program has been compiled and tested on SunOS, Solaris, and Linux,
       and may work on other brands of Unix.

       If used for querying an unordered text file of character strings,
       the program returns 0 if there was no error and any of the
       specified strings were found in the database file, and 1 if there
       was no error and no strings were found; otherwise it returns a
       unique error code greater than 1 representing the error
       encountered, and also prints an error diagnostic to stderr.

       The -r option is useful for controlling the return value under
       error conditions; for example, if the database file cannot be
       opened (or read), the program's return value can be preempted with
       a value of match or no match, depending on environmental
       requirements.

OPTIONS
       filename
              File name.
       string(s)
              Character string(s) to be searched for (defaults to stdin).
       -A     Return = match if all strings found (default: match if any
              string found).
       -h number
              Hash table size = prime number (default: 99871).
       -I     Case sensitive alphabet.
       -N     Include null records.
       -P     Print the string(s) not in the database.
       -r m|n On file error, exit return = match for m, no match for n.
       -S     Disable whitespace in file warning.
       -v     Print the program's version information.

WARNINGS
       Under buffer overflow conditions, the program makes no attempts at
       handling the situation; it just detects it, prints an error
       message, and exits.

       The program is capable of rejecting entire Class A, Class B, or
       Class C IP address ranges. Discretion is advised.

SEE ALSO
       receivedIP(1), receivedIPdb(1), receivedIPdbdedup(1),
       receivedIPdbrm(1), receivedIPdbusort(1), bsearchtext(1),
       receivedAddress(1), receivedTodb(1), receivedMSGIDdb(1),
       receivedUnknowndb(1), tolower(1), toupper(1), bsorttext(1),
       receivedIPforgedb(1), hsearchtext(1), bsearchbody(1)

DIAGNOSTICS
       Error messages for incompatible arguments, failure to allocate
       memory, inaccessible files, opening and closing files.

AUTHORS
       ----------------------------------------------------------------------

       A license is hereby granted to reproduce this software source code
       and to create executable versions from this source code for
       personal, non-commercial use. The copyright notice included with
       the software must be maintained in all copies produced.

       THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO
       WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES
       OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE.
       THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT
       INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN
       ANY COUNTRY.

       Copyright (c) 2001-2007, John Conover, All Rights Reserved.

       Comments and/or bug reports should be addressed to:

           john@email.johncon.com (John Conover)

       ----------------------------------------------------------------------

January 16, 2007                                                HSEARCHTEXT(1)