BSORTTEXT(1)                                                      BSORTTEXT(1)

NAME
       bsorttext - binary sort an unordered text file of character strings

SYNOPSIS
       bsorttext [-I] [-N] [-o] [-P] [-r m|n] [-S] [-U] [-v] filename
       [string(s)]

DESCRIPTION
       Bsorttext is for binary sorting an unordered text file of character
       strings. The program requires mmap(2) to map the database file into
       the Unix VM system. The database file name is a required command
       line argument. The database is a standard Unix text file, one
       string per line.

       The database mechanism is conservative with machine resources,
       requiring about 12.5 microseconds of machine time to look up a word
       in the Unix system dictionary (2.5 MB, a quarter of a million
       words, on a single, lightly loaded 466 MHz Pentium running Linux
       2.2; measured by running time(1) on a lookup of every word in the
       dictionary and dividing by the number of words).

       In the most common usage, the input file, example.db, is an
       unordered list of character strings, and the output file, out.db,
       is a sorted list of unique character strings, with no null strings:

              bsorttext -o example.db > out.db

       The sort mechanism is implemented as a quicksort. The program
       optionally eliminates duplicate records (i.e., records that are
       lexically equal), removes null records (i.e., "^$"), converts all
       characters to lowercase, and issues a warning if whitespace is
       found in the file; see -U, -N, -I, and -S under OPTIONS.

       Alternatively, the program can be used to binary search the file
       for query strings. The strings to be searched for may be supplied
       as additional optional command line arguments, or redirected to the
       program via stdin for compatibility with procmail(1) and other
       e-mail scripting agents. A suitable procmail(1) recipe example
       might be:

              :0 wfh
              * ? something | bsorttext reject.db
              | formail -A "X-Notice: Word in reject.db database"

       which could, if necessary, be overridden on a case-by-case basis
       with the example recipe:

              :0 wfh
              * ^X-Notice: +Word +in +reject.db +database
              * ? something | bsorttext accept.db
              | formail -I "X-Notice: Word in reject.db database"

       or a similar construct, where the databases contain e-mail
       addresses, domain names, etc. Similarly, the look(1) program could
       be used:

              :0
              * ? look -f "Word" "${HOME}/reject.db"
              | formail -I "X-Notice: Word in reject.db database"

       which would provide the same functionality, but with partial key
       matches.

       Since the database file is memory mapped read-only, using mmap(2),
       and the database file is closed immediately after the mmap call,
       the unstructured/unordered database file can be appended to from
       the output of the bsorttext(1) program. For example, constructs
       like:

              bsorttext -P example.db "this" "and" "that" >> example.db

       are permitted (which would add the words "this", "and", and "that"
       to the unstructured/unordered database file, example.db, but only
       if the words were not already in the file).

       Additionally, with the -P option the database file is not required
       to exist or to satisfy the requirements of mmap(2). Specifically,
       the file does not have to exist, and can have a size of zero, but
       only with the -P option. (If the file exists and has a length
       greater than zero, however, the file must be readable, and the last
       character in the file must be a '\n'.)

       The program contains fewer than 300 lines of declarations and
       statements, all of which are documented with inline comments. The
       program has been compiled and tested on SunOS, Solaris, and Linux,
       and may work on other brands of Unix.

       If used for binary sorting an unordered text file of character
       strings, the program returns 0 if there is no error; otherwise it
       prints an error diagnostic to stderr and returns a unique error
       code greater than 1 representing the error encountered.

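       As an illustration of the sorting and -P behaviour described in
       this section, the following is a minimal Python sketch, not the
       program's actual C implementation. The function names
       sort_records and missing_strings are hypothetical. The sketch
       uses Python's built-in sort rather than a quicksort, compares
       strings by code point, and does not lowercase the query strings;
       all three are assumptions, since the collation and query handling
       of bsorttext are not specified here.

```python
import bisect


def sort_records(lines, case_sensitive=False, keep_null=False, unique=True):
    """Return the records sorted, following the documented defaults.

    The keyword arguments loosely mirror the -I, -N and -U options; the
    mapping is an assumption made for illustration only.
    """
    records = [line.rstrip("\n") for line in lines]
    if not case_sensitive:   # default: fold to lowercase (-I would keep case)
        records = [r.lower() for r in records]
    if not keep_null:        # default: drop empty records (-N would keep them)
        records = [r for r in records if r != ""]
    if unique:               # default: drop duplicates (-U would keep them)
        records = set(records)
    return sorted(records)


def missing_strings(sorted_records, queries):
    """Return the queries not present in sorted_records (in the spirit of -P).

    Each lookup is a binary search over the already sorted list.
    """
    missing = []
    for query in queries:
        i = bisect.bisect_left(sorted_records, query)
        if i == len(sorted_records) or sorted_records[i] != query:
            missing.append(query)
    return missing
```

       With the input records "Banana", "apple", an empty line, and
       "banana", sort_records returns the two records "apple" and
       "banana"; querying that result for "apple" and "cherry" with
       missing_strings reports only "cherry" as absent.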
       If used for querying an unordered text file of character strings,
       the program returns 0 if there is no error and any of the specified
       strings were found in the database file, and 1 if there is no error
       and no strings were found; otherwise it prints an error diagnostic
       to stderr and returns a unique error code greater than 1
       representing the error encountered.

       The -r option is useful for controlling the return value under
       error conditions. For example, if the database file cannot be
       opened (or read), the program's return can be preempted with a
       return value of match, or no match, depending on environmental
       requirements.

OPTIONS
       filename
              File name.

       string(s)
              Character string(s) to be searched for (defaults to stdin).

       -I     Case sensitive alphabet.

       -N     Include null records.

       -o     Output the sorted file, filename; disable querying.

       -P     Print the string(s) not in the database.

       -r m|n On file error, exit return = match for m, no match for n.

       -S     Disable the whitespace in file warning.

       -U     Records do not have to be unique.

       -v     Print the program's version information.

WARNINGS
       Under buffer overflow conditions, the program makes no attempt at
       handling the situation; it just detects it, prints an error
       message, and exits.

       The program is capable of rejecting entire Class A, Class B, or
       Class C IP address ranges. Discretion is advised.

SEE ALSO
       receivedIP(1), receivedIPdb(1), receivedIPdbdedup(1),
       receivedIPdbrm(1), receivedIPdbusort(1), bsearchtext(1),
       receivedAddress(1), receivedTodb(1), receivedMSGIDdb(1),
       receivedUnknowndb(1), tolower(1), toupper(1), bsorttext(1),
       receivedIPforgedb(1), hsearchtext(1), bsearchbody(1)

DIAGNOSTICS
       Error messages are issued for incompatible arguments, failure to
       allocate memory, inaccessible files, and failures opening or
       closing files.

AUTHORS
       ----------------------------------------------------------------------

       A license is hereby granted to reproduce this software source code
       and to create executable versions from this source code for
       personal, non-commercial use. The copyright notice included with
       the software must be maintained in all copies produced.

       THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES
       WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF
       MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE
       AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE
       THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.

       Copyright (c) 2001-2007, John Conover, All Rights Reserved.

       Comments and/or bug reports should be addressed to:

              john@email.johncon.com (John Conover)

       ----------------------------------------------------------------------

                               January 16, 2007                   BSORTTEXT(1)