PDICT(1)                                                              PDICT(1)

NAME
       pdict - Phonetic word lookup in the system dictionary

SYNOPSIS
       pdict [-d path] [-v] word

DESCRIPTION
       The pdict program uses the soundex algorithm to search a text version
       of a standard English dictionary for words that sound like a word
       supplied on the command line.

       The format of the dictionary is one word per line. Each line must be
       EOL terminated, i.e., with a '\n'.

       The word on the command line is converted to its soundex
       representation, then sequentially compared to the soundex
       representation of each word in the dictionary. If the two match, the
       dictionary word is printed to the standard output.

       The soundex algorithm is a mechanical phonetic translation system for
       the English language; it converts an English word into a
       corresponding phonetic code. The algorithm is as follows:

           for each character in a word:

               if the character is the first character of a word

                   1) do nothing

               else

                   2) replace consecutive sequences of the labials, (i.e.,
                   the characters, B, F, P, V,) with the character '1'

                   3) replace consecutive sequences of the gutturals and
                   sibilants, (i.e., the characters, C, G, J, K, Q, S, X,
                   Z,) with the character '2'

                   4) replace consecutive sequences of the dentals, (i.e.,
                   the characters, D, T,) with the character '3'

                   5) replace consecutive sequences of the longliquids,
                   (i.e., the character, L,) with the character '4'

                   6) replace consecutive sequences of the nasals, (i.e.,
                   the characters, M, N,) with the character '5'

                   7) replace consecutive sequences of the shortliquids,
                   (i.e., the character, R,) with the character '6'

                   8) and, omit all other characters, (i.e., the
                   characters, A, E, H, I, O, U, W, Y,)

           9) if the soundex translation of the word is longer than 4
           characters, truncate to 4 characters.

       For example, the soundex translation of the word "conover" is C516.

       Searches on longer words can be made more selective by eliminating
       9), above.
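
       The steps above can be sketched in a few lines of Python. This is
       an illustration only, not the pdict source. The names soundex,
       sound_alikes and max_len are hypothetical. The sketch assumes that
       an omitted character (step 8) ends a run of consecutive characters,
       that the first character does not take part in a run, and that
       short codes are not padded, since the description above does not
       mention padding.

```python
# Sketch of the soundex variant described in this page; not the pdict source.
# Assumptions: omitted letters break a run, the first letter never starts a
# run, and short codes are left unpadded.
CLASSES = {
    "BFPV": "1",      # labials
    "CGJKQSXZ": "2",  # gutturals and sibilants
    "DT": "3",        # dentals
    "L": "4",         # longliquids
    "MN": "5",        # nasals
    "R": "6",         # shortliquids
}
CODE = {ch: digit for letters, digit in CLASSES.items() for ch in letters}


def soundex(word, max_len=4):
    """Return the phonetic code for word; max_len=None skips step 9."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]        # step 1: the first character is kept as-is
    prev = None               # code of the previous letter, None if omitted
    for ch in letters[1:]:
        digit = CODE.get(ch)  # None for A, E, H, I, O, U, W, Y (step 8)
        if digit is not None and digit != prev:
            out.append(digit)  # steps 2-7: one digit per consecutive run
        prev = digit
    code = "".join(out)
    return code if max_len is None else code[:max_len]  # step 9


def sound_alikes(word, dictionary_lines):
    """Sequentially compare word against one-word-per-line dictionary text."""
    target = soundex(word)
    words = (line.strip() for line in dictionary_lines)
    return [w for w in words if w and soundex(w) == target]


if __name__ == "__main__":
    print(soundex("conover"))  # prints C516
```

       Passing max_len=None corresponds to eliminating step 9), so
       soundex("dictionary", None) yields D2356 rather than D235.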

       Running the above algorithm on a standard text version of the
       Webster's English dictionary, (mine has 234,932 words,) results in
       61,408 different words being recognized by the algorithm.

OPTIONS
       -d path
              Dictionary file.

       -v     Print the version and copyright banner of the program.

       word   Word to look up.

WARNINGS
       The program only works for the English language.

       The program scans the system dictionary with each invocation. A more
       expedient approach would be to translate the system dictionary into
       soundex, and search the resulting soundex dictionary, perhaps with a
       binary search. There is a test function, testtranslit, supplied with
       the sources that will translate the system dictionary. See the
       Makefile for the pdict program.

SEE ALSO
       egrep(1), agrep(1)

DIAGNOSTICS
       Error messages are printed for illegal or incompatible command line
       arguments, and for missing or inaccessible files and directories.

AUTHORS
       ----------------------------------------------------------------------

       A license is hereby granted to reproduce this software source code
       and to create executable versions from this source code for personal,
       non-commercial use. The copyright notice included with the software
       must be maintained in all copies produced.

       THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES
       WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF
       MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE
       AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE
       INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.

       Copyright (c) 1995-2005 John Conover, All Rights Reserved.

       Comments and/or bug reports should be addressed to:

           john@johncon.com (John Conover)

       ----------------------------------------------------------------------

January 1, 2005                                                       PDICT(1)