# TeX/LaTeX in relevance searching

From: John Conover <john@email.johncon.com>
Subject: TeX/LaTeX in relevance searching
Date: Wed, 23 Nov 94 01:27 PST


FYI, attached please find a copy of correspondence between
bnb@math.ams.org and myself (sometime in 1993). The context of the
discussion was the standardization of TeX/LaTeX to facilitate an
information retrieval (i.e., electronic literature search) system
implemented via an inverted index. I think the concept of the
relevance search schema may be relevant to this discussion, although
any discussion of the standardization of TeX/LaTeX would be
inappropriate here and should be directed elsewhere.

> If you are using a full-text database information retrieval system
> (i.e., an electronic literature search system), it is an advantage to
> be able to do relevance searches. For example, an occurrence of a word
> found in a \section{...} heading would be weighted higher than if the
> word were found simply in a paragraph. Note the issue here: relevance
> information can be obtained from the way the author structured the
> document.
>
> To build such an information retrieval system (presumably distributed
> over a heterogeneous network), I need the syntax of a document
> structure standard. Quite probably, LaTeX comes closest to meeting
> the requirements. SGML also overlaps into this area, and has
> significant inertia in the marketplace (particularly in Europe) since
> it is, arguably, an international standard. (Yes, I understand that
> TeX is a typesetting language, but the LaTeX macros extend this
> capability into the document structure area.)
>
> Not too many systems will allow you to query for the contents of a
> table, citations, figure captions, etc. (or for "where was that
> integral," i.e., query for \int ...). One could possibly use the Unix
> MTA as a carrier, also.
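
As a rough illustration of the weighting idea in the quoted message, the
following is a minimal sketch in Python of an inverted index that scores
words found in \section{...} style headings higher than words found in
the body text. It is not the system discussed in the correspondence; the
weights (HEADING_WEIGHT, BODY_WEIGHT), the function names, and the
regular expressions are all assumptions made for illustration, and the
parsing handles only simple, unnested sectioning commands.

```python
import re
from collections import defaultdict

# Hypothetical weights, chosen only for illustration.
HEADING_WEIGHT = 5.0
BODY_WEIGHT = 1.0

# Matches \section{...}, \subsection{...}, \subsubsection{...} (optionally
# starred) whose argument contains no nested braces. A simplification.
HEADING_RE = re.compile(r"\\(?:sub)*section\*?\{([^{}]*)\}")
# Matches remaining control words such as \int or \begin.
COMMAND_RE = re.compile(r"\\[A-Za-z]+\*?")
WORD_RE = re.compile(r"[A-Za-z]+")


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return [w.lower() for w in WORD_RE.findall(text)]


def build_index(documents):
    """Build an inverted index: word -> {doc_id: accumulated weight}."""
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, source in documents.items():
        headings = HEADING_RE.findall(source)
        # Remove headings and other commands so the body is counted once.
        body = COMMAND_RE.sub(" ", HEADING_RE.sub(" ", source))
        for heading in headings:
            for word in tokenize(heading):
                index[word][doc_id] += HEADING_WEIGHT
        for word in tokenize(body):
            index[word][doc_id] += BODY_WEIGHT
    return index


def search(index, word):
    """Return (doc_id, score) pairs, highest score first, ties by doc_id."""
    postings = index.get(word.lower(), {})
    return sorted(postings.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    docs = {
        "a": r"\section{Inverted index} Some text about retrieval.",
        "b": r"\section{Introduction} An inverted index is used for retrieval.",
    }
    for doc_id, score in search(build_index(docs), "inverted"):
        print(doc_id, score)
```

In this sketch, a document that uses a query word in a heading ranks
ahead of one that uses it only in running text. The same pattern could
presumably be extended with further fields (captions, citations, table
contents), each with its own weight.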

