Data architecture:
Each equity has a data structure, of type HASH,
that contains the statistical information on the
fluctuations in the equity's value. The structure is
maintained in a hash table, referenced by the equity's
name. (The elements HASH *previous and HASH *next are used
for the maintenence of the hash lookup table, and the
element char *hash_data references the equity's name.)
Additionally, each HASH structure has elements
for two linked list constructs:
A singly linked list of all HASH structures,
ie., a list of all equities. This list is referenced
by the global HASH *decision_list. This list is
constructed using the element HASH *next_decision, and
is used for two purposes:
It is sorted, by a linked list quick sort
function, static void qsortlist (), to order the
list into decreasing desirability of the equities,
based on the value of the double decision element in
the HASH structure, (which is set by the static int
decisions () function, using the data compiled by
the static int statistics () function.)
To provide an access means to each equity's
HASH structure for update at the end of a time
interval, (ie., "spin" this list to update the
statistics for each equity with the data collected
in the last time interval.)
A singly linked list of the HASH structures
that are currently invested in, using the element HASH
*next_investment. This list is referenced by the
global HASH *invested_list, and is set up in the
static void invest () function.)
Data architecture manipulation
functions:
The hash lookup operations are performed by the
functions, static int hash_init (), static int hash_insert
(), and static HASH *hash_find ().
The HASH data structures for the equities are
ordered, in descending order of desirability, by the
function static void qsortlist (), which is called at the
beginning of the static void invest () function, prior to
making investments for the next time interval.
The function static int statistics () is used to
calculate the "running" statistics for each equity, (ie.,
the rms, avg, Shannon probability, etc.,) using the data
from each equity's HASH structure. There are three
functions used in these calculations to perform
statistical estimation:
static double confidencerms ().
static double confidenceavg ().
static double confidenceavgrms ().
The function static int decisions () is used to
decide the desirability of an equity, using the data
generated by the function static int statistics
().
The function static void invest () is used to
analyze/optimize the investment in the equities, using the
data contained from each equity's HASH structure.
The function static int invest_decisions () is used
by the function static void invest () to assemble the
portfolio.
The function static void printstocks () is used to
print the equities invested in for the next time
interval.
The static int statistics (), static int decisions
(), and then the static void invest (), which calls static
int invest_decisions (), and then static void printstocks
(), functions are called at the conclusion of a time
interval to update the statistics of all equities, then to
invest in the equities with the most desirability, based
on the statistics.
Program description:
The function main serves to read the input file,
and dispatch to appropriate data handling functions, in
the following order:
Handle any command line arguments.
Open the input file.
For each record in the input file:
Parse the record using the function static int
strtoken (), checking that the record has exactly 3
fields, and if it does, then check that the equity's
value represented by this record is greater than
zero. (Note: many of the data handling functions will
exhibit numerical exceptions with data values of zero,
or lessthis is the only protection from numerical
exceptions in the program.)
Lookup the equity's HASH structure represented
by the record using the function static HASH
*get_stock (). (The function get_stock () will return
the structure if it exists, or create one if it
doesn't.)
Compare the time stamp of the record with the
time stamp of the previous record, and if it is
different, call the function static int update_stocks
() to update the statistics for all equities for the
last time interval, which in turn, calls the function
static int statistics (), followed by the function
static int decisions (), to calculate the statistics
for all equities by "spinning" through the list of all
equities, (using the HASH element HASH
*next_decision.) After the statistics for all
equities has been updated, call the function static
void invest (), to make the investments for the next
time interval. Note: the time stamps of the records
are not used, and have no lexical or order meaning to
the programonly that they must change to signify that
a time interval has ended. They may be different for
each record, which would imply real time "ticker"
operation.
Save the data contained in the record in the
equity's HASH structure.
At EOF of the input file, repeat III)A)3)c) for
the last time interval, and close the input file. Note:
if it is desired to implement a "dump and restart"
mechanism, (ie., use this program to maintain a database
of market statistics,) this is the appropriate place to
insert the code. The data structure size for this
program for an entire market is modest, and the HASH
structures can be dumped, and reloaded when the program
restartsa significant advantage over maintaining
historical market data.
Comments on various functions used in the
program:
The function static int main () is used only to
read data from the input file, into the individual HASH
structures for each equity, and call update functions when
a time change is detected in the input file.
The function static void invest () is probably the
most modified function in the programit is where the
investment strategy, and portfolio management occurs. The
singly linked list, maintained by the HASH element HASH
*next_investment, is the list of equities invested in at
any time. At the beginning of this function, the list is
read, returning all investments into capital, (ie., making
the list null, which means no investments.) The list of
all equities is then sorted, using the static void
qsortlist () function, into descending order of
desirability of equities, and a new list constructed of
equities that are invested in, and the process repeated to
EOF of the input file.
The next most modified function in the program is
static int statistics (), where the statistics for each
equity, for each time interval, is calculated. This
function maintains a running average and root mean square
of the normalized increments for each equity's HASH
structure. Depending on the method used to calculate the
Shannon probability, (one of enum decision_method,) one of
three functions will be called from a switch construct to
compute the statistical estimation information used by the
function static void invest (). (The statistical
estimation functions are, static double confidencerms (),
static double confidenceavg (), and static double
confidenceavgrms ()there is a forth function called the
first time by any of these three functions that sets up
the data table structure used by the statistical
estimation algorithms. See the sources for details.) Note
that there are only two numerical exceptions in the
statistical methodology, avg being negative, or rms being
zero, (which can happen on very small data sets.) These
are detected, and nonharmful data returned as a method of
numerical exception handling. Note that it is not a
requirement that the statistics of the history of the
entire market be maintained by this program. A window
approach would also be permissable, perhaps implemented
with a fixed length circular buffer to calculate the
"moving" root mean square and average of the normalized
increments of each equity. Such a scheme may have
advantages in exploiting the dynamics of the
market.
The function static int update_stocks () is called
at the end of every time interval, to update the
statistics for each and every equity. This is
significantly faster than updating the data directly as
input from the input file. The "index," ie., value of the
aggregate market of all equities, is calculated in this
function. This architecture has the following
advantages:
It is permissable for a single equity to have
multiple updates from the input file in any time
intervalsomething that shouldn't be a requirement, but
frequently happens on real time "tickers." Note that the
statistics for the equity will be calculated for such a
scenario only at the end of a legitimate time interval,
using the last, ie., latest, values from the input
file.
It is permissable for equities not to be
represented in a time interval, since, under such a
scenario, the equities statistics will be calculated
anyway, with a nochange in equity value, ie., the
statistical information for the equity will remain
valid, in relation to the other equities in the
market.
The function int strtoken () parses input records
into fields. Field delimiters are a sequence of one or
more of the white space characters as defined by #define
TOKEN_SEPARATORS. Comment records are discerned in static
int main () as a record that begins with a '#'
character.
The defines #define PUSHDECISION(x), #define
PUSHINVESTMENT(y), and #define POPINVESTMENT(), are used
for formilization of list, or stack operations on the
linked lists described in I)A)1), above, for robustness
considerations.
The defines typedef HASH LIST, typedef LIST *list,
and #define element_comp(x,y) are used by the quick sort
function, static void qsortlist (), which is described in
the sources. Its use in the program is to sort the linked
list, as described in I)A)1)a)i), above.
The #define SIGMAS, #define STEPS_PER_SIGMA, and
#define MIDWAY(a,b), are used by the statistical
estimation computation functions, and define the number of
sigma accuracy, and steps per sigma, respectively. These
functions perform an iterated search for solution to the
statistical estimation problem, as described in the
Section entitled DATA SET SIZE CONSIDERATIONS, above, and
II)C).
The functions static int hash_init (), static int
hash_insert (), static HASH *hash_find (), static void
hash_term (), static int hash_text (), static int
text_cmphash (), static HASH *text_mkhash (), and static
void text_rmhash (), make up the hash lookup table system,
which are described in the sources. Each HASH structure is
initialized in the function static HASH *text_mkhash
().
The function static int tsgetopt () is the same as
the standard unix getopt (1). The reason for including it
in the sources was to resolve portability issues, as to
where the various variables are declared in the "dot h"
files. In some systems, it is declared in unistd.h, in
others, getopt.h, which would require finding the
declarations prior to compile time, and altering the
sources accordingly. Since the function is small, it is
included to avoid having to search the system in a
configure/setup phase.
Notes and asides:
The program flow follows the derivation, and many
of the computational formulas were transcribed from the
text. Although this may enhance clarity, it is probably
not in the best interest of expeditious
computation.
The programming stylistics used were to encourage
modifications to the program without an in depth
understanding of programming. Specifically, although the
program is capable of operating real time on the US equity
market tickers, if efficiency is an issue, using indirect
referencing on doubles that are passed as arguments to
functions, and implementing implicit addresses of arrays
with pointers would be recommended.
A license is hereby granted to reproduce this software
source code and to create executable versions from this source
code for personal, noncommercial use. The copyright notice
included with the software must be maintained in all copies
produced.
THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO
WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING
WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY
PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF
THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY
RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.
So there.
Copyright © 19942011, John Conover, All Rights
Reserved.
Comments and/or bug reports should be addressed to:
 john@email.johncon.com
 http://www.johncon.com/
 http://www.johncon.com/ntropix/
 http://www.johncon.com/ndustrix/
 http://www.johncon.com/nformatix/
 http://www.johncon.com/ndex/
 John Conover
 john@email.johncon.com
 January 6, 2006
