NtropiX_medium.jpg

 

Software For Algorithmic Trading Of Equities:

Architecture


Home | Installation | Usage | FAQs | Utilities | Architecture | QA | Tests | Links | Mailing List | License | Author | Download | Thanks



home.jpg
installation.jpg
usage.jpg
FAQs.jpg
utilities.jpg
architecture.jpg
QA.jpg
tests.jpg
links.jpg
mailinglist.jpg
license.jpg
author.jpg
download.jpg
thanks.jpg
  1. Data architecture:

    1. Each equity has a data structure, of type HASH, that contains the statistical information on the fluctuations in the equity's value. The structure is maintained in a hash table, referenced by the equity's name. (The elements HASH *previous and HASH *next are used for the maintenence of the hash lookup table, and the element char *hash_data references the equity's name.)

      1. Additionally, each HASH structure has elements for two linked list constructs:

        1. A singly linked list of all HASH structures, ie., a list of all equities. This list is referenced by the global HASH *decision_list. This list is constructed using the element HASH *next_decision, and is used for two purposes:

          1. It is sorted, by a linked list quick sort function, static void qsortlist (), to order the list into decreasing desirability of the equities, based on the value of the double decision element in the HASH structure, (which is set by the static int decisions () function, using the data compiled by the static int statistics () function.)

          2. To provide an access means to each equity's HASH structure for update at the end of a time interval, (ie., "spin" this list to update the statistics for each equity with the data collected in the last time interval.)

        2. A singly linked list of the HASH structures that are currently invested in, using the element HASH *next_investment. This list is referenced by the global HASH *invested_list, and is set up in the static void invest () function.)

  2. Data architecture manipulation functions:

    1. The hash lookup operations are performed by the functions, static int hash_init (), static int hash_insert (), and static HASH *hash_find ().

    2. The HASH data structures for the equities are ordered, in descending order of desirability, by the function static void qsortlist (), which is called at the beginning of the static void invest () function, prior to making investments for the next time interval.

    3. The function static int statistics () is used to calculate the "running" statistics for each equity, (ie., the rms, avg, Shannon probability, etc.,) using the data from each equity's HASH structure. There are three functions used in these calculations to perform statistical estimation:

      1. static double confidencerms ().

      2. static double confidenceavg ().

      3. static double confidenceavgrms ().

    4. The function static int decisions () is used to decide the desirability of an equity, using the data generated by the function static int statistics ().

    5. The function static void invest () is used to analyze/optimize the investment in the equities, using the data contained from each equity's HASH structure.

    6. The function static int invest_decisions () is used by the function static void invest () to assemble the portfolio.

    7. The function static void printstocks () is used to print the equities invested in for the next time interval.

    8. The static int statistics (), static int decisions (), and then the static void invest (), which calls static int invest_decisions (), and then static void printstocks (), functions are called at the conclusion of a time interval to update the statistics of all equities, then to invest in the equities with the most desirability, based on the statistics.

  3. Program description:

    1. The function main serves to read the input file, and dispatch to appropriate data handling functions, in the following order:

      1. Handle any command line arguments.

      2. Open the input file.

      3. For each record in the input file:

        1. Parse the record using the function static int strtoken (), checking that the record has exactly 3 fields, and if it does, then check that the equity's value represented by this record is greater than zero. (Note: many of the data handling functions will exhibit numerical exceptions with data values of zero, or less-this is the only protection from numerical exceptions in the program.)

        2. Lookup the equity's HASH structure represented by the record using the function static HASH *get_stock (). (The function get_stock () will return the structure if it exists, or create one if it doesn't.)

        3. Compare the time stamp of the record with the time stamp of the previous record, and if it is different, call the function static int update_stocks () to update the statistics for all equities for the last time interval, which in turn, calls the function static int statistics (), followed by the function static int decisions (), to calculate the statistics for all equities by "spinning" through the list of all equities, (using the HASH element HASH *next_decision.) After the statistics for all equities has been updated, call the function static void invest (), to make the investments for the next time interval. Note: the time stamps of the records are not used, and have no lexical or order meaning to the program-only that they must change to signify that a time interval has ended. They may be different for each record, which would imply real time "ticker" operation.

        4. Save the data contained in the record in the equity's HASH structure.

      4. At EOF of the input file, repeat III)A)3)c) for the last time interval, and close the input file. Note: if it is desired to implement a "dump and restart" mechanism, (ie., use this program to maintain a database of market statistics,) this is the appropriate place to insert the code. The data structure size for this program for an entire market is modest, and the HASH structures can be dumped, and reloaded when the program re-starts-a significant advantage over maintaining historical market data.

  4. Comments on various functions used in the program:

    1. The function static int main () is used only to read data from the input file, into the individual HASH structures for each equity, and call update functions when a time change is detected in the input file.

    2. The function static void invest () is probably the most modified function in the program-it is where the investment strategy, and portfolio management occurs. The singly linked list, maintained by the HASH element HASH *next_investment, is the list of equities invested in at any time. At the beginning of this function, the list is read, returning all investments into capital, (ie., making the list null, which means no investments.) The list of all equities is then sorted, using the static void qsortlist () function, into descending order of desirability of equities, and a new list constructed of equities that are invested in, and the process repeated to EOF of the input file.

    3. The next most modified function in the program is static int statistics (), where the statistics for each equity, for each time interval, is calculated. This function maintains a running average and root mean square of the normalized increments for each equity's HASH structure. Depending on the method used to calculate the Shannon probability, (one of enum decision_method,) one of three functions will be called from a switch construct to compute the statistical estimation information used by the function static void invest (). (The statistical estimation functions are, static double confidencerms (), static double confidenceavg (), and static double confidenceavgrms ()-there is a forth function called the first time by any of these three functions that sets up the data table structure used by the statistical estimation algorithms. See the sources for details.) Note that there are only two numerical exceptions in the statistical methodology, avg being negative, or rms being zero, (which can happen on very small data sets.) These are detected, and non-harmful data returned as a method of numerical exception handling. Note that it is not a requirement that the statistics of the history of the entire market be maintained by this program. A window approach would also be permissable, perhaps implemented with a fixed length circular buffer to calculate the "moving" root mean square and average of the normalized increments of each equity. Such a scheme may have advantages in exploiting the dynamics of the market.

    4. The function static int update_stocks () is called at the end of every time interval, to update the statistics for each and every equity. This is significantly faster than updating the data directly as input from the input file. The "index," ie., value of the aggregate market of all equities, is calculated in this function. This architecture has the following advantages:

      1. It is permissable for a single equity to have multiple updates from the input file in any time interval-something that shouldn't be a requirement, but frequently happens on real time "tickers." Note that the statistics for the equity will be calculated for such a scenario only at the end of a legitimate time interval, using the last, ie., latest, values from the input file.

      2. It is permissable for equities not to be represented in a time interval, since, under such a scenario, the equities statistics will be calculated anyway, with a no-change in equity value, ie., the statistical information for the equity will remain valid, in relation to the other equities in the market.

    5. The function int strtoken () parses input records into fields. Field delimiters are a sequence of one or more of the white space characters as defined by #define TOKEN_SEPARATORS. Comment records are discerned in static int main () as a record that begins with a '#' character.

    6. The defines #define PUSHDECISION(x), #define PUSHINVESTMENT(y), and #define POPINVESTMENT(), are used for formilization of list, or stack operations on the linked lists described in I)A)1), above, for robustness considerations.

    7. The defines typedef HASH LIST, typedef LIST *list, and #define element_comp(x,y) are used by the quick sort function, static void qsortlist (), which is described in the sources. Its use in the program is to sort the linked list, as described in I)A)1)a)i), above.

    8. The #define SIGMAS, #define STEPS_PER_SIGMA, and #define MIDWAY(a,b), are used by the statistical estimation computation functions, and define the number of sigma accuracy, and steps per sigma, respectively. These functions perform an iterated search for solution to the statistical estimation problem, as described in the Section entitled DATA SET SIZE CONSIDERATIONS, above, and II)C).

    9. The functions static int hash_init (), static int hash_insert (), static HASH *hash_find (), static void hash_term (), static int hash_text (), static int text_cmphash (), static HASH *text_mkhash (), and static void text_rmhash (), make up the hash lookup table system, which are described in the sources. Each HASH structure is initialized in the function static HASH *text_mkhash ().

    10. The function static int tsgetopt () is the same as the standard unix getopt (1). The reason for including it in the sources was to resolve portability issues, as to where the various variables are declared in the "dot h" files. In some systems, it is declared in unistd.h, in others, getopt.h, which would require finding the declarations prior to compile time, and altering the sources accordingly. Since the function is small, it is included to avoid having to search the system in a configure/setup phase.

  5. Notes and asides:

    1. The program flow follows the derivation, and many of the computational formulas were transcribed from the text. Although this may enhance clarity, it is probably not in the best interest of expeditious computation.

    2. The programming stylistics used were to encourage modifications to the program without an in depth understanding of programming. Specifically, although the program is capable of operating real time on the US equity market tickers, if efficiency is an issue, using indirect referencing on doubles that are passed as arguments to functions, and implementing implicit addresses of arrays with pointers would be recommended.


A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced.

THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.

So there.

Copyright © 1994-2011, John Conover, All Rights Reserved.


Comments and/or bug reports should be addressed to:

john@email.johncon.com

http://www.johncon.com/
http://www.johncon.com/ntropix/
http://www.johncon.com/ndustrix/
http://www.johncon.com/nformatix/
http://www.johncon.com/ndex/
John Conover
john@email.johncon.com
January 6, 2006



Home | Installation | Usage | FAQs | Utilities | Architecture | QA | Tests | Links | Mailing List | License | Author | Download | Thanks


Copyright © 1994-2011 John Conover, john@email.johncon.com. All Rights Reserved.
Last modified: Sun Apr 24 00:00:10 PDT 2011 $Id: architecture.html,v 1.0 2011/04/24 07:02:19 conover Exp $
Valid HTML 4.0!