Quantitative Analysis of Non-Linear High Entropy Economic Systems VII

From: John Conover <john@email.johncon.com>
Subject: Quantitative Analysis of Non-Linear High Entropy Economic Systems VII
Date: 28 Aug 2006 09:39:47 -0000


Introduction

As mentioned in Section I, Section II, Section III, Section IV, Section V, and Section VI, much of applied economics has to address non-linear high entropy systems (those systems characterized by random fluctuations over time), such as net wealth, equity prices, gross domestic product, industrial markets, etc.


Review

A quick review of this series.

Many economic systems are characterized by non-linear high entropy time series. These time series are geometric progressions, as analyzed in Section I, and the marginal increments of the time series exhibit log-normal distributions, as suggested in Section II. The characteristics of the marginal increments can be analyzed as suggested in Section III and Section IV, to formulate investment strategies and optimizations as illustrated in Section V. The finer details of the types of leptokurtosis found in the marginal increments of financial time series are analyzed in Section VI.


A concluding example, the DJIA

Revisiting the DJIA, (since it has a long historical database,) a meticulous analytical approach will be used to analyze the characteristics of the closing values of the DJIA. The analytical procedure will use a conscientious process commonly used in engineering practice:

  1. Assume a systemic model, (in this case, that the time series is a geometrical progression.)
  2. Extract/analyze the values of the variables used in the model. This will be done with a script of analytical programs, "chained" together, (usually with Unix pipes for maintainability and extensibility.)
  3. The variables will be used to simulate the characteristics of the systemic model.
  4. The empirical data and simulated data will be compared, using analytical programs and pictographic presentations (i.e., graphs), to provide an intuitive interpretation of the data and its comparison to the theoretical model, at every step in the analysis.

Note: the C source code to all programs used in the script is available from the NtropiX Utilities page or the NdustriX Utilities page, and is distributed under License.

The historical time series of the DJIA index was obtained from Yahoo!'s database of equity Historical Prices (ticker symbol ^DJI) in csv format. The csv file was converted to a Unix database, djia, using the csv2tsinvest program. (The DJIA time series starts on January 2, 1900, and contains 29010 daily closes through May 26, 2006.)

Plotting the closing values of the DJIA:



Figure I

Figure I is a plot of the daily closes of the DJIA, from January 2, 1900, through May 26, 2006. The simulated value and the median value are constructed from the variables extracted from the empirical data by the script below, and are presented here for comparison.

The script used for the programs will be walked through statement by statement, to illustrate and validate the analytic procedure.

Starting with the first two statements, and following the outline from Section I:



    tsfraction djia | tsavg -p
    0.000236
    tsfraction djia | tsrms -p
    0.010950

From Equation (1.24), P = ((avg / rms) + 1) / 2 = ((0.000236 / 0.010950) + 1) / 2 = 0.51077625570776255708. This means that there are, on average, about 51 up movements and 49 down movements out of one hundred. P is the probability of an up movement in the DJIA.
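The following is a minimal Python sketch of the same arithmetic. It uses the avg and rms values printed by tsavg and tsrms above. The function name up_probability is illustrative only and is not part of the NtropiX utilities.

```python
def up_probability(avg: float, rms: float) -> float:
    """Probability of an up movement, P = ((avg / rms) + 1) / 2 (Equation (1.24))."""
    return (avg / rms + 1.0) / 2.0

# avg and rms of the DJIA's marginal increments, as measured above.
P = up_probability(0.000236, 0.010950)
print(f"P = {P:.8f}")  # prints P = 0.51077626
```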

Log-normal distributions of the marginal increments of a time series (those distributions commonly found in geometric progressions) are difficult to comprehend intuitively. It is therefore expedient to convert the time series to its Brownian Motion (random walk) equivalent, as outlined in Section II.
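Here is a minimal Python sketch of that conversion. It assumes, from the pipelines used in this section, that `tsmath -l | tsderivative | tsmath -s ... | tsintegrate` does four things:

- takes logarithms,
- differences them,
- subtracts the average increment, ln (g),
- integrates with a running sum.

The function name and the sample prices are hypothetical.

```python
import math
from itertools import accumulate

def brownian_equivalent(prices):
    """Random walk equivalent of a geometric progression (illustrative sketch)."""
    logs = [math.log(p) for p in prices]                  # like tsmath -l
    increments = [b - a for a, b in zip(logs, logs[1:])]  # like tsderivative
    drift = sum(increments) / len(increments)             # ln (g), e.g. 0.000176
    detrended = [d - drift for d in increments]           # like tsmath -s
    return list(accumulate(detrended))                    # like tsintegrate

walk = brownian_equivalent([100.0, 102.0, 101.0, 104.0, 103.5])
print(walk)  # the drift is removed, so the walk ends at (numerically) zero
```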

The root-mean-square, rms, of the Brownian Motion equivalent, (the next two statements in the script):



    tsfraction djia | tsmath -s 0.000236 | tsrms -p
    0.010947
    tsmath -l djia | tsderivative | tsmath -s 0.000176 | tsrms -p
    0.010998

which are alternative methods: the first extracts the rms directly from the geometric progression, and the second from its Brownian Motion equivalent. The two answers should be nearly identical. The offset, avg = 0.000236, is subtracted in the first, and ln (g) = ln (1.000176) = 0.000176 in the second. The logarithm of the rms will be useful later: ln (0.010947) = -4.51468983285971677053.
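The two measurements can be sketched in Python. This sketch assumes two things:

- tsfraction produces the fractional change x(t) / x(t - 1) - 1.
- tsrms produces the root-mean-square of its input.

The sample prices are made up for illustration.

```python
import math

def rms_about_mean(values):
    """Root-mean-square after subtracting the average (the tsmath -s offset)."""
    avg = sum(values) / len(values)
    return math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))

prices = [100.0, 101.2, 100.1, 101.5, 100.9, 102.3]
fractions = [b / a - 1.0 for a, b in zip(prices, prices[1:])]     # geometric progression
log_incs = [math.log(b / a) for a, b in zip(prices, prices[1:])]  # Brownian equivalent

rms_fraction = rms_about_mean(fractions)
rms_log = rms_about_mean(log_incs)
print(rms_fraction, rms_log)  # nearly identical when the increments are small
```

The two agree because ln (1 + x) is approximately x for small x, and daily increments of about 1% are small.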

The number of elements in the time series and its beginning value will be of interest later:



    wc djia
    29010  29010 202761 djia
    head -1 djia
    68.13

The marginal gain, g, of the Brownian Motion equivalent is determined by the next two statements in the script:



    tsgain -p djia
    1.000176
    tsmath -l djia | tsderivative | tsavg -p
    0.000176

    tslsq -e -p djia
    e^(3.450080 + 0.000172t) = 1.000172^(20062.070643 + t) = 2^(4.977413 + 0.000248t)

The two answers should be nearly equivalent. The third statement in this section of the script provides yet another method: the exponential least-squares (LSQ) best fit to the original time series. It, too, should provide a nearly identical answer to the other two methods (0.000176 vs. 0.000172). The LSQ best fit to the data starts with a first element value of exp (3.450080) = 31.50291244093657542517.
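The methods can be sketched in Python. This sketch assumes that tsgain computes the geometric mean of the day-to-day ratios. That is an assumption; the exact definition is in the NtropiX source. Under it, tsgain's result equals exp of the average log increment, which telescopes to (last / first)^(1 / (n - 1)). The exponential LSQ fit regresses ln x(t) on t, with t = 0 at the first element. The function names are hypothetical.

```python
import math

def gain_from_endpoints(prices):
    """g = exp(average log increment) = (last / first) ** (1 / (n - 1))."""
    return (prices[-1] / prices[0]) ** (1.0 / (len(prices) - 1))

def exponential_lsq(prices):
    """Least-squares fit of ln x(t) = a + b*t, so x(t) ~ e^(a + b*t); returns (a, b)."""
    n = len(prices)
    ts = range(n)
    ys = [math.log(p) for p in prices]
    t_mean, y_mean = sum(ts) / n, sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    b = sxy / sxx
    return y_mean - b * t_mean, b

# An exact geometric progression: every method should give g = 1.01.
series = [5.0 * 1.01 ** t for t in range(50)]
g_endpoints = gain_from_endpoints(series)
a, b = exponential_lsq(series)
print(g_endpoints, math.exp(b), math.exp(a))  # gain, gain, first-element value
```

On real data the endpoint method and the LSQ fit differ slightly, as 0.000176 and 0.000172 do above. The endpoint method depends only on the first and last values, while the LSQ fit weighs every element.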

Using the variables produced by the LSQ best-fit, and plotting the Brownian Motion equivalent of the DJIA:



Figure II

Figure II is a plot of the Brownian Motion (random walk) equivalent of the DJIA, from January 2, 1900, through May 26, 2006. The simulated values are constructed from the variables extracted from the empirical data by the script below.

With the DJIA's time series converted to its Brownian Motion equivalent, the marginal increments can be analyzed. One of the issues to be addressed is leptokurtosis, specifically, the deviation from the theoretical assumption that the increments are statistically independent. This will indicate which mathematics should be used: if the increments are independent, the root-mean-square should be used; if not, another root-mean should be used, as per Section VI. An iterated script will be used to find the root:



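    R="0.5"                 # initial guess for the root
    > "log"                 # truncate the log of fitted lines
    LAST="NOTHING"
    LOOP="1"
    #
    # Iterate until tslsq reproduces a line it has already printed.
    while [ "${LOOP}" -eq "1" ]
    do
        # Random walk equivalent (log, derivative, remove drift, integrate),
        # then its deviation from the median using the current root, R.
        tsmath -l djia | tsderivative | tsmath -s 0.000176 | tsintegrate | \
            tsrunmagnitude -r "${R}" > "djia.magnitude"
        # Logarithms of the time (field 1) and the magnitude (field 2).
        cut -f1 "djia.magnitude" | tsmath -l > "temp.5"
        cut -f2 "djia.magnitude" | tsmath -l > "temp.6"
        # LSQ line through 0 <= ln t < 6, i.e., days 1 through 403.
        LAST=$(paste temp.5 temp.6 | egrep '^[0-5]\.' | tslsq -p)
        echo "${LAST}"
        # The slope of the fitted line is the next approximation to the root.
        R=$(echo "${LAST}" | sed -e 's/^.* //' -e 's/t.*$//')
        #
        # Stop when this exact line is already in the log.
        if grep -F -x -e "${LAST}" "log"
        then
            LOOP="0"
        fi
        #
        mv "temp.5" "temp.5.last"
        mv "temp.6" "temp.6.last"
        mv "djia.magnitude" "djia.magnitude.last"
        echo "${LAST}" >> "log"
    done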

The script fragment is an iterated search-for-solution algorithm. It initially assumes a root of 0.5 and uses tsrunmagnitude to analyze the time series. The slope of the resulting LSQ line is a more accurate approximation to the root, which is fed back into the next iteration. This repeats until no further improvement is possible, i.e., until a fitted line repeats one already logged. (The other statements in the loop are standard Unix text database manipulations: cut(1) and paste(1) extract and reassemble fields in the database, egrep(1) extracts only days 1 through e^5.999... = 403, and so forth.)

The output of the script fragment is:



    -4.592316 + 0.537435t
    -4.648576 + 0.541035t
    -4.653584 + 0.541347t
    -4.654019 + 0.541375t
    -4.654057 + 0.541377t
    -4.654058 + 0.541377t
    -4.654058 + 0.541377t

meaning that, at least in the very short term (i.e., daily returns), there is about a 54% chance that what happened on any one day will also happen on the next day.
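The iteration can be sketched in Python. The log-log LSQ fit and the ln t window are taken from the script. `measure(r)` is a hypothetical stand-in for `tsrunmagnitude -r`, whose algorithm is not described here. So this shows only the fixed-point search pattern, not the NtropiX computation itself.

```python
import math

def loglog_fit(points, max_log_t=6.0):
    """LSQ line through (ln t, ln m), returning (intercept, slope)."""
    xs, ys = [], []
    for t, m in points:
        lt = math.log(t)
        # Mimics egrep '^[0-5]\.' on ln t: keep 0 <= ln t < 6 (days 1..403).
        if 0.0 <= lt < max_log_t:
            xs.append(lt)
            ys.append(math.log(m))
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return y_mean - slope * x_mean, slope

def iterate_root(measure, r=0.5, max_iter=50):
    """Refit until the printed (6 decimal) line repeats, like the grep of "log"."""
    seen = set()
    for _ in range(max_iter):
        intercept, slope = loglog_fit(measure(r))
        line = f"{intercept:.6f} + {slope:.6f}t"
        print(line)
        if line in seen:
            break
        seen.add(line)
        r = round(slope, 6)  # the slope becomes the next root, as sed extracts it
    return intercept, slope

# Synthetic measure that ignores r: a pure 0.01 * sqrt(t) law.
intercept, slope = iterate_root(lambda r: [(t, 0.01 * math.sqrt(t)) for t in range(1, 1000)])
```

For an iid random walk the slope should settle near 0.5. A settled slope above 0.5, like the DJIA's 0.541377, indicates persistence.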

The simulation can now be constructed using the tsinvestsim program with the file, djia.sim:



    djia, p = 0.51077625570776255708, f = 0.010950, i = 31.50291244093657542517, h = 0.541377, l = 1

and running tsinvestsim:



    tsinvestsim djia.sim 29010 | cut -f3 > sim

And, analyzing the simulation file, sim, in an identical manner to the DJIA analysis:



    tsfraction sim | tsavg -p
    0.000253
    tsfraction sim | tsrms -p
    0.010994
    tsmath -l sim > sim.ln

    tslsq -e -p sim
    e^(3.548001 + 0.000146t) = 1.000146^(24382.768809 + t) = 2^(5.118683 + 0.000210t)

These values compare favorably with the original analysis of the DJIA. The files produced by the simulation were presented in Figure I and Figure II, above, for comparison with the original DJIA time series.
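The model inside tsinvestsim is not described in this section. As an assumption, here is a minimal Python sketch of a binomial geometric progression. Each day, the value is multiplied by (1 + f) with probability p, and by (1 - f) otherwise. This model is consistent with the gain formula g = (1 + f)^P * (1 - f)^(1 - P) from Equation (1.20), used later in this section. The persistence parameter h and the l parameter in djia.sim are deliberately not modeled, and the function name is hypothetical.

```python
import random

def simulate_geometric(p, f, initial, n, seed=None):
    """Binomial geometric progression of n values (illustrative; h and l not modeled)."""
    rng = random.Random(seed)
    values = [initial]
    for _ in range(n - 1):
        step = f if rng.random() < p else -f  # up with probability p
        values.append(values[-1] * (1.0 + step))
    return values

sim = simulate_geometric(0.51077625570776255708, 0.010950,
                         31.50291244093657542517, 29010, seed=1)
print(len(sim), sim[-1])
```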

The groundwork is now prepared to look into issues of leptokurtosis of the DJIA. As presented in Section VI, the model used will be the Laplacian distribution:



    tsfraction djia | tsmath -s 0.000236 | tsnormal -t > djia.distribution
    tsfraction djia | tsmath -s 0.000236 | tsnormal -t -f > djia.frequency
    tsfraction sim | tsmath -s 0.000236 | tsnormal -t > sim.distribution
    tsfraction sim | tsmath -s 0.000236 | tsnormal -t -f > sim.frequency

    egrep '^-' djia.frequency | wc
    50     100     950
    egrep '^-' djia.frequency | tail -49 | tslsq -e -p | sed 's/ = .*$//'
    e^(0.710298 + 147.146009t)

Here, the offset (average) of the distribution is subtracted, as above, from the marginal increments of the DJIA's value and of its simulation. A histogram of the marginal increments is then made with the tsnormal program. An LSQ approximation to the distribution is necessary. Since the Laplace distribution is a symmetric double exponential, only one side is needed: egrep(1) keeps only the negative side of the distribution, and the tslsq program provides the LSQ best-fit approximation to it. And plotting:



Figure III

Figure III is a plot of the distribution of the marginal increments of the Brownian Motion (random walk) equivalent of the DJIA, from January 2, 1900, through May 26, 2006, and of its simulation. The Gaussian/normal LSQ best-fit approximation is also presented for comparison. The variance of all the distributions shown is nearly identical, as would be expected.
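The one-sided fit, and the conversion from the fitted exponent to a Laplace scale and deviation, can be sketched in Python. The histogram here is synthetic, and the function name is hypothetical. For a Laplace density proportional to e^(-|x| / b), the negative side is e^(x / b). So the LSQ slope of ln (count) against x is 1 / b, and the variance is 2 * b^2.

```python
import math

def laplace_scale_from_tail(centers, counts):
    """Fit ln(count) = c + k*x by least squares on the negative tail; return b = 1 / k."""
    ys = [math.log(c) for c in counts]
    n = len(centers)
    x_mean, y_mean = sum(centers) / n, sum(ys) / n
    k = (sum((x - x_mean) * (y - y_mean) for x, y in zip(centers, ys))
         / sum((x - x_mean) ** 2 for x in centers))
    return 1.0 / k

# Synthetic negative tail with a known scale of 0.0068.
centers = [-0.001 * i for i in range(1, 30)]
counts = [1000.0 * math.exp(x / 0.0068) for x in centers]
print(laplace_scale_from_tail(centers, counts))  # recovers about 0.0068

# Applied to the DJIA slope reported above, 147.146009:
b = 1.0 / 147.146009
deviation = math.sqrt(2.0) * b  # Laplace variance is 2 * b^2
print(b, deviation)              # about 0.0067960 and 0.0096110
```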

Integrating the count of marginal increments in each 0.1% "bucket" to obtain the cumulative probabilities:



    tsfraction djia | tsmath -s 0.000236 | sed 's/[0-9][0-9][0-9]$//' | sort -n | \
        tscount -r | tsmath -t -d 29009 | tsintegrate -t > djia.cumulative
    tsfraction sim | tsmath -s 0.000236 | sed 's/[0-9][0-9][0-9]$//' | sort -n | \
        tscount -r | tsmath -t -d 29009 | tsintegrate -t > sim.cumulative

And plotting:



Figure IV

Figure IV is a plot of the cumulative distribution of the marginal increments of the Brownian Motion (random walk) equivalent of the DJIA, from January 2, 1900, through May 26, 2006, and of its simulation. It was obtained by a different method (its derivative should be much the same as Figure III, above) and is included as a cross-check of the data and the analysis.
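The bucketing step can be sketched in Python. This sketch assumes that tsfraction prints six decimal places, as the output above suggests. Under that assumption, dropping the last three digits (the sed(1) step) truncates each increment toward zero into a 0.1% bucket. Each bucket's count is then divided by the number of increments and accumulated. The function name is hypothetical.

```python
from collections import Counter
from itertools import accumulate

def cumulative_by_bucket(increments):
    """Cumulative probability per 0.1% bucket, mimicking sed + sort + tscount + tsintegrate."""
    keys = [f"{x:.6f}"[:-3] for x in increments]  # "0.010950" -> "0.010"
    counts = Counter(keys)
    buckets = sorted(counts, key=float)
    probs = [counts[k] / len(increments) for k in buckets]
    return list(zip(buckets, accumulate(probs)))

table = cumulative_by_bucket([0.0105, -0.0021, 0.0109, 0.0004])
for bucket, cum in table:
    print(bucket, cum)
```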

The run lengths of the expansions and contractions of the DJIA:



    tsmath -l djia | tsderivative | tsmath -s 0.000176 | tsintegrate | tsrunlength | cut -f1,7 > djia.length
    tsmath -l sim | tsderivative | tsmath -s 0.000176 | tsintegrate | tsrunlength | cut -f1,7 > sim.length

And, plotting:



Figure V

Figure V is a plot of the cumulative probability of the run lengths of the expansions and contractions of the Brownian Motion (random walk) equivalent of the DJIA, from January 2, 1900, through May 26, 2006, and of its simulation. erf (1 / sqrt (x)) is the theoretical value. As an example interpretation, there is a little over a 10% chance of the value of the DJIA staying above its median value for at least 100 trading days.
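Both curves in Figure V can be sketched in Python. This sketch assumes a run is a maximal stretch of the detrended random walk on one side of zero, its median. tsrunlength's exact definition may differ, and the function names are hypothetical.

```python
import math

def run_lengths(walk):
    """Lengths of maximal stretches on one side of zero (zero counts as above)."""
    lengths = []
    current, above = 0, None
    for v in walk:
        side = v >= 0.0
        if side == above:
            current += 1
        else:
            if current:
                lengths.append(current)
            current, above = 1, side
    if current:
        lengths.append(current)
    return lengths

def theoretical_run_probability(t):
    """Probability that a run lasts at least t steps: erf (1 / sqrt (t))."""
    return math.erf(1.0 / math.sqrt(t))

print(run_lengths([1, 2, -1, -2, -3, 1]))  # [2, 3, 1]
print(theoretical_run_probability(100))    # about 0.112, a little over 10%
```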

And, the magnitude of the expansions and contractions of the DJIA:



    tsmath -l djia | tsderivative | tsmath -s 0.000176 | tsintegrate | tsrunmagnitude > djia.magnitude
    tsmath -l sim | tsderivative | tsmath -s 0.000176 | tsintegrate | tsrunmagnitude > sim.magnitude

And, plotting:



Figure VI

Figure VI is a plot of the deviation from the median value of the expansions and contractions of the Brownian Motion (random walk) equivalent of the DJIA, from January 2, 1900, through May 26, 2006, and of its simulation. 0.010947 * sqrt (x) is the theoretical value. As an example interpretation, there is a one standard deviation chance (about 68%) that the value of the DJIA will be within a little more than +/- 10% of its median value at 100 trading days.
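The sketch below compares two curves at 100 trading days:

- the theoretical curve, 0.010947 * sqrt (t);
- the power law fitted by the iterated log-log LSQ earlier in this section (intercept -4.654058, exponent 0.541377).

The function names are illustrative.

```python
import math

def theoretical_deviation(t, rms=0.010947):
    """Theoretical deviation of the random walk from its median after t days."""
    return rms * math.sqrt(t)

def fitted_deviation(t, intercept=-4.654058, exponent=0.541377):
    """Power law from the iterated log-log LSQ fit: e^(intercept) * t^exponent."""
    return math.exp(intercept) * t ** exponent

print(theoretical_deviation(100))  # 0.10947, a little more than +/- 10%
print(fitted_deviation(100))       # the empirical fit, somewhat larger at 100 days
```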

The discrepancies of the curves from the theoretical values are due to market inefficiencies. The empirical curves are steeper for small time intervals (near 1 day) because the market does not respond instantaneously to new information; there is a slight persistence from one day to the next. Additionally, the empirical curves are steeper than the theoretical at 253 trading days (about a calendar year) for structural reasons, specifically, taxation schedules that favor funds selling off losing equities before the end of the calendar year. It should be noted that the deviation from the theoretical values is not constant, and varies throughout the calendar year. The LSQ best-fit approximations are an average over the first 403 trading days, about 19 months.

Market inefficiencies are exploitable. (If the DJIA were a perfect Brownian Motion random walk, the market would be fair, and no one could have an advantage over anyone else in the long run.) To delve into the market inefficiencies, a log-log plot of Figure VI is made:



    cut -f1 djia.magnitude | tsmath -l > temp.1
    cut -f2 djia.magnitude | tsmath -l > temp.2
    paste temp.1 temp.2 > djia.magnitude.ln

    cut -f1 sim.magnitude | tsmath -l > temp.3
    cut -f2 sim.magnitude | tsmath -l > temp.4
    paste temp.3 temp.4 > sim.magnitude.ln

    egrep '^[0-5]\.' djia.magnitude.ln | tslsq -p
    -4.592316 + 0.537435t
    egrep '^[0-5]\.' sim.magnitude.ln | tslsq -p
    -4.471600 + 0.512268t

And, plotting:



Figure VII

Figure VII is a log-log plot of the deviation from the median value of the expansions and contractions of the Brownian Motion (random walk) equivalent of the DJIA, from January 2, 1900, through May 26, 2006, and of its simulation, as shown in Figure VI.

And, plotting Figure VII for short time intervals to emphasize the market inefficiency:



Figure VIII

Figure VIII is a log-log plot of the deviation from the median value of the expansions and contractions of the Brownian Motion (random walk) equivalent of the DJIA, from January 2, 1900, through May 26, 2006, and of its simulation, plotted for a few trading days.

And, plotting Figure VII around a calendar year to emphasize the market inefficiency:



Figure IX

Figure IX is a log-log plot of the deviation from the median value of the expansions and contractions of the Brownian Motion (random walk) equivalent of the DJIA, from January 2, 1900, through May 26, 2006, and of its simulation, plotted around a calendar year.

Figure VIII and Figure IX indicate exploitable market inefficiencies, where the marginal increments are not statistically independent (i.e., not iid), implying some degree of predictability.

To remove the statistical dependence, the marginal increments of the Brownian Motion (random walk) equivalent of the DJIA can be randomly reordered (i.e., scrambled) in the time series, and the random walk equivalent of the time series reassembled. The deviation from the median value of the expansions and contractions is then analyzed:


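    # Randomly reorder the log increments (tssequence | sort -n | cut -f3),
    # remove the drift, and integrate back into a random walk.
    tsmath -l djia | tsderivative | tssequence | sort -n | cut -f3 | \
        tsmath -s 0.000176 | tsintegrate > "scrambled"
    #
    R="0.5"                 # initial guess for the root
    > "log"                 # truncate the log of fitted lines
    LAST="NOTHING"
    LOOP="1"
    #
    # Iterate until tslsq reproduces a line it has already printed.
    while [ "${LOOP}" -eq "1" ]
    do
        # Deviation from the median of the scrambled walk, using the current root, R.
        tsrunmagnitude -r "${R}" "scrambled" > "scrambled.magnitude"
        # Logarithms of the time (field 1) and the magnitude (field 2).
        cut -f1 "scrambled.magnitude" | tsmath -l > "temp.7"
        cut -f2 "scrambled.magnitude" | tsmath -l > "temp.8"
        # LSQ line through 0 <= ln t < 6, i.e., days 1 through 403.
        LAST=$(paste temp.7 temp.8 | egrep '^[0-5]\.' | tslsq -p)
        echo "${LAST}"
        # The slope of the fitted line is the next approximation to the root.
        R=$(echo "${LAST}" | sed -e 's/^.* //' -e 's/t.*$//')
        #
        # Stop when this exact line is already in the log.
        if grep -F -x -e "${LAST}" "log"
        then
            LOOP="0"
        fi
        #
        mv "temp.7" "temp.7.last"
        mv "temp.8" "temp.8.last"
        mv "scrambled.magnitude" "scrambled.magnitude.last"
        echo "${LAST}" >> "log"
    done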

The output of the script fragment is:



    -4.498716 + 0.496851t
    -4.495234 + 0.496649t
    -4.495005 + 0.496635t
    -4.494994 + 0.496635t
    -4.494994 + 0.496635t
    -4.494994 + 0.496635t

And, plotting:



Figure X

Figure X is a plot of the deviation from the median value of the expansions and contractions of the scrambled Brownian Motion (random walk) equivalent of the DJIA, from January 2, 1900, through May 26, 2006, and of its simulation. 0.010947 * sqrt (x) is the theoretical value. Note that, compared with Figure VI, the deviation is very close to the theoretical value, within numerical precision.

The distribution of the marginal increments of the scrambled Brownian Motion (random walk) equivalent of the DJIA is the same as that shown in Figure III, above, since the increments themselves are the same; only their order has changed.
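The scrambling can be sketched in Python. Python's random.shuffle is used here in place of the tssequence and sort(1) pipeline, which is an assumption about what that pipeline achieves. The log increments are shuffled, the drift is removed, and the walk is re-integrated. The multiset of increments is unchanged; only their order, and with it any day-to-day dependence, is destroyed. The function name and prices are hypothetical.

```python
import math
import random
from itertools import accumulate

def scrambled_walk(prices, seed=None):
    """Shuffle the log increments, remove the drift, and re-integrate."""
    logs = [math.log(p) for p in prices]
    increments = [b - a for a, b in zip(logs, logs[1:])]
    random.Random(seed).shuffle(increments)        # destroys any serial dependence
    drift = sum(increments) / len(increments)      # ln (g), unchanged by shuffling
    return increments, list(accumulate(d - drift for d in increments))

prices = [100.0, 102.0, 101.0, 104.0, 103.5, 105.0]
incs, walk = scrambled_walk(prices, seed=7)
print(walk)
```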


The annual market inefficiencies would be difficult to exploit, since they only happen once a year, except as a defensive strategy. However, the short term inefficiencies do offer an opportunity. The script is rerun with the LSQ fit restricted to only the first few days. The egrep(1) filter now keeps only days 1 through e^0.999..., i.e., days 1 and 2:



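    R="0.5"                 # initial guess for the root
    > "log"                 # truncate the log of fitted lines
    LAST="NOTHING"
    LOOP="1"
    #
    # Iterate until tslsq reproduces a line it has already printed.
    while [ "${LOOP}" -eq "1" ]
    do
        # Random walk equivalent (log, derivative, remove drift, integrate),
        # then its deviation from the median using the current root, R.
        tsmath -l djia | tsderivative | tsmath -s 0.000176 | tsintegrate | \
            tsrunmagnitude -r "${R}" > "djia.magnitude"
        # Logarithms of the time (field 1) and the magnitude (field 2).
        cut -f1 "djia.magnitude" | tsmath -l > "temp.9"
        cut -f2 "djia.magnitude" | tsmath -l > "temp.10"
        # LSQ line through 0 <= ln t < 1 only, i.e., days 1 and 2.
        LAST=$(paste temp.9 temp.10 | egrep '^[0]\.' | tslsq -p)
        echo "${LAST}"
        # The slope of the fitted line is the next approximation to the root.
        R=$(echo "${LAST}" | sed -e 's/^.* //' -e 's/t.*$//')
        #
        # Stop when this exact line is already in the log.
        if grep -F -x -e "${LAST}" "log"
        then
            LOOP="0"
        fi
        #
        mv "temp.9" "temp.9.last"
        mv "temp.10" "temp.10.last"
        mv "djia.magnitude" "djia.magnitude.last"
        echo "${LAST}" >> "log"
    done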

The output of the script fragment is:



    -4.510133 + 0.518242t
    -4.539663 + 0.521663t
    -4.545016 + 0.522309t
    -4.546053 + 0.522481t
    -4.546241 + 0.522468t
    -4.546241 + 0.522468t
    -4.546241 + 0.522468t

This means that there is about a 52% chance (a little over 2% better than even) that what happened in the DJIA on any given day will also happen on the next day.

This analysis was originally used to design the algorithm used in the -d5 option to the tsinvest program. Checking:



    sed 's/^/DJIA      /' djia | tsnumber | tsinvest -r | tail -1
    # DJIA, p = 0.510810, f = 0.010949, h = 0.544745, i = 68.130000

    tsinvestsim djia.sim 29010 | tsinvest -r | tail -1
    # DJIA, p = 0.511489, f = 0.010994, h = 0.548321, i = 31.787033

These numbers agree very favorably with this analysis. Running the program on the DJIA time series, from January 2, 1900, through May 26, 2006:



    sed 's/^/DJIA    /' djia | tsnumber | tsinvest -its -d5 | egrep DJIA | cut -f3 | tsgain -p
    1.000535

The theoretical gain, g, per trading day would be (from Equation (1.20)):



    rms = e^(-4.546241) = 0.0106070013
    P = 0.522468
    g = ((1 + 0.0106070013)^0.522468) * ((1 - 0.0106070013)^(1 - 0.522468))
    g = 1.0004204851

The measured daily gain, g, is larger than the theoretical value because of the sophistication of the algorithm used in the tsinvest program. It maintains two different tables, one probability density function for positive movements and another for negative. It then calculates the probabilities of future movements using these empirically derived probability density functions, as opposed to the LSQ approximation of short term daily returns used in this analysis. But the theoretical and empirical values are reasonably close.
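Here is a minimal Python sketch of Equation (1.20) as it is applied in this section, g = (1 + rms)^P * (1 - rms)^(1 - P). It reproduces both the short term value above and the long term value worked out next. The function name is illustrative only.

```python
import math

def theoretical_gain(rms, p):
    """Equation (1.20): g = (1 + rms)^P * (1 - rms)^(1 - P)."""
    return (1.0 + rms) ** p * (1.0 - rms) ** (1.0 - p)

# Short term (exploitable) case, from the iterated LSQ fit over days 1 and 2.
g_short = theoretical_gain(math.exp(-4.546241), 0.522468)
# Long term case, from avg = 0.000236 and rms = 0.010950 via Equation (1.24).
p_long = (0.000236 / 0.010950 + 1.0) / 2.0
g_long = theoretical_gain(0.010950, p_long)
print(g_short, g_long)  # about 1.00042 and 1.000176
```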

Compare these values with the gain of the DJIA, from January 2, 1900, through May 26, 2006:



    tsgain -p djia
    1.000176

This would be the long term investment potential of the DJIA (P from Equation (1.24), and g from Equation (1.20)):



    avg = 0.000236
    rms = 0.010950

    P = ((0.000236 / 0.010950) + 1) / 2 = 0.51077626

    g = ((1 + 0.010950)^0.51077626) * ((1 - 0.010950)^(1 - 0.51077626))
    g = 1.0001760701

The difference in annual gain is significant. Exploiting short term market inefficiencies resulted in an annual gain (over 253 trading days) of 1.000535^253 = 1.1449017271, or a little less than 15% per year. As a long term investment, the gain was 1.000176^253 = 1.0455301549, or a little less than 5% per year.
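The annualization is plain compounding over the 253 trading days per year used throughout this section. Here it is as a short Python sketch:

```python
TRADING_DAYS_PER_YEAR = 253  # as used in the text

def annualize(daily_gain, days=TRADING_DAYS_PER_YEAR):
    """Compound a per-trading-day gain over a year of trading days."""
    return daily_gain ** days

short_term = annualize(1.000535)  # tsinvest -d5 on the DJIA
long_term = annualize(1.000176)   # the DJIA itself
print(f"{short_term:.10f} {long_term:.10f}")
```

Small differences in daily gain compound strongly: a difference of about 0.00036 per day becomes roughly 10 percentage points per year.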

There are other engineered solutions for increasing the value of investments in the DJIA equities, too, as explained in Quantitative Analysis of Non-Linear High Entropy Economic Systems V. Specifically, see the simulation of the strategy there, which yielded a little over 17% annual growth in value over the last quarter of the Twentieth Century.

It is interesting to note that, in the long run, a well executed long term portfolio strategy (specifically, rebalancing expeditiously) is more important than timing the market, which is what this analysis was about. Timing the market, in turn, is more important than picking winners.

A well designed strategy does all three, but in that order of priority.

As a sidebar, this is the intended usage of the tsinvest program: it is, essentially, an automated broker. It assembles and maintains a portfolio from a universe of equities (thousands are common) according to a strategy defined on the command line. The universe of equities is typically a stock ticker, but can be a historical database of the ticker, for research. The program is typically used on equities, but is not restricted to equities alone; different investments can be mixed and matched. For example, it can dynamically optimize the balance of money between a savings account and a portfolio of equities and/or properties.

A word of caution, however. The program is a tool, and a tool is no better than the mechanic using it. It is not a substitute for due diligence and meticulous research.

It would probably be better to view the program as a search mechanism for investments, like a Google of the ticker, where one searches for equities/investments that fit search criteria (i.e., an investment strategy). It is a tool for extending the depth, breadth, and speed of investing.



A note about the DJIA time series:

All of these represent anomalies affecting the accuracy of the analysis.

The time series of the DJIA contained 29010 daily closes (29009 increments). The margin of error, using statistical estimation, would be 0.010950 / sqrt (29009) = 0.0000642906. This means there is roughly a 68% probability (i.e., plus or minus one standard deviation of the estimate) that the deviation of the increments is more than 0.010950 - 0.0000642906 = 0.0108857094 and less than 0.010950 + 0.0000642906 = 0.011014291, which is about +/- 0.6%. There is also a similar probability that the average of the increments is more than 0.000236 - 0.0000642906 = 0.0001717094 and less than 0.000236 + 0.0000642906 = 0.0003002906. That range is a little more than +/- 27%, which could be a source of significant error in the analysis: at that confidence level, the average of the increments can only be known to within a factor of about 2. (Note that this uncertainty can be addressed by modifying P in Equation (1.24) appropriately to accommodate data set size issues. This is how the tsinvest program avoids "chasing bubbles"; it's just another uncertainty that the program has to address.)
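The margin-of-error arithmetic can be checked with a small Python sketch. Like the text, it applies the same standard error, rms / sqrt (n), to both the average and the deviation.

```python
import math

def standard_error(deviation, n):
    """Standard error of an average of n samples with the given deviation."""
    return deviation / math.sqrt(n)

n_increments = 29009
avg, rms = 0.000236, 0.010950
se = standard_error(rms, n_increments)
print(f"standard error = {se:.10f}")
print(f"average: {avg - se:.10f} .. {avg + se:.10f} (+/- {se / avg:.1%})")
print(f"deviation: {rms - se:.10f} .. {rms + se:.10f} (+/- {se / rms:.1%})")
```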

The distribution of the increments of the Brownian Motion (random walk) equivalent of the DJIA (see Figure III) holds reasonably well through 3 deviations. The Laplacian distribution used has a PDF (probability density function) proportional to e^(-|x| / 0.00679597093). Here 0.00679597093 = 1 / 147.146009, the reciprocal of the LSQ slope found above. This gives a variance of 2 * (0.00679597093)^2, or a deviation of 0.00961095426.

The cumulative tail counts would be, (and the actual counts, see Figure IV):

Note that there is more high order kurtosis than can be explained by the model used. There are several conjectures:

- LSQ methodology was used extensively. With the center of the distribution (the most populous data segment) missing from the data, the LSQ approximation could be skewed.
- There may be Levy stable characteristics in the distribution. However, the deviation of the increments seems stable, which would be contradictory.
- There may be white noise added to the distribution. It could be created by data collection issues (much of the Twentieth Century collection was done manually), or by market overload anomalies created by failures in matching bid/ask.
- The model may be wrong in assuming a uniform distribution of interday trades.

With so few discrepant data points in the tails, it is difficult to make a reliable assessment.

As a sidebar, consider the chance of at least a 5 deviation move (i.e., greater than a 5 sigma hit) in the Brownian Motion (random walk) equivalent of the DJIA. Using a Gaussian/normal paradigm of the PDF of the increments, it is 0.000000286651571558. That is about 1 in 3,488,556 trading days, or about once in 13,789 calendar years of 253 trading days each, roughly the duration, so far, of civilization itself. The model used predicts a much greater frequency: about 29009 / 12.32 = 2354.63 trading days between occurrences, or about once every 9.3 years. In reality, such moves have occurred about once every 29009 / 80 = 363 trading days, or about once every year and five months, based on the historical perspective of the Twentieth Century.
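The tail arithmetic can be checked in Python. The Gaussian value is the one-sided tail beyond 5 deviations. For the Laplacian model, a deviation of sqrt (2) * b puts 5 deviations at 5 * sqrt (2) * b, where the one-sided tail is e^(-5 * sqrt (2)) / 2.

```python
import math

n_increments = 29009
k = 5.0  # number of deviations

# One-sided Gaussian tail: P(Z > k) = erfc(k / sqrt(2)) / 2.
gaussian = math.erfc(k / math.sqrt(2.0)) / 2.0
# One-sided Laplace tail at k deviations, with deviation sqrt(2) * b.
laplace = math.exp(-k * math.sqrt(2.0)) / 2.0

print(f"Gaussian: {gaussian:.6e}, about 1 in {1 / gaussian:,.0f} trading days")
print(f"Laplace: {laplace * n_increments:.2f} expected in {n_increments} increments")
```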

The Gaussian/normal paradigm is very inappropriate for assessing the risk frequencies of catastrophic events in financial time series. Moreover, high risk daily closes tend to cluster together; that is what this analysis was about, since they are not iid, i.e., not statistically independent. The clusters also tend to be synchronous with, or caused by, annual structural phenomena.

The issue is that any mathematical abstraction should be approached carefully and used with caution. This analysis provides a mathematical model/abstraction of bubbles in financial markets (look at the graphs above; that is what they are all about), and the model is relatively good. But that does not make caution inappropriate.



Appendix I, Example of the Ubiquity of Non-Linear High Entropy Economic Systems

To illustrate the ubiquity of time series with geometric progressions, Laplacian distributed increments, and log-normal evolution, web server page hits will be analyzed, with this domain, www.johncon.com, providing the example. It is not intuitively obvious that server page hits would have these characteristics until the following is considered:

  1. For hits to increase over time, the site must be known, and to be known, it has to be bookmarked (or found by a search engine, or introduced in a mailing list, etc.). This would lead to more bookmarks, and so on. The probability of a bookmark leading to yet another bookmark would remain much the same over time. If, on average, each bookmark leads to more than one additional bookmark, the number of hits per day will follow an increasing geometric progression. But there will be significant random variation from day to day, leading to a log-normal evolution over time.

  2. The probability of a hit during any time interval during the day would be approximately constant, leading to Laplacian distributed increments in the time series of web server hits per day.

Finding the median value of page hits per day:



    tslsq -e -p "hits"
    e^(4.948240 + 0.001039t) = 1.001040^(4761.533341 + t) = 2^(7.138801 + 0.001499t)

And plotting:



Figure XI is a plot of the web server hits per day for domain www.johncon.com, from December 27, 1999, through January 2, 2007, and its median value, determined by an exponential LSQ best fit. (The hits were filtered to exclude crawlers and information robots.)
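The three equivalent forms printed by tslsq -e can be related with a short Python sketch; the variable names are illustrative. First, e^(a + b t) = g^(c + t), with g = e^b and c = a / b. Second, e^(a + b t) = 2^(d + e t), with d = a / ln (2) and e = b / ln (2). So 1 / e is the doubling time in days.

```python
import math

def fit_forms(a, b):
    """Rewrite e^(a + b*t) as g^(c + t) and as 2^(d + e*t)."""
    g = math.exp(b)          # marginal gain per day
    c = a / b                # offset, in days, of the base-g form
    d = a / math.log(2.0)    # offset of the base-2 form
    e = b / math.log(2.0)    # doublings per day
    return g, c, d, e

g, c, d, e = fit_forms(4.948240, 0.001039)  # the web server hits fit above
print(f"g = {g:.6f}, 2^({d:.6f} + {e:.6f}t), doubling time = {1 / e:.0f} days")
```

Because b is printed with only four significant digits, c computed this way will not match the printed 4761.533341 exactly. The base-2 values do agree to the printed precision.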

And, analyzing the increments of the server hits:



    tsmath -l "hits" | tslsq -o | tsderivative | tsnormal -t > "hits.distribution"
    tsmath -l "hits" | tslsq -o | tsderivative | tsnormal -f -t > "hits.frequency"

And plotting:



Figure XII is a plot of the distribution of the marginal increments of the Brownian Motion (random walk) equivalent of the web server hits per day for domain www.johncon.com, from December 27, 1999, through January 2, 2007, which should be compared with Figure III, above.

Note the implications of the analysis:


--

John Conover, john@email.johncon.com, http://www.johncon.com/


Copyright © 2002-2007 John Conover, john@email.johncon.com. All Rights Reserved.
Last modified: Mon Jul 29 12:29:27 PDT 2002 $Id: 060828101013.7889.html,v 1.0 2007/02/02 06:06:04 conover Exp $