Re: entropy spreadsheet

From: John Conover <>
Subject: Re: entropy spreadsheet
Date: 18 Dec 2000 07:43:44 -0000

Hi Jeff. Good question.

The answer is that P, and thus G, have fractal characteristics, and
measuring them has to take this issue into account.

Any time one runs a metric on a fractal system, data set size is an
important consideration. It's not so much having a lot of data as it is
having data over a long enough time, (although the concept of
self-similarity is how data over short time intervals and long time
intervals are related, an extrapolation that is commonly used.)

That's what the tsshannoneffective program is all about: it defines the
minimum time interval for measuring a financial variable;
additionally, it can make a quantitative statement about the risk of
using a shorter interval, (the tsshannoneffective program is a
cut-and-stick from the tsinvest sources, BTW, where their usage is
controlled by the -c, and -C options.)

In some sense, there are two methods: a kind of statistical estimation
technique, (which is actually the default used in tsinvest, and can be
disabled with the -c argument,) and a similar method which deals with
run lengths of "bubbles," (which can be enabled with the -C argument.)

Both methods use the error function, and I'll give an example of how
it works: the chance of a run of bull or bear times continuing past
n days is erf (1 / sqrt (n)), which for n >> 1, is about
1 / sqrt (n).

What this means is that if a bull, (or bear,) market has run 15 days,
the chance of it continuing at least one more day is about 25%.
For 24 days, it is about 20%, and so on.
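
As a rough check on those figures, here is a minimal Python sketch that
tabulates both expressions. The function name is made up for
illustration and is not taken from the tsinvest sources. Note that
erf (x) is close to 1.128 * x for small x, so the erf column runs a
little above the 1 / sqrt (n) column; the 25% and 20% figures quoted
above correspond to the 1 / sqrt (n) approximation.

```python
import math


def run_continuation_chance(n):
    """Chance that a run already n days long continues: erf(1 / sqrt(n))."""
    return math.erf(1.0 / math.sqrt(n))


# Compare the erf form with the simpler 1 / sqrt(n) approximation.
for n in (15, 24, 100, 350):
    exact = run_continuation_chance(n)
    approx = 1.0 / math.sqrt(n)
    print(f"n = {n:4d}  erf(1/sqrt(n)) = {exact:.3f}  1/sqrt(n) = {approx:.3f}")
```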

Fractals are made up of "bubbles", (at all scales, too; it works for
minutes, days, years, decades, etc.; kind of "bubbles" made up of
"minibubbles", which in turn are made up of "microbubbles," and so
on,) with these kinds of statistics, so one has to be concerned, as
you are judging by the question you ask, about making a measurement of
P, and, by serendipity, the measurement being misleading since it was
made in a "bubble."

I suppose you are considering a long term investment, (i.e., using P =
((avg / rms) + 1) / 2, e.g., the -d1 option to tsinvest, and not the
"trader" arguments, -d4 and -d5.) Note that the chance of the
"bubble" continuing at 350 days is also the chance one would take by
betting on the value of P measured at 350 days, (it's a subtle
concept; think of it as how many times you would lose, doing the same
"bet" in an iterated game, and how P would have to be modified to
accommodate the times you lost due to data set size considerations,)
so, I can multiply the two probabilities together to get a
compensated, or effective, value of P.

In other words, the value of P = 0.526, measured with a data set size
of 350, would be known only to a factor of 1 +/- 1 / sqrt (350) =
0.946547752 to 1.05345224838, or the compensated, or effective, value
of P would be between 0.497884117 and 0.554115885. (And, tsinvest
would not bet on that, unless overridden with the -D option, which
requires P > 0.5, i.e., other stocks with a higher P, or a larger data
set, or both, would be more desirable.)
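
The arithmetic above can be reproduced with a few lines of Python.
This is only the simplified "in our head" version (P scaled by
1 +/- 1 / sqrt (n)), not the full compensation that
tsinvest/tsshannoneffective perform, and the function name is
hypothetical.

```python
import math


def effective_p_range(p, n):
    """Return (low, high) for P scaled by 1 -/+ 1 / sqrt(n).

    Simplified rule from the text only; not the tsshannoneffective algorithm.
    """
    spread = 1.0 / math.sqrt(n)
    return p * (1.0 - spread), p * (1.0 + spread)


low, high = effective_p_range(0.526, 350)
print(f"Peff between {low:.6f} and {high:.6f}")
# The low end falls just under 0.5, which is why this would not be bet on.
```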

Note that, in some sense, it is kind of like a low-pass filter to keep
tsinvest from "betting" on things where the metrics may have been
distorted by being measured during a "bubble".

Or, from tsshannoneffective, (using avg = 0.0016, and rms = 0.04, for
a value of P = 0.52, for 350 days):

    john@john:~ 685% tsshannoneffective 0.0016 0.04 350
    For P = (sqrt (avg) + 1) / 2:
        P = 0.520000
        Peff = 0.401709
    For P = (rms + 1) / 2:
        P = 0.520000
        Peff = 0.518002
    For P = (avg / rms + 1) / 2:
        P = 0.520000
        Peff = 0.479763

and the last number is close, (about 18 parts in 500, or so,) to what
we did in our head, above.

However, note that the minimum time interval requirements for the
metrics also depend on the value of P; a larger value of P will
permit investing, (i.e., Peff > 0.5,) much quicker, for example, P =
0.6:

    john@john:~ 690% tsshannoneffective 0.04 0.2 40
    For P = (sqrt (avg) + 1) / 2:
        P = 0.600000
        Peff = 0.527700
    For P = (rms + 1) / 2:
        P = 0.600000
        Peff = 0.579606
    For P = (avg / rms + 1) / 2:
        P = 0.600000
        Peff = 0.500024

requires a data set size of only 40 days.
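
To get a feel for how the required data set size shrinks as P grows,
the sketch below searches for the smallest n at which the simplified
rule used earlier, P * (1 - 1 / sqrt (n)), exceeds 0.5. This is an
assumption-laden shortcut, not what tsshannoneffective computes (it
also compensates avg and rms), so the numbers are only in the same
neighborhood: a figure in the high 30s for P = 0.6, and about 410 days
for P = 0.526. The function name is hypothetical.

```python
import math


def min_days_simple(p, limit=10_000_000):
    """Smallest n with p * (1 - 1 / sqrt(n)) > 0.5 (simplified rule only)."""
    if p <= 0.5:
        raise ValueError("P must exceed 0.5 for any data set size to suffice")
    for n in range(1, limit + 1):
        if p * (1.0 - 1.0 / math.sqrt(n)) > 0.5:
            return n
    raise ValueError("no n found below the search limit")


for p in (0.526, 0.55, 0.6):
    print(f"P = {p:.3f} needs roughly {min_days_simple(p)} days")
```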

Bottom line, tsinvest, using the -d1 option, didn't get suckered into
the dot-com craze, since that is a long term investment command line
option, and the numbers just were not there for that style of
investment. However, the -d5 option, (which is a trading option that
exploits short term market inefficiency at the daily level,) did quite
well with the dot-coms because, unlike long term investments,
volatility is desirable, and the market can be left quickly when
day trading.

So, it kind of depends on what one wants to do; it's an engineered


BTW, the above was kind of "watered down" as a tautology. In reality,
the compensation techniques used in tsinvest/tsshannoneffective are a
little more complicated since:

    P = ((avg / rms) + 1) / 2

and:

    G = (1 + rms)^P * (1 - rms)^(1 - P)

so not only does P have to be compensated for an effective value, but
avg and rms too since G is what one wants to bet on. That is why the
values of Peff are different for the 3 methods of calculating P in

As a note, I recently added a new last paragraph on to relate the
historical perspective of the compensation techniques used in
tsinvest/tsshannoneffective; they are not new, and were in the
formalization to the Gaussian/normal bell done in the early
1700's. The sample-average in the repeated trial convergence is a
fixed increment fractal, which was the essence of the derivation,
(although de Moivre didn't know it.) Whether one utilizes this tidy
bit of information to do statistical estimation, or the same thing as
run length phenomena, is not material; they are both the same. Using
the default method in tsinvest is statistical estimation; the -c -C
options use the methodology of run-lengths, and end up with the same
answer. It's a conceptual issue, only.

If you want to "play" with it, use the tscoins program to generate a
time series, (use -p 0.51, which is a "typical" value for stocks on
the US exchanges, as was used in,)
of about a million days. Graph that information, and pick a big
"bubble", that is about 10X from the average, (i.e., G^n.) Cut that
"bubble" into a new time series, and see how the -c, -c -C, and -C
options to tsinvest handle it with the -d1 option. Note that the
value of P over this interval is quite high, 0.55-0.6, and the duration
of the "bubble" will be in years, a simulated dot-com scenario.
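
If tscoins is not at hand, the effect of data set size on the measured
P can be sketched in Python. This is a stand-in, not the tscoins
program: it assumes a fixed-increment coin toss game in which each
day's return is +fraction with probability p and -fraction otherwise,
(the fraction, seed, and function names are all hypothetical,) and
then measures P = ((avg / rms) + 1) / 2 over windows of different
lengths.

```python
import math
import random


def simulate_returns(p, n, fraction=0.02, seed=1):
    """Daily returns of +fraction with probability p, else -fraction (assumed model)."""
    rng = random.Random(seed)
    return [fraction if rng.random() < p else -fraction for _ in range(n)]


def measured_p(returns):
    """P = ((avg / rms) + 1) / 2, computed from a list of returns."""
    avg = sum(returns) / len(returns)
    rms = math.sqrt(sum(r * r for r in returns) / len(returns))
    return (avg / rms + 1.0) / 2.0


returns = simulate_returns(0.51, 200_000)
# Short windows can stray well away from 0.51; long ones settle toward it.
for window in (60, 350, 5_000, len(returns)):
    print(f"first {window:6d} days: P = {measured_p(returns[:window]):.4f}")
```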

It's an interesting concept that fractals can go 10X away from where
they should be, for years. The bubbles-of-bubbles concept is a useful

Jeff Haferman writes:
> Very nice work Ron, and thanks a lot.
> Now, I would like to pose a question that I have pondered for
> quite some time.  Let me give an example:
> Consider symbol "LLTC".  If I use data going back 60 days (eg
> using Ron's spreadsheet, or tsinvest), I get values of approximately
> P = 0.459 and G = 0.993 for the Shannon probability and gain,
> respectively.
> If I go back 350 days for the same symbol, I get P = 0.526 and
> G = 1.001. I know tsinvest can account for uncertainty due
> to data set size, but as a practical matter, which set of
> (P,G) should I "believe" for wagering purposes?
> Ronald McEwan wrote:
> >
> >Here is a spreadsheet with the formulas from John's emails. It includes a
> >utility for downloading daily, weekly and monthly data from Yahoo. You
> >will have to manually re-scale the y axis on the chart depending on the
> >price range of what you are looking at. This spreadsheet only looks at 60
> >days worth of data. It should be easy enough to modify it for your own
> >needs.
> >


John Conover

Copyright © 2000 John Conover, All Rights Reserved.
Last modified: Fri Dec 29 22:24:32 PST 2000 $Id: 001217234356.7299.html,v 1.0 2001/11/17 23:05:50 conover Exp $