forwarded message from Michael McMaster

From: John Conover <john@email.johncon.com>
Subject: forwarded message from Michael McMaster
Date: Thu, 13 Jun 1996 00:09:11 -0700

Attached came from the Learning Organization conference, this PM. The
Santa Fe Institute is a think tank that specializes in complexity
theory, and its applications to many different disciplines, (as a
matter of fact, they invented programmed trading of financial
instruments, based on the application of fractal analysis-the simplest
and oldest of the complexity theories-to financial markets.) One of
the prevailing issues in management is that management methodologies
are not based on a firm scientific foundation, (mostly correlation
statistics, which John Casti of SFI called "contemporary numerology.")
There is a very close affiliation between the Learning Organization,
out of MIT, and SFI.

        John

BTW, the reason Casti (RAND Corporation, Research member of the
International Institute of Applied Systems Analysis in Vienna, Austria,
Professor at the Technical University of Vienna, now with SFI,) made
that remark is that correlation statistics, (the kind of statistics
popularized in the "soft sciences," and the most common variety as
taught in college,) as opposed to the statistical mechanics, (which is
the cornerstone analytical methodology used in the quantum
mechanics-currently accurate to more than 11 decimal places, with no
exceptions in any and all studies ever done,) are not what is called
"single simplex." Suppose you do an experiment to measure correlations
between two things, to verify that one influences the other in some
way. What single simplex means is that,
experimentally, all other things that can influence the outcome of the
experiment have been eliminated.

As a simple example, I notice that wearing expensive clothes is good
for your health, so everyone should wear expensive clothes. (There is
a measurable correlation, BTW, and it is a very strong
correlation-with a confidence level of over 99.997%.) But what is
wrong with the study is that those that can afford expensive clothes,
also, can afford expensive healthcare, so, you would expect them to be
more healthy.
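
The clothes example can be tried out numerically. What follows is only a toy sketch: the population, the wealth score, and the noise levels are all made up for illustration, and clothes are given no effect on health at all. It compares the clothes/health correlation over everyone with the same correlation among people of nearly equal wealth.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=100_000, seed=1):
    """Toy population (an assumption): wealth drives both clothes and health."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        wealth = rng.random()                    # hypothetical wealth score in [0, 1)
        clothes = wealth + rng.gauss(0.0, 0.2)   # spending on clothes
        health = wealth + rng.gauss(0.0, 0.2)    # health score, never uses clothes
        people.append((wealth, clothes, health))
    return people


def correlations(people, lo=0.49, hi=0.51):
    """Clothes/health correlation for everyone, and within a narrow wealth band."""
    everyone = pearson([p[1] for p in people], [p[2] for p in people])
    band = [p for p in people if lo <= p[0] < hi]
    same_wealth = pearson([p[1] for p in band], [p[2] for p in band])
    return everyone, same_wealth


if __name__ == "__main__":
    everyone, same_wealth = correlations(simulate())
    print("all people:        ", round(everyone, 3))
    print("near-equal wealth: ", round(same_wealth, 3))
```

In this toy model the correlation over everyone is strong, and it all but vanishes once wealth is held roughly fixed, which is the "other things that can influence the outcome" at work.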

(BTW, while we are talking health, as an interesting side bar, the
most probable time for heart attacks to occur is at 10 in the morning,
and the most probable place for them to occur is in the bathroom, and
the most probable activity when they occur is while sitting on the
throne-so if you want to minimize your chance of heart attack, don't
go to the bathroom, and avoid 10 in the morning.)

There are other issues with correlation statistics. As another, rather
famous, example, in 1876 Sir Francis Galton tested some data on
plants furnished him by Charles Darwin. There were 15 treated plants
and 15 untreated specimens, (the control group.) In rank-ordering the
data, Galton saw that the treated plants were ahead of the untreated
plants with the same rank in 13 out of 15 cases. Galton concluded,
understandably, that the treatment was effective. But assuming perfect
randomness in the data (30 measurements from the same pool of plants,)
the probability of Galton's observations is 3/16, (or a little under
19%.)  In other words, in 3 out of 16 cases a perfectly ineffectual
treatment appears very effective.

This is easily derived, BTW. Consider tossing a coin 2n times, ending
with as many heads as tails. Let 2k be the number of tosses during
which the accumulated number of heads is greater than the accumulated
number of tails. The number of possibilities, N(2k), for this outcome
is given by the "Catalan number:"

            [2n]    1
    N(2k) = [  ] -------
            [n ]  n + 1

which is independent of k! (This is where Sir Galton's intuition
failed him.)
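
The independence from k can be checked by brute force for small n. This is a sketch under two assumptions of mine: only sequences with equal heads and tails are counted (which matches rank-ordering 15 treated against 15 untreated plants), and a toss counts as "heads ahead" if heads leads just before or just after it. The function names are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def catalan(n):
    """The n-th Catalan number, C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def sequences_by_time_ahead(n):
    """counts[k] = number of balanced 2n-toss sequences with heads ahead for 2k tosses."""
    counts = [0] * (n + 1)
    for heads_positions in combinations(range(2 * n), n):
        heads = set(heads_positions)
        level = 0   # heads minus tails so far
        ahead = 0   # tosses during which heads is ahead
        for i in range(2 * n):
            new_level = level + (1 if i in heads else -1)
            if level > 0 or new_level > 0:
                ahead += 1
            level = new_level
        counts[ahead // 2] += 1   # ahead is always even for a balanced sequence
    return counts


def chance_at_least(n, k_min):
    """If all n + 1 values of k are equally likely, the chance that k >= k_min."""
    return Fraction(n - k_min + 1, n + 1)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, sequences_by_time_ahead(n), catalan(n))
    print(chance_at_least(15, 13))   # Galton's 13 or more out of 15
```

Every value of k from 0 to n gets the same count, so with 15 pairs the 16 possible outcomes are equally likely, and "13 or more" covers 3 of them.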

There are other interesting, (ie., counter-intuitive) things that can
be demonstrated with Catalan numbers, and similar counting arguments,
also. For example, what is the
probability that at least two people in a room have the same birthday?
(Answer, if there are 23 people in the room, the chance is 50-50.) Or,
what is the probability that in a coin tossing game of 20 tosses, each
player will lead 10 times? (Answer, about 6%-not too astonishing, but
read the next question.) What is the probability that one player will
lead for ALL 20 tosses? (Answer, slightly greater than 35%!!!!) Or, in
other words, in a perfectly random and fair coin tossing event where a
chance of a toss coming up heads is 50% and the chance of tails coming
up is 50%, the chance of one player leading for all 20 tosses is 6 times
greater than the chance of each player leading for half of the tosses, even
though it is a fair coin, being tossed many times! (Which should scare
you from using correlation studies, since it would detect correlations
that do not exist, ie., leading for 20 tosses is pretty strong
evidence that a correlation exists-which we know is incorrect since it
is a fair coin.) The reason that the coin tossing game works that way
is that the coin tossing game is a fractal process. (Fractals are
always a process that is a sum of random variables, like the sum of
money you get while tossing a fair coin, or any other gambling game
for that matter-including wagering in the stock market-and this means
that correlation studies often lead to erroneous conclusions since
they depend on the process containing random variables, but the
variables can not be summed together. The reason is that although random variables
can have a normal distribution, ie., Gaussian bell curve distribution,
summing random variables can not provide such a Gaussian bell curve
distribution, upon which, as a fundamental paradigm, correlation
statistics depends.)
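
The 20 toss figures, and the birthday figure, can be reproduced exactly with a few lines of counting. This is a minimal sketch, assuming the usual convention that a toss counts toward a player's lead if that player is ahead just before or just after it; the function names are hypothetical.

```python
from collections import defaultdict


def lead_time_counts(tosses):
    """Map: number of tosses with heads ahead -> number of toss sequences."""
    states = {(0, 0): 1}   # (heads minus tails, tosses with heads ahead) -> ways
    for _ in range(tosses):
        nxt = defaultdict(int)
        for (level, ahead), ways in states.items():
            for step in (1, -1):
                new_level = level + step
                led = 1 if (level > 0 or new_level > 0) else 0
                nxt[(new_level, ahead + led)] += ways
        states = nxt
    counts = defaultdict(int)
    for (_, ahead), ways in states.items():
        counts[ahead] += ways
    return dict(counts)


def birthday_collision(people, days=365):
    """Chance that at least two of `people` share a birthday (uniform birthdays)."""
    p_distinct = 1.0
    for i in range(people):
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct


if __name__ == "__main__":
    counts = lead_time_counts(20)
    total = 2 ** 20
    print("each leads 10 of 20:", counts[10] / total)
    print("one leads all 20:   ", (counts[0] + counts[20]) / total)
    print("23 birthdays:       ", birthday_collision(23))
```

The lopsided outcome comes out at a bit over 35% and the evenly split one at about 6%, so the ratio between them is a little under 6.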

In case you are curious, the fluctuations of a gambler's capital are a
fractal, and are generally referred to as "Brownian Motion," named
after the Scottish botanist Robert Brown, who observed small particles
being buffeted on a slide under a microscope, and correctly described
it as a physical phenomenon. None other than A. Einstein shed light on
the problem, (sorry for the pun,) and concluded that Jean Baptiste
Perrin should use it to win a Nobel and develop a molecular theory-one of the
first applications of complexity theory-that could be used to derive
the number of molecules in a volume-it was the beginning of the
quantum mechanics.  So, the statistical mechanics that was derived for
the quantum mechanics can be used for many things-in general where you
have many microscopic things that contribute to a macroscopic
phenomenon-like molecules that make up a volume, a disease that becomes
an epidemic, or many coin tosses in a gambler's game that become the
gambler's capital. (Most of what is called programmed trading is an
application of fractals, ie., there are many folks, simultaneously,
trading a stock, making the value go up and down, which is the
macroscopic phenomenon-and is generally modeled as Brownian Motion-add
a little information theory, which states that the optimal fraction of
your portfolio that should be invested in a stock is proportional to
the average value of the day to day increments of the stock's value,
squared, and you have your wagering strategy for your investment
portfolio.) Investors using statistical correlations have not fared
well on Wall Street, and those that use correlation statistics as an
adjunct to the efficient market hypothesis have done worse. (Portfolio
growth using statistical correlations typically runs at about 0.8 of
the growth of the exchange indexes-a good programmed trader, using
fractal methodologies will do about twice as well as the indexes-about
1.9 X in portfolio growth this year.)

There is a very simple game that you can play to demonstrate the
point. Take a single 6 sided die, and a capital reserve, say, a bunch
of match sticks. Make a wager of several match sticks, and roll the
die. If the die comes up 1, 2, 3, or 4, you win, and get to add to
your capital reserves the number of match sticks wagered. If the die
comes up 5 or 6, you lose the match sticks you wagered, and have to
remove them from your capital reserves. If you make a graph of your
capital reserves over time, you will find that it looks exactly like a
stock's historical value!  (And a meticulous application of
statistical mechanics will be even more convincing.) So, how would a
programmed trader play the game? I will give you a hint: your wager
should be exactly one third of your capital reserves, (for example,
if, on a particular roll of the die, you had a capital reserve of 100
matches, you would wager 33 of them,) with every roll of the die. Play
the game with different strategies, (say changing the wager to one
fourth the capital reserves, and see how slowly your capital reserves
grow-then try one half, and see how much more wildly your capital
reserves swing, while growing more slowly still.)
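
For those without match sticks handy, the game is easy to sketch in code. The assumptions here are mine: the bet pays even money, fractional match sticks are allowed, and "best" means the highest average growth of the logarithm of the capital reserves per roll. For an even-money bet won with probability p, that growth is largest at a wager of 2p - 1 of the reserves (widely known as the Kelly criterion, a name the hint above does not use), which for p = 4/6 is one third.

```python
import math
import random

P_WIN = 4 / 6   # faces 1, 2, 3, 4 win; faces 5, 6 lose


def expected_log_growth(fraction, p_win=P_WIN):
    """Average growth of log(capital) per roll when wagering a fixed fraction."""
    return p_win * math.log(1 + fraction) + (1 - p_win) * math.log(1 - fraction)


def best_fraction(steps=1000):
    """Grid search for the wager fraction with the highest expected log growth."""
    grid = [i / steps for i in range(steps)]   # 0.000 ... 0.999
    return max(grid, key=expected_log_growth)


def play(fraction, rolls=1000, capital=100.0, seed=0):
    """Simulate the game; fractional match sticks are allowed (a simplification)."""
    rng = random.Random(seed)
    for _ in range(rolls):
        wager = fraction * capital
        if rng.randint(1, 6) <= 4:
            capital += wager
        else:
            capital -= wager
    return capital


if __name__ == "__main__":
    print("best fraction on the grid:", best_fraction())
    for fraction in (1 / 4, 1 / 3, 1 / 2):
        print(fraction, expected_log_growth(fraction), play(fraction))
```

Evaluating `expected_log_growth` at one fourth, one third, and one half gives a direct comparison of the strategies, and `play` shows a single seeded run of each, which will of course vary with the seed.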

Then, if you are a glutton for punishment, try to establish a strategy
for playing the game with statistical correlations, (obviously, there
are no correlations, but if you do a correlation analysis, I will
guarantee you that you will find many long run correlations that will
correlate to anything you want-remember that one player leading for
all 20 tosses of a coin is 6 times, or so, more prevalent than the two
players each leading for half of the 20 tosses, which is what
intuition expects of a fair coin.)

There was an interesting experiment proposed by a Professor of Applied
Mathematics and Statistics, (one J. Casti, to be exact,) that
concerned finding a correlation that would predict the next Super Bowl
winner.  The problem was assigned to an undergraduate class, as a
class project, and the only stipulation was that the correlation study
had to be able to withstand academic scrutiny. The correlation study
with highest confidence level was that if the first letter of the
team's name, and the first letter of the team's home town both were in
the upper or lower 13 letters of the 26 letter alphabet, the team
would win-and indeed 15 out of a long string of 18 Super Bowls
were won by a team with those formidable credentials.
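
How surprising is a rule that fits 15 of 18 games? A rough sketch, under assumptions that are mine and not part of the class project: each arbitrary rule matches the winner of any one game with probability one half, the rules are independent of one another, and the number of rules tried (200 below) is purely hypothetical.

```python
from math import comb


def chance_rule_fits(games=18, hits=15, p_hit=0.5):
    """Chance that one arbitrary rule matches at least `hits` of `games` winners."""
    return sum(
        comb(games, k) * p_hit ** k * (1 - p_hit) ** (games - k)
        for k in range(hits, games + 1)
    )


def chance_some_rule_fits(rules_tried, games=18, hits=15, p_hit=0.5):
    """Chance that at least one of `rules_tried` independent rules fits that well."""
    return 1 - (1 - chance_rule_fits(games, hits, p_hit)) ** rules_tried


if __name__ == "__main__":
    print("one rule:     ", chance_rule_fits())
    print("best of 200:  ", chance_some_rule_fits(200))
```

A single rule fitting that well is rare, well under 1%, but a class that tries a couple of hundred rules has a better than even chance of turning one up.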

In case you are curious as to why correlation studies give such
dubious results, it is not because of interpretational biases of the
experimenters, (although, with something as easily "massageable" as
correlation studies, I would suppose that some would argue the
point-the technique used in these, presumably rare, instances is what
is known in the trade as "torturing the data until it confesses,"
another Casti'ism.) The problem is a fundamental, paradigm issue with
the concept of correlation statistics.  You see, for statistical
correlations, the variances in the observations are presumed to have
the distribution of a normal, or Gaussian bell curve. And, of course,
many random variables do, indeed, have such a distribution. The issue
is that when you sum random variables, the sum does not have a normal
distribution, (ie., sum the wins and losses of a coin tossing game,
sum match sticks using a die as the random variable, etc.) The
distribution of a summation process on a random variable is also a
bell curve, but it is not normal, or Gaussian, although the two are
incredibly similar (actually, it is within a small fraction of a
percent-but that makes a big difference.)  And how does that affect
correlation studies? The major issue is that Gaussian bell curves add
root mean square-distributions that are Brownian add linearly. And
what is the error committed when one confuses the two? Well,
theoretically, if you are adding two like distributions together, 1
minus one over the square root of 2, or about 29%, which is a long way
away from the 11 digit accuracy that is commonly required in the
quantum mechanics.
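
The 29% figure is just arithmetic, and can be checked in a few lines. This sketch takes the premise of the paragraph as given and only reproduces the number: two like widths combined root mean square, against the same two widths combined linearly. The function names are hypothetical.

```python
import math


def combined_rms(a, b):
    """Two widths added root mean square: sqrt(a^2 + b^2)."""
    return math.hypot(a, b)


def combined_linear(a, b):
    """Two widths added linearly: a + b."""
    return a + b


def relative_gap(a, b):
    """Fraction by which the root mean square sum falls short of the linear sum."""
    return 1 - combined_rms(a, b) / combined_linear(a, b)


if __name__ == "__main__":
    print(relative_gap(1.0, 1.0))   # 1 - 1/sqrt(2)
```

For two equal widths the gap is 1 - sqrt(2)/2, which is the same thing as 1 minus one over the square root of 2.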

Now you know why there is a significant movement in mathematical
physics to move complexity theory into the mainstream of systemic
modeling, (ie., describing things, mathematically.) This is the
charter of SFI. And this includes a macroscopic theory of the
organization, which is closely related to the work going on in the
Learning Organization, and on which many of its members will soon be
lecturing. See the attached.

BTW, one of the classical ism's that I heard a while back when
referring to correlation statistics was the name "saccharin science,"
in relation to the 7 studies, all meticulous, done on whether, or not,
saccharin causes cancer. The studies have cost about a tenth of a
billion dollars to date, and are all of very high confidence levels,
(one way or the other,) and all contradictory. And, so what's being
done about that?  You guessed it, study number 8 is underway, and
confidence is high that the other 7 studies can all be reconciled with
correlation statistics in study number 8. (Bets are, using the Catalan
numbers, that this study will find a slight correlation, but not as
strong as the previous studies. The next study, after number 8-the
study to end all studies on saccharin-will be anti-correlative-but
even less strong than study number 8.) The person that coined the
phrase, "saccharin science," is a known sceptic, (and a damned fine
mathematician,) and claims that the function of correlation studies is
a public works project for social scientists. (BTW, it was not Casti
who said this.)

--

John Conover, john@email.johncon.com, http://www.johncon.com/
Last modified: Fri Mar 26 18:56:34 PST 1999 $Id: 960613000955.27110.html,v 1.0 2001/11/17 23:05:50 conover Exp $