From: John Conover <john@email.johncon.com>

Subject: forwarded message from Michael McMaster

Date: Thu, 13 Jun 1996 00:09:11 -0700

The attached came from the Learning Organization conference this PM. The Santa Fe Institute (SFI) is a think tank that specializes in complexity theory and its applications to many different disciplines. (As a matter of fact, they invented programmed trading of financial instruments, based on the application of fractal analysis, the simplest and oldest of the complexity theories, to financial markets.) One of the prevailing issues in management is that management methodologies are not based on a firm scientific foundation; they rest mostly on correlation statistics, which John Casti of SFI called "contemporary numerology." There is a very close affiliation between the Learning Organization, out of MIT, and SFI.

John

BTW, the reason Casti (RAND Corporation; research member of the International Institute for Applied Systems Analysis in Vienna, Austria; professor at the Technical University of Vienna; now with SFI) made that remark is that correlation statistics (the kind of statistics popularized in the "soft sciences," and the variety most commonly taught in college) are not what is called "single simplex," as opposed to statistical mechanics (the cornerstone analytical methodology of quantum mechanics, which is currently accurate to more than 11 decimal places, with no exceptions in any and all studies ever done). Here is what this means. Suppose you do an experiment to measure the correlation between two things, to verify that one influences the other in some way. Single simplex means that, experimentally, everything else that could influence the outcome of the experiment has been eliminated. As a simple example, I notice that wearing expensive clothes is good for your health, so everyone should wear expensive clothes. (There is a measurable correlation, BTW, and it is a very strong one, with a confidence level of over 99.997%.)
But what is wrong with the study is that those who can afford expensive clothes can also afford expensive healthcare, so you would expect them to be healthier. (BTW, while we are talking health, as an interesting sidebar: the most probable time for a heart attack to occur is 10 in the morning, the most probable place is the bathroom, and the most probable activity is sitting on the throne. So if you want to minimize your chance of a heart attack, don't go to the bathroom, and avoid 10 in the morning.)

There are other issues with correlation statistics. As another, rather famous, example, in 1876 Sir Francis Galton tested some data on plants furnished him by Charles Darwin. There were 15 treated plants and 15 untreated specimens (the control group). In rank-ordering the data, Galton saw that the treated plants were ahead of the untreated plants of the same rank in 13 out of 15 cases. Galton concluded, understandably, that the treatment was effective. But assuming perfect randomness in the data (30 measurements from the same pool of plants), the probability of observations at least as lopsided as Galton's is 3/16, a little under 19%. In other words, in 3 out of 16 cases a perfectly ineffectual treatment appears very effective.

This is easily derived, BTW. Consider tossing a coin 2n times, keeping only the sequences that end with n heads and n tails (just as there were 15 treated and 15 untreated plants). Let 2k be the number of tosses during which the accumulated number of heads is greater than the accumulated number of tails. The number of possibilities, N(2k), for this outcome is given by the "Catalan number:"

$$N(2k) = \binom{2n}{n} \frac{1}{n+1}$$

which is independent of k! (This is where Galton's intuition failed him.) Since all $\binom{2n}{n}$ sequences are equally likely, each of the n + 1 possible values k = 0, 1, ..., n has probability 1/(n + 1). With n = 15, the treated plants being ahead in 13, 14, or 15 of the ranks covers 3 of the 16 equally likely values, hence 3/16.

There are other interesting (i.e., counter-intuitive) things that can be demonstrated with this kind of elementary combinatorics. For example, what is the probability that at least two people in a room have the same birthday? (Answer: if there are 23 people in the room, the chance is about 50-50.)
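The counting argument above can be checked by brute force. The following is a minimal Python sketch, assuming Feller's convention that a toss counts as "heads ahead" if the running lead (heads minus tails) is positive either before or after it; the function names `lead_counts` and `birthday_collision` are hypothetical, chosen for illustration. A small dynamic program counts sequences by the number of tosses spent ahead: restricted to 15 heads and 15 tails it confirms that every value of k is equally likely (the Chung-Feller theorem) and reproduces Galton's 3/16, and without the restriction it reproduces the 20-toss figures quoted in the questions that follow. The birthday figure is a separate, elementary product, included for completeness.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def lead_counts(n_tosses, n_heads=None):
    """Count head/tail sequences of length n_tosses by the number of tosses
    during which heads are ahead. A toss counts as "ahead" if the running
    lead (heads minus tails) is positive before or after it. If n_heads is
    given, only sequences with exactly that many heads are counted."""
    states = {(0, 0): 1}  # (current lead, tosses ahead so far) -> sequences
    for toss in range(n_tosses):
        nxt = defaultdict(int)
        for (lead, ahead), count in states.items():
            heads_so_far = (toss + lead) // 2
            tails_so_far = toss - heads_so_far
            for step in (1, -1):  # +1 = heads, -1 = tails
                if n_heads is not None:
                    if step == 1 and heads_so_far == n_heads:
                        continue  # no heads left to place
                    if step == -1 and tails_so_far == n_tosses - n_heads:
                        continue  # no tails left to place
                new_lead = lead + step
                counted = 1 if max(lead, new_lead) > 0 else 0
                nxt[(new_lead, ahead + counted)] += count
        states = nxt
    by_ahead = defaultdict(int)
    for (_, ahead), count in states.items():
        by_ahead[ahead] += count
    return dict(by_ahead)


def birthday_collision(people, days=365):
    """Probability that at least two of `people` share a birthday,
    assuming all `days` birthdays are equally likely."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


# Galton: 15 treated and 15 untreated plants, i.e. 2n = 30 tosses with n = 15 heads.
n = 15
galton = lead_counts(2 * n, n_heads=n)
total = comb(2 * n, n)
# Chung-Feller: each of the n + 1 values 2k = 0, 2, ..., 2n occurs C(2n, n)/(n + 1) times.
assert set(galton) == set(range(0, 2 * n + 1, 2))
assert set(galton.values()) == {total // (n + 1)}
p_galton = Fraction(sum(galton[2 * k] for k in (13, 14, 15)), total)
print("Galton, treated ahead in 13 or more of 15:", p_galton)  # 3/16

# 20 unrestricted tosses: each player ahead for 10 tosses vs. one player ahead for all 20.
free = lead_counts(20)
p_even_split = Fraction(free[10], 2 ** 20)
p_one_leads = Fraction(free[0] + free[20], 2 ** 20)
print(f"each ahead 10 times: {float(p_even_split):.3f}")  # 0.061
print(f"one ahead all 20:    {float(p_one_leads):.3f}")   # 0.352

print(f"birthday, 23 people: {birthday_collision(23):.3f}")  # 0.507
```

The dynamic program tracks only the current lead and the count so far, so it stays fast even for Galton's 30 measurements, where listing all 2^30 sequences would not.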
Or, what is the probability that, in a coin-tossing game of 20 tosses, each player will lead 10 times? (Answer: about 6%. Not too astonishing, but read the next question.) What is the probability that one player will lead for ALL 20 tosses? (Answer: slightly greater than 35%!!!!) In other words, in a perfectly random and fair coin-tossing game, where the chance of a toss coming up heads is 50% and the chance of tails is 50%, the chance of one player leading for all 20 tosses is about 6 times greater than the chance of each player leading for 10 of the 20 tosses, even though it is a fair coin, being tossed many times! (Which should scare you away from correlation studies, since they would detect correlations that do not exist: leading for 20 tosses looks like pretty strong evidence that a correlation exists, which we know is incorrect since it is a fair coin.)

The reason the coin-tossing game works that way is that it is a fractal process. (Fractals are always processes that are a sum of random variables, like the sum of money you accumulate while tossing a fair coin, or in any other gambling game for that matter, including wagering in the stock market. This means that correlation studies often lead to erroneous conclusions, since they depend on the process containing random variables that are not summed together. The reason is that although random variables can have a normal distribution, i.e., a Gaussian bell curve distribution, summing random variables cannot provide such a Gaussian bell curve distribution, upon which, as a fundamental paradigm, correlation statistics depend.) In case you are curious, the fluctuation of a gambler's capital is a fractal, and is generally referred to as "Brownian motion," named after the Scottish botanist Robert Brown, who observed small particles being buffeted on a slide under a microscope, and correctly described it as a physical phenomenon. None other than A.
Einstein shed light on the problem (sorry for the pun), developing a molecular theory of it, one of the first applications of complexity theory, that could be used to derive the number of molecules in a volume; Jean Baptiste Perrin went on to win a Nobel Prize confirming it experimentally. It was the beginning of quantum mechanics. So, the statistical mechanics developed for quantum mechanics can be used for many things: in general, wherever many microscopic things contribute to a macroscopic phenomenon, like molecules that make up a volume, a disease that becomes an epidemic, or many coin tosses in a gambler's game that become the gambler's capital. (Most of what is called programmed trading is an application of fractals: many folks are simultaneously trading a stock, making its value go up and down, which is the macroscopic phenomenon, generally modeled as Brownian motion. Add a little information theory, which states that the optimal fraction of your portfolio that should be invested in a stock is proportional to the average value of the day-to-day increments of the stock's value, squared, and you have your wagering strategy for your investment portfolio.) Investors using statistical correlations have not fared well on Wall Street, and those who use correlation statistics as an adjunct to the efficient market hypothesis have done worse. (Portfolio growth using statistical correlations typically runs at about 0.8 of the growth of the exchange indexes; a good programmed trader using fractal methodologies will do about twice as well as the indexes, about 1.9 times in portfolio growth this year.)

There is a very simple game that you can play to demonstrate the point. Take a single six-sided die and a capital reserve of, say, a bunch of match sticks. Make a wager of several match sticks, and roll the die. If the die comes up 1, 2, 3, or 4, you win, and get to add the number of match sticks wagered to your capital reserves.
If the die comes up 5 or 6, you lose the match sticks you wagered, and have to remove them from your capital reserves. If you make a graph of your capital reserves over time, you will find that it looks exactly like a stock's historical value! (And a meticulous application of statistical mechanics will be even more convincing.) So, how would a programmed trader play the game? I will give you a hint: with every roll of the die, your wager should be exactly one third of your capital reserves (for example, if, on a particular roll of the die, you had a capital reserve of 100 matches, you would wager 33 of them). Play the game with different strategies: change the wager to one fourth of the capital reserves, and see how slowly your capital reserves grow; then try one half, and see how much more slowly and erratically they grow; then try two thirds, and see how fast your capital reserves go away. Then, if you are a glutton for punishment, try to establish a strategy for playing the game with statistical correlations. (Obviously, there are no correlations, but if you do a correlation analysis, I will guarantee that you will find many long-run correlations that will correlate to anything you want. Remember that one player holding the lead for all 20 tosses of a coin is 6 times, or so, more prevalent than the two players each holding the lead for 10 of the 20 tosses, which is the statistical average for a fair coin.)

There was an interesting experiment proposed by a Professor of Applied Mathematics and Statistics (one J. Casti, to be exact) that concerned finding a correlation that would predict the next Super Bowl winner. The problem was assigned to an undergraduate class as a class project, and the only stipulation was that the correlation study had to be able to withstand academic scrutiny.
The correlation study with the highest confidence level was that if the first letter of the team's name and the first letter of the team's home town were both in the first 13 letters of the 26-letter alphabet, or both in the last 13, the team would win. And indeed, in a long string of 18 Super Bowls, 15 were won by a team with those formidable credentials.

In case you are curious as to why correlation studies give such dubious results, it is not because of interpretational biases of the experimenters (although, with something as easily "massaged" as correlation studies, I suppose some would argue the point; the technique used in these, presumably rare, instances is what is known in the trade as "torturing the data until it confesses," another Casti-ism). The problem is a fundamental, paradigm issue with the concept of correlation statistics. You see, for statistical correlations, the variances in the observations are presumed to have the distribution of a normal, or Gaussian, bell curve. And, of course, many random variables do, indeed, have such a distribution. The issue is that when you sum random variables (i.e., sum the wins and losses of a coin-tossing game, sum match sticks using a die as the random variable, etc.), the sum does not have a normal distribution. The distribution of a summation process on a random variable is also a bell curve, but it is not normal, or Gaussian, although the two are incredibly similar (actually, within a small fraction of a percent, but that makes a big difference). And how does that affect correlation studies? The major issue is that Gaussian bell curves add root mean square, while distributions that are Brownian add linearly. And what is the error committed when one confuses the two? Well, theoretically, if you are adding two like distributions together, it is 1 minus one over the square root of 2, or about 29%, which is a long way from the 11-digit accuracy that is commonly required in quantum mechanics.
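The match-stick game described earlier can be checked numerically. The following is a minimal Python sketch, assuming the rules exactly as stated (win the wager on 1 through 4, lose it on 5 or 6) and a fixed fraction of capital wagered on every roll; the function names, the starting capital of 100, the seed, and the 500-roll run are illustrative choices, not part of the original. It computes the expected logarithm of capital growth per roll, which is the quantity the one-third rule maximizes. This is the Kelly criterion from information theory: for an even-money bet won with probability p, the optimal fraction is p - (1 - p), which is 1/3 here.

```python
import math
import random

P_WIN = 4 / 6   # die shows 1, 2, 3, or 4: win the amount wagered
P_LOSE = 2 / 6  # die shows 5 or 6: lose the amount wagered


def log_growth(fraction):
    """Expected log growth of capital per roll when wagering `fraction` of it."""
    return P_WIN * math.log(1 + fraction) + P_LOSE * math.log(1 - fraction)


def play(fraction, rolls, capital=100.0, seed=1):
    """Simulate the match-stick game, wagering a fixed fraction every roll.
    Capital is kept as a float, so fractional match sticks are allowed."""
    rng = random.Random(seed)
    for _ in range(rolls):
        wager = fraction * capital
        if rng.randint(1, 6) <= 4:
            capital += wager
        else:
            capital -= wager
    return capital


kelly = P_WIN - P_LOSE  # optimal fraction for an even-money bet: 1/3
for f in (1 / 4, kelly, 1 / 2, 2 / 3):
    print(f"wager {f:.3f}: log growth per roll {log_growth(f):+.4f}, "
          f"capital after 500 rolls {play(f, 500):.3g}")
```

Under these assumptions the expected log growth per roll is about +0.053 at one fourth, +0.057 at one third, +0.039 at one half, and about -0.026 at two thirds; it turns negative for any wager above (√5 - 1)/2, about 0.618, of capital. A single simulated run varies with the seed, so the expected-growth figures are the better guide to why one third is the choice.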
Now you know why there is a significant movement in mathematical physics to move complexity theory into the mainstream of systemic modeling (i.e., describing things mathematically). This is the charter of SFI. And this includes a macroscopic theory of the organization, which is closely related to the work going on in the Learning Organization, many of whose members will soon be lecturing on it. See the attached.

BTW, one of the classic isms that I heard a while back referring to correlation statistics was the name "saccharin science," in relation to the 7 studies, all meticulous, done on whether or not saccharin causes cancer. The studies have cost about a tenth of a billion dollars to date, all have very high confidence levels (one way or the other), and all are contradictory. And so, what's being done about that? You guessed it: study number 8 is underway, and confidence is high that the other 7 studies can all be reconciled with correlation statistics in study number 8. (Bets are, using the Catalan numbers, that this study will find a slight correlation, but not as strong as the previous studies. The next study after number 8, the study to end all studies on saccharin, will be anti-correlative, but even less strong than study number 8.) The person who coined the phrase "saccharin science" is a known sceptic (and a damned fine mathematician), and claims that the function of correlation studies is a public works project for social scientists. (BTW, it was not Casti who said this.)

--
John Conover, john@email.johncon.com, http://www.johncon.com/

Copyright © 1996 John Conover, john@email.johncon.com. All Rights Reserved. Last modified: Fri Mar 26 18:56:34 PST 1999 $Id: 960613000955.27110.html,v 1.0 2001/11/17 23:05:50 conover Exp $