John Conover: tsinvestsim-lognormal

john@email.johncon.com

http://www.johncon.com/john/

tsinvestsim-lognormal

Verification of Analytical Methodology for Evaluating GDP Per Capita:

To demonstrate the validity of the analytical methodology used in the GDP per capita analysis, tsinvestsim(1) simulations were used to evaluate the convergence of the log-normal distribution of GDP per capita for five population sizes, each an order of magnitude larger than the last, (N = 10 to N = 100,000.) Each worker in the workforce has P = 0.51 and f = rms = 0.02 * 0.5 * 0.41, i.e., the maximally optimal values, (P = 0.51, f = rms = 0.02,) scaled by an inefficiency factor of 0.5 for each worker in the workforce, and by an inefficiency factor of 0.41 representing the fraction of the population in the workforce. These variable values are representative of the daily values, per worker, of many industrialized nations.

The output of the tsinvestsim(1) program was summed into an aggregate GDP using the tsinvestsim-lognormal(1) program, (source archive,) to obtain the minimum, median, mean, and maximum of the simulated GDP, as a time series, with a log-normal distribution, i.e., the GDP per capita distribution.

Since economic data for productivity is available only on a per capita basis, it is not feasible to determine the individual inefficiencies, (they are lumped together.) The methodology used allows simulations with different variable values for P, f = rms, and avg. For the current simulations, it is assumed that each worker in the workforce can operate at the maximally optimal values, P = 0.51, and f = rms = 0.02, (f^2 = rms^2 = avg, which is maximally optimal, subject to the Kelly Criterion,) and f = rms is then decreased to represent the inefficiencies. Note that this decreases avg by the same inefficiency factor:



    P = ((avg / (k * rms)) + 1) / 2
    avg = (k * rms) * ((2 * P) - 1)

where P is invariant with k, (avg is computed by the tsinvestsim(1) program in this manner.) Note, further, that:



    rms = sqrt (srms^2 + avg^2)
    k * rms = k * sqrt (srms^2 + avg^2)
    k * rms = sqrt ((k^2 * srms^2) + (k^2 * avg^2))
    k * rms = sqrt ((k * srms)^2 + (k * avg)^2)

indicating that rms can be multiplied by a constant, k, directly, without first calculating srms and avg, multiplying each by k, squaring each product, and finally taking the square root of the sum of the squares.
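As a quick numerical check of the two identities above, the following Python sketch scales the maximally optimal values by the combined inefficiency factor, k = 0.5 * 0.41, and confirms that P is unchanged and that k * rms equals the root of the sum of the squares of the scaled components. The function names p_from and avg_from are illustrative only; they are not part of the tsinvest programs.

```python
import math

def p_from(avg, rms):
    """P = ((avg / rms) + 1) / 2, as in the text."""
    return ((avg / rms) + 1.0) / 2.0

def avg_from(p, rms):
    """avg = rms * ((2 * P) - 1), as in the text."""
    return rms * ((2.0 * p) - 1.0)

# Maximally optimal values from the text, and the combined inefficiency factor.
P, rms, k = 0.51, 0.02, 0.5 * 0.41

avg = avg_from(P, rms)
srms = math.sqrt(rms**2 - avg**2)

# P is unchanged when avg and rms are both scaled by k.
p_scaled = p_from(k * avg, k * rms)

# k * rms equals sqrt ((k * srms)^2 + (k * avg)^2).
rms_scaled = math.sqrt((k * srms)**2 + (k * avg)**2)

print(p_scaled, k * rms, rms_scaled, avg_from(P, k * rms))
```

The last printed value is the scaled avg, which should agree with the 0.000082 used throughout the simulations.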

The rationale for selecting variable values in this manner is that P is a metric on a worker's ability to synthesize innovation based on new information, (original, secret, or public,) and f = rms is a metric on the wager made on the innovation, with avg being the return, (positive or negative,) generated by the innovation, i.e., metrics on how smart the worker is, and how good a gambler.

If the national GDP per capita, (an aggregate expenditure approach,) and distribution of income, (an aggregate income approach,) are analyzed together, the variable values for a typical worker in the economy can be determined.

In the following simulations, all workers have the same daily variable values, P = 0.51, and f = rms = 0.02 * 0.5 * 0.41, (an innovation success rate of 51 times out of a hundred, 50% efficiency at wagering, and 41% of the population in the workforce,) with each worker starting at I = $10, (all equal, representing the rural/shared agricultural economy of the US in 1610.) The simulations are all allowed to run for 100,000 days, (about 4 centuries at 250 work days a year,) developing into a log-normal distribution of the standard of living, (i.e., GDP growth per capita,) by 1790, and continuing through 2010. The simulation outputs are then sampled every 250 days, for annual data, and all values prior to 1790 are filtered out, resulting in annual data for 1790 through 2010. The simulation was repeated for workforce sizes of N = 10, N = 100, N = 1,000, N = 10,000, and N = 100,000, to verify convergence to the mean of the simulations.
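The setup described above can be sketched in a few lines of Python. This is not the tsinvestsim(1) source; it assumes a simple model, consistent with the G(avg,rms) expression used in this document, in which each worker's value is multiplied by (1 + f) with probability P, and by (1 - f) otherwise, each day. The function name simulate, the seed, and the reduced run size, (100 workers, 1,000 days,) are choices made only to keep the example fast.

```python
import random
import statistics

def simulate(n_workers, n_days, p=0.51, f=0.02 * 0.5 * 0.41, start=10.0, seed=1):
    """Assumed model: each day, each worker's value is multiplied by
    (1 + f) with probability p, and by (1 - f) otherwise."""
    rng = random.Random(seed)
    values = [start] * n_workers
    rows = []
    for day in range(n_days):
        values = [v * (1.0 + f) if rng.random() < p else v * (1.0 - f)
                  for v in values]
        if (day + 1) % 250 == 0:  # sample once per 250 day work year
            rows.append((day + 1, min(values), statistics.median(values),
                         statistics.fmean(values), max(values)))
    return rows

# Fields: time, minimum, median, mean, maximum.
rows = simulate(n_workers=100, n_days=1000)
for row in rows:
    print("%d\t%f\t%f\t%f\t%f" % row)
```

With runs this small the sampled values are noisy; the point of the full simulations is that the median and mean settle toward their theoretical values as N grows.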

It should be pointed out, as a concluding remark, that the objective of the simulations is to validate the methodology, which requires only a reasonable approximation to the actual US GDP per capita; two digit precision in the variables is adequate.

Analysis of Simulations:

Figure I

Figure I is a plot of the US nominal GDP per capita income distribution, 2010: theoretical distribution, empirical distribution, and simulated distribution, (N = 100,000.)

Figure II

Figure II is a plot of the US nominal GDP per capita, 1790-2010: empirical values, simulated values, (N = 100,000,) and Least Squares fit of the empirical values.

Figure III

Figure III is a plot of the simulated US GDP per capita log-normal distribution median, over time, for five population sizes, N, each an order of magnitude apart, and a plot of the theoretical values for N = 100,000. Note the convergence to the theoretical values with increasing N. Further, note that the values converged to are independent of N:



    P = 0.51
    rms = 0.02 * 0.5 * 0.41
    avg = (rms) * ((2 * P) - 1)
        = 0.000082
    G(avg,rms) = 1.00007359809704021647

Or, the value, V, after x many days, starting with an initial value of I = 10:



    V(x) = I * G(avg,rms)^x

Or:



    ln (V(x)) = ln (10) + (x * ln (G(avg,rms)))
              = 2.30258509299404568402 + (0.00007359538883315094 * x)

It is important to note that this represents the aggregate growth in the median of the workforce divided by the size of the population, (i.e., the median per capita value,) over time, with the workforce operating at 50% efficiency, (relative to rms,) and has a direct mathematical relationship to the typical worker variable values, rms and P, on a daily basis. This function is exponentiated to obtain the median of the log-normal distribution, over time.
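The figures quoted for G(avg,rms) and ln (V(x)) can be reproduced with a short Python sketch. The function names growth and ln_v are illustrative; growth follows the G(avg,rms) macro listed in the calc(1) section of this document, and the results are double precision, so only the leading 15 or so digits of the calc(1) values are expected to match.

```python
import math

def growth(avg, rms):
    """G(avg, rms) = (1 + rms)^P * (1 - rms)^(1 - P), with
    P = ((avg / rms) + 1) / 2."""
    p = ((avg / rms) + 1.0) / 2.0
    return (1.0 + rms) ** p * (1.0 - rms) ** (1.0 - p)

P = 0.51
rms = 0.02 * 0.5 * 0.41
avg = rms * ((2.0 * P) - 1.0)
G = growth(avg, rms)

def ln_v(x, initial=10.0):
    """ln (V(x)) = ln (I) + (x * ln (G(avg, rms)))."""
    return math.log(initial) + (x * math.log(G))

print(avg, G, math.log(G))
print(math.exp(ln_v(100000)))  # theoretical median after 100,000 days
```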

Figure IV

Figure IV is a plot of the simulated US nominal GDP per capita log-normal distribution deviation, over time, for five population sizes, N, each an order of magnitude apart, and a plot of the theoretical values for N = 100,000. Note the convergence to the theoretical values with increasing N. Further, note that the values converged to are independent of N:



    P = 0.51
    rms = 0.02 * 0.5 * 0.41
    avg = (rms) * ((2 * P) - 1)
        = 0.000082
    srms = sqrt (rms^2 - avg^2)
         = 0.0040991799179835959

Or, the value, S, after x many days:



    S(x) = srms * sqrt (x)
         = 0.0040991799179835959 * sqrt (x)

It is important to note that this represents the standard deviation of the Gaussian/Normal distribution of the standard of living of the workforce (i.e., the distribution of the per capita values,) over time, with the workforce operating at 50% efficiency, (relative to rms,) and has a direct mathematical relationship to the typical worker variable values, rms and P, on a daily basis. The square of this function, divided by two, is exponentiated to obtain the deviation from the median of the log-normal distribution, over time.
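As a worked example of the deviation calculation, the following Python sketch computes srms, S(x), and the exponentiated half-square of S(x), which is the ratio of the mean to the median of the log-normal distribution. The function names s and mean_over_median are illustrative only.

```python
import math

P = 0.51
rms = 0.02 * 0.5 * 0.41
avg = rms * ((2.0 * P) - 1.0)
srms = math.sqrt(rms**2 - avg**2)

def s(x):
    """S(x) = srms * sqrt (x), the deviation after x days."""
    return srms * math.sqrt(x)

def mean_over_median(x):
    """e^(S(x)^2 / 2), the ratio of the log-normal mean to its median."""
    return math.exp(s(x)**2 / 2.0)

print(srms, s(100000), mean_over_median(100000))
```

The ratio at 100,000 days can be compared with the mean divided by the median in the last row of each simulation run.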

Further, it is important to note that the srms of the workers, (or their typical value of the aggregate,) is related to the median and mean of the log-normal distribution at any time.

Analytical Methodology:

Let t be the time interval the log-normal distribution has evolved in the data file, data.file, of the GDP per capita. Then, in the t'th, (last,) interval:



    u = Mu
    r = Rho

    median = e^u
    mean = e^(u + (r^2 / 2))
    mode = e^(u - r^2)

If the data.file represents GDP per capita data, (i.e., annual GDP divided by annual population count,) then the file represents mean GDP per capita data. One of the issues is to convert the data to median data.

It is convenient to analyze the evolution of the log-normal distribution over time, t:



    median(t) = e^u(t)
    mean(t) = e^(u(t) + (r(t)^2 / 2))
    mode(t) = e^(u(t) - r(t)^2)

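where, for economic data, (like the US GDP per capita,) the mean(t) is available as a historical time series, but the median(t), mean(t), and mode(t) of the distribution are available together at only one point in time, (usually the last interval.) At that point, r(t) follows from the ratio of the mean to the mode: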



    mean(t) = e^(u(t) + (r(t)^2 / 2))
    mode(t) = e^(u(t) - r(t)^2)
    mean(t) = e^(u(t)) * e^(r(t)^2 / 2)
    mode(t) = e^(u(t)) / e^(r(t)^2)
    mean(t) / mode(t) = (e^(u(t)) * e^(r(t)^2 / 2)) / (e^(u(t)) / e^(r(t)^2))
    mean(t) / mode(t) = (e^(r(t)^2 / 2)) / (1 / e^(r(t)^2))
    mean(t) / mode(t) = (e^(r(t)^2 / 2)) * e^(r(t)^2)
    mean(t) / mode(t) = e^((r(t)^2 / 2) + r(t)^2)
    mean(t) / mode(t) = e^(r(t)^2 * (1 + (1 / 2)))
    mean(t) / mode(t) = e^((3 / 2) * r(t)^2)
    ln (mean(t) / mode(t)) = (3 / 2) * r(t)^2
    r(t)^2 = (2 / 3) * ln (mean(t) / mode(t))
    r(t) = sqrt ((2 / 3) * ln (mean(t) / mode(t)))
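The derivation above can be checked with a round trip in Python: build the median, mean, and mode from known u and r, then recover r from the mean and mode alone. The function names are illustrative, and the values of u and r are hypothetical, chosen only for the check.

```python
import math

def lognormal_stats(u, r):
    """Return (median, mean, mode) for parameters u and r, using
    the expressions in the text."""
    return math.exp(u), math.exp(u + (r**2 / 2.0)), math.exp(u - r**2)

def r_from_mean_mode(mean, mode):
    """r = sqrt ((2 / 3) * ln (mean / mode))."""
    return math.sqrt((2.0 / 3.0) * math.log(mean / mode))

# Hypothetical parameter values, used only for the round trip.
u, r = 9.66, 1.3
median, mean, mode = lognormal_stats(u, r)
r_recovered = r_from_mean_mode(mean, mode)
print(r_recovered)  # recovers r, up to rounding
```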



    r(t) = srms * sqrt (t)
    r(t)^2 = srms^2 * t



    mean(t) = e^(u(t) + (r(t)^2 / 2))
    u(t) = a + (b * t)
    mean(t) = e^(a + (b * t) + (srms^2 * t / 2))
    mean(t) = e^(a + (b * t) + ((srms^2 / 2) * t))
    mean(t) = e^(a + ((b + (srms^2 / 2)) * t))
    median(t) = e^u(t) = e^(a + (b * t))

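with a and b + (srms^2 / 2) determined by a least squares, (LSQ,) fit to mean(t), and b then obtained by subtracting srms^2 / 2 from the fitted slope.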
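A minimal Python sketch of this step follows. It assumes an ordinary least squares fit of a straight line to ln (mean(t)) as a stand-in for the exponential fit performed by tslsq(1); the series is synthetic and noise free, and the values of a_true, b_true, and srms are hypothetical.

```python
import math

def fit_line(ts, ys):
    """Ordinary least squares fit of ys = intercept + (slope * ts)."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
             / sum((t - t_bar)**2 for t in ts))
    return y_bar - (slope * t_bar), slope

# Hypothetical, noise free mean(t) series built from assumed values.
a_true, b_true, srms = 2.3, 0.0000736, 0.0041
ts = list(range(0, 100000, 250))
means = [math.exp(a_true + ((b_true + (srms**2 / 2.0)) * t)) for t in ts]

# The fitted slope of ln (mean(t)) is b + (srms^2 / 2); subtract to get b.
a, slope = fit_line(ts, [math.log(m) for m in means])
b = slope - (srms**2 / 2.0)

# median(t) = e^(a + (b * t))
medians = [math.exp(a + (b * t)) for t in ts]
print(a, b)
```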

Validation:

Unfortunately, the tsinvestsim-lognormal(1) program does not provide the mode(t); as an alternative, the following derivation uses the median(t) and mean(t) for analysis of the simulation:



    mean(t) = e^(u(t) + (r(t)^2 / 2))
    mean(t) = e^u(t) * e^(r(t)^2 / 2)
    median(t) = e^u(t)

    mean(t) = e^u(t) * e^(r(t)^2 / 2)
    mean(t) / median(t) = e^u(t) * e^(r(t)^2 / 2) / e^u(t)
    mean(t) / median(t) = e^(r(t)^2 / 2)
    r(t)^2 / 2 = ln (mean(t) / median(t))
    r(t)^2 = 2 * ln (mean(t) / median(t))
    r(t) = sqrt (2 * ln (mean(t) / median(t)))

    r(t) = srms * sqrt (t)
    srms = r(t) / sqrt (t)

    mean(t) = e^(a + ((b + (srms^2 / 2)) * t))
    ln (mean(t)) = a + ((b + (srms^2 / 2)) * t)

The actual values used in the simulation:



    P = 0.51, f = 0.02 * 0.50 * 0.41 = 0.0041

    f = rms = 0.02 * 0.50 * 0.41 = 0.0041
    0.51 = ((avg / 0.0041) + 1) / 2
    avg = 0.0041 * ((2 * 0.51) - 1)
        = 0.000082
    srms = sqrt (rms^2 - avg^2)
         = sqrt (0.0041^2 - 0.000082^2)
         = 0.0040991799179835959

    G(avg,rms) = G(0.0041 * ((2 * 0.51) - 1),0.02 * 0.50 * 0.41)
               = 1.00007359809704021647
    G(t) = 10 * (1.00007359809704021647^t)
    ln (G(t)) = ln (10) + (t * ln (1.00007359809704021647))
              = 2.30258509299404568402 + (0.00007359538883315094 * t)

    ln (mean(t)) = a + ((b + (srms^2 / 2)) * t)
    a = 2.30258509299404568402
    b + (srms^2 / 2) = 0.00007359538883315094
    b = 0.00007359538883315094 - (srms^2 / 2)
      = 0.00007359538883315094 - (0.0040991799179835959^2 / 2)
      = 0.00006519375083315094
    ln(mean(t)) = 2.30258509299404568402 + (0.00007359538883315094 * t)

Analyzing the data from the simulations, (the order of the fields from the tsinvestsim-lognormal(1) program is "time, minimum, median, mean, maximum"):



    cut -f4 0.51-10 | tslsq -e -p
    e^(2.243692 + 0.000091t)
    cut -f4 0.51-10 | tsfraction | tsavg -p
    0.000089
    cut -f4 0.51-10 | tsfraction | tsrms -p
    0.002662
    G(0.000089,0.002662) = 1.00008546072723226738
    ln (G(0.000089,0.002662)) = 0.00008545707567235967
    egrep '^99999' 0.51-10
    99999       1906.627099     8477.783394     52569.322616    387424.213469
    r(100000) = sqrt (2 * ln (52569.322616 / 8477.783394))
              = 1.91033175441258050009
    srms = 1.91033175441258050009 / sqrt (100000)
         = 0.00604099943048917012
    r(t) = srms * sqrt (t)
    r(t) = 0.00604099943048917012 * sqrt (t)
    r(t)^2 / 2 = 0.00001824683705958524 * t
    ln (mean(t)) = a + ((b + (srms^2 / 2)) * t)
    a = 2.243692
    b = 0.000091 - (0.00604099943048917012^2 / 2)
      = 0.00007275316294041476
    u(t) = 2.243692 + (0.00007275316294041476 * t)



    cut -f4 0.51-100 | tslsq -e -p
    e^(2.230331 + 0.000083t)
    cut -f4 0.51-100 | tsfraction | tsavg -p
    0.000084
    cut -f4 0.51-100 | tsfraction | tsrms -p
    0.000736
    G(0.000084,0.000736) = 1.00008373267247867903
    ln (G(0.000084,0.000736)) = 0.00008372916709413426
    egrep '^99999' 0.51-100
    99999       290.545725      12589.060312    43569.000789    1248187.418192
    r(100000) = sqrt (2 * ln (43569.000789 / 12589.060312))
              = 1.57576501978003208644
    srms = 1.57576501978003208644 / sqrt (100000)
         = 0.00498300651972517984
    r(t) = srms * sqrt (t)
    r(t) = 0.00498300651972517984 * sqrt (t)
    r(t)^2 / 2 = 0.00001241517698781182 * t
    ln (mean(t)) = a + ((b + (srms^2 / 2)) * t)
    a = 2.230331
    b = 0.000083 - (0.00498300651972517984^2 / 2)
      = 0.00007058482301218818
    u(t) = 2.230331 + (0.00007058482301218818 * t)



    cut -f4 0.51-1000 | tslsq -e -p
    e^(2.289082 + 0.000082t)
    cut -f4 0.51-1000 | tsfraction | tsavg -p
    0.000083
    cut -f4 0.51-1000 | tsfraction | tsrms -p
    0.000220
    G(0.000083,0.000220) = 1.00008297924392550143
    ln (G(0.000083,0.000220)) = 0.00008297580133848107
    egrep '^99999' 0.51-1000
    99999       101.798190      15302.138722    38246.692966    987378.496244
    r(100000) = sqrt (2 * ln (38246.692966 / 15302.138722))
              = 1.35356159353659762681
    srms = 1.35356159353659762681 / sqrt (100000)
         = 0.00428033758890269486
    r(t) = srms * sqrt (t)
    r(t) = 0.00428033758890269486 * sqrt (t)
    r(t)^2 / 2 = 0.00000916064493748667 * t
    ln (mean(t)) = a + ((b + (srms^2 / 2)) * t)
    a = 2.289082
    b = 0.000082 - (0.00428033758890269486^2 / 2)
      = 0.00007283935506251333
    u(t) = 2.289082 + (0.00007283935506251333 * t)



    cut -f4 0.51-10000 | tslsq -e -p
    e^(2.299558 + 0.000082t)
    cut -f4 0.51-10000 | tsfraction | tsavg -p
    0.000082
    cut -f4 0.51-10000 | tsfraction | tsrms -p
    0.000105
    G(0.000082,0.000105) = 1.00008199784944121253
    ln (G(0.000082,0.000105)) = 0.00008199448780131961
    egrep '^99999' 0.51-10000
    99999       173.857625      15894.484298    36704.069587    1771254.349566
    r(100000) = sqrt (2 * ln (36704.069587 / 15894.484298))
              = 1.29376619780877467417
    srms = 1.29376619780877467417 / sqrt (100000)
         = 0.00409124794481167259
    r(t) = srms * sqrt (t)
    r(t) = 0.00409124794481167259 * sqrt (t)
    r(t)^2 / 2 = 0.00000836915487296287 * t
    ln (mean(t)) = a + ((b + (srms^2 / 2)) * t)
    a = 2.299558
    b = 0.000082 - (0.00409124794481167259^2 / 2)
      = 0.00007363084512703713
    u(t) = 2.299558 + (0.00007363084512703713 * t)



    cut -f4 0.51-100000 | tslsq -e -p
    e^(2.300291 + 0.000082t)
    cut -f4 0.51-100000 | tsfraction | tsavg -p
    0.000082
    cut -f4 0.51-100000 | tsfraction | tsrms -p
    0.000085
    G(0.000082,0.000085) = 1.00008199974949315241
    ln(G(0.000082,0.000085)) = 0.00008199638769747028
    egrep '^99999' 0.51-100000
    99999       39.190364       15804.790282    36694.603142    11503692.146641
    r(100000) = sqrt (2 * ln (36694.603142 / 15804.790282))
              = 1.29793421590474711612
    srms = 1.29793421590474711612 / sqrt (100000)
         = 0.00410442837532374379
    r(t) = srms * sqrt (t)
    r(t) = 0.00410442837532374379 * sqrt (t)
    r(t)^2 / 2 = 0.00000842316614408135 * t
    ln (mean(t)) = a + ((b + (srms^2 / 2)) * t)
    a = 2.300291
    b = 0.000082 - (0.00410442837532374379^2 / 2)
      = 0.00007357683385591865
    u(t) = 2.300291 + (0.00007357683385591865 * t)
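The five runs above can be summarized side by side with a short Python sketch, which recomputes srms and b from the fitted slope and from the median and mean in the last row of each run. The numbers are copied from the listings above; the function name srms_and_b is illustrative, and the arithmetic is double precision rather than the arbitrary precision of calc(1).

```python
import math

T = 100000  # days in each simulation run

# (N, fitted slope of ln (mean(t)), median, mean), as quoted above.
runs = [
    (10, 0.000091, 8477.783394, 52569.322616),
    (100, 0.000083, 12589.060312, 43569.000789),
    (1000, 0.000082, 15302.138722, 38246.692966),
    (10000, 0.000082, 15894.484298, 36704.069587),
    (100000, 0.000082, 15804.790282, 36694.603142),
]

def srms_and_b(slope, median, mean, t=T):
    """Return (srms, b) from the mean to median ratio at time t."""
    r = math.sqrt(2.0 * math.log(mean / median))
    srms = r / math.sqrt(t)
    return srms, slope - (srms**2 / 2.0)

results = {n: srms_and_b(slope, median, mean)
           for n, slope, median, mean in runs}
for n, (srms, b) in results.items():
    print("N = %6d  srms = %.8f  b = %.10f" % (n, srms, b))
```

Printed this way, the convergence of srms toward the theoretical 0.0040991799 with increasing N is easier to see than in the individual listings.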

Note that:



    tsfraction data.file | tsrms -p
    rms

is meaningless, (as far as productivity per capita is concerned,) since the natural evolution of the log-normal distribution of the aggregate GDP per capita, will be, given enough time and population, a near perfect exponential where avg = rms, and P = 1. The rms of the workers, (or per capita value of the aggregate,) must be determined from the mean(t), mode(t), and the median(t) of the distribution at some time interval, t, (usually the last interval.)

Empirical GDP per capita data can present situations where the long term avg and rms are not nearly equal, and rms is greater than avg, which is usually due to governance issues that affect all, (or most,) of the workforce. Subtracting, (via root-mean-square,) the typical individual rms from the GDP per capita rms offers a methodology for analysis of the governance issues, provided the log-normal distribution has evolved for a sufficient time, and the population size is sufficiently large.

Simulation:

As a concluding note, to align the simulations with the empirical data, (mean-variance simulations of geometric Brownian motion fractals are notoriously unstable,) 4 data points had to align: the mean of the simulated nominal US GDP per capita with the empirical nominal US GDP per capita at 1790; the simulated nominal US GDP per capita with the empirical nominal US GDP per capita at 2010; and both the median and the mean of the log-normal distribution of the productivity/income of the simulated nominal US GDP per capita with the empirical nominal US GDP per capita at 2010.

There are 3 variables, which interact: I, the starting value for each worker, (I = $10, in 1610, which was used as a scaling factor); avg; and rms. Lowering both avg and rms will decrease the ratio of the mean to the median in 2010, and increasing avg, relative to rms, will increase the growth in the nominal US GDP per capita in the simulation. The number of workers, N, contributing to the simulated nominal US GDP per capita was increased, in steps of orders of magnitude, until the simulations converged to their mean, indicating sufficient accuracy for comparison with the empirical data.

Note that the log-normal distribution income empirical data for 2010 is in 2010 nominal dollars, necessitating an analysis of nominal US GDP per capita, (as opposed to the traditional real GDP per capita.) The empirical log-normal distribution income in 1790 is not available or known, necessitating starting the simulation about two centuries before any data of interest to the analysis, to allow a log-normal distribution income to evolve by 1790. The tsinvestsim(1) program from the NtropiX site, (in the tsinvest archive,) was used to generate the individual worker and nominal US GDP per capita time series for the simulation. The simulation for N = 100,000 took about 20 hours on a 2.5GHz machine.

calc(1) Macros:

The following calc(1) macros were used for calculation of P(avg,rms) and G(avg,rms) in ~/.calcrc:



    define P (avg, rms) = ((avg / rms) + 1) / 2;
    define G (avg, rms) = power (1 + rms, ((avg / rms) + 1) / 2) * \
                          power (1 - rms, 1 - ((avg / rms) + 1) / 2);

The following calc(1) script computes avg, rms, and G(avg,rms), given srms and G obtained empirically from a time series:



    #!/usr/local/bin/calc -d -f
    #
    # A license is hereby granted to reproduce this design for personal,
    # non-commercial use.
    #
    # THIS DESIGN IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES
    # WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF
    # MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE.  THE
    # AUTHOR DOES NOT WARRANT THAT USE OF THIS DESIGN DOES NOT INFRINGE
    # THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.
    #
    # So there.
    #
    # Copyright (c) 1992-2015, John Conover, All Rights Reserved.
    #
    # Comments and/or problem reports should be addressed to:
    #
    #     john@email.johncon.com
    #
    #     http://www.johncon.com/john/
    #     http://www.johncon.com/ntropix/
    #     http://www.johncon.com/ndustrix/
    #     http://www.johncon.com/nformatix/
    #     http://www.johncon.com/ndex/
    #
    # A calc(1) script for binary search-for-solution of: given, srms and
    # g; find avg, rms, and, G(avg,rms).
    #
    # Both the domain and range, between "top" and "bottom" must be
    # monotonic increasing on avg; G(avg,rms) is monotonic increasing on
    # increasing avg, (starting with avg = rms = 1 to avoid division by
    # zero in the calculation of the first iteration of G(avg,rms), and
    # start the binary search at avg = 0.5.)
    #
    # The variables srms and g, are required. The variable g is G(avg,rms)
    # to search for, (and must be greater than unity,) given srms, (which
    # must be greater than zero):
    #
    # Real US GDP:
    #
    srms = 0.02126296324279258349;
    g = 1.0170171123188325426;
    #
    # Nominal US GDP:
    #
    # srms = 0.02126296324302608945;
    # g = 1.02982917935065044642;
    #
    top = 1;
    bottom = 0;
    avg = 1;
    rms = 1;
    temp = 0.0;
    #
    while (abs ((temp = G(avg,rms)) - g) > 0.0000000000000000001)
    {

        if (temp < g)
        {
            bottom = bottom + ((top - bottom) / 2.0);
            avg = bottom + ((top - bottom) / 2.0);
            rms = sqrt (avg^2 + srms^2);
            /* printf ("1: avg = %f, rms = %f, G(avg,rms) = %f\n", avg, rms, temp); */
        }

        else
        {
            top = top - ((top - bottom) / 2.0);
            avg = top - ((top - bottom) / 2.0);
            rms = sqrt (avg^2 + srms^2);
            /* printf ("2: avg = %f, rms = %f, G(avg,rms) = %f\n", avg, rms, temp); */
        }

    }
    #
    printf ("avg = %f, rms = %f, G(avg,rms) = %f\n", avg, rms, temp);
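For readers without calc(1), the same search can be sketched in Python. This is a port under stated assumptions, not the author's script: it uses a conventional bisection with a fixed iteration count instead of the tolerance loop, double precision instead of the arbitrary precision of calc(1), and it assumes the solution for avg lies between 0 and 0.5. The function names growth and solve are illustrative.

```python
import math

def growth(avg, rms):
    """Python equivalent of the G(avg, rms) macro above."""
    p = ((avg / rms) + 1.0) / 2.0
    return (1.0 + rms) ** p * (1.0 - rms) ** (1.0 - p)

def solve(srms, g, low=0.0, high=0.5, iterations=100):
    """Bisect on avg until G(avg, sqrt (avg^2 + srms^2)) matches g.

    Assumes G is below g at low and above g at high."""
    for _ in range(iterations):
        avg = low + ((high - low) / 2.0)
        rms = math.sqrt(avg**2 + srms**2)
        if growth(avg, rms) < g:
            low = avg
        else:
            high = avg
    return avg, rms, growth(avg, rms)

# Real US GDP values from the script above.
SRMS = 0.02126296324279258349
TARGET = 1.0170171123188325426
avg, rms, G = solve(SRMS, TARGET)
print("avg = %f, rms = %f, G(avg,rms) = %f" % (avg, rms, G))
```

The nominal US GDP values commented out in the script can be substituted for SRMS and TARGET in the same way.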

License

The information contained herein is private and confidential and dissemination is strictly forbidden, except under the provisions of contractual license.

THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS INFORMATION DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.

So there.

Comments, questions, and problem reports should be addressed to:

john@email.johncon.com

http://www.johncon.com/john/

http://www.johncon.com/ntropix/

http://www.johncon.com/ndustrix/

http://www.johncon.com/nformatix/

http://www.johncon.com/ndex/

Last modified: Thu Aug 20 12:24:28 PDT 2015 $Id: tsinvestsim-lognormal.html,v 1.0 2015/08/20 19:24:52 conover Exp $