From: John Conover <john@email.johncon.com>

Subject: Analyzing the Spread of SARS with the Logistic Function

Date: 22 Apr 2003 08:39:21 -0000

Note: there is an addendum concerning this page on Analyzing the Spread of the SoBig.E Virus. From the sidebar:

"... although the analysis of the spread of SARS seems impressive, it must be remembered that the logistic (what Europeans call the discrete time parabolic) function has other solutions, too, exhibiting long term cyclic phenomena which tend to phase-lock on the particular idiosyncrasies of the specific virus. There is a very significant probability that the SARS outbreak in the Autumn of 2003 will be a repeat of the late 2002/early 2003 outbreak; and not only that, it will probably be worse.

Such is the nature of the logistic/parabolic function."

This analysis originally appeared in the NdustriX and NtropiX mailing lists.

- -John
- June 25, 2003

Note: there are some interesting recent URLs concerning SARS. The Genome Sciences Centre decoded the SARS DNA sequence on April 12, 2003. See their SARS-associated Coronavirus page for particulars. The Linux Journal has a web page on the technical aspects of Sequencing the SARS virus. The sequence itself is available at http://mkweb.bcgsc.ca/sars/AY274119.fa, as a text file in FASTA format, consisting of a header line and the sequence, split into fixed length lines.

The sequence was identified as a fourth group of coronaviruses on April 12, 2003, (its closest cousin is bovine coronavirus.)

- -John
- November 5, 2003

On April 22, 2003, the SARS data was downloaded from the World Health
Organization's Cumulative Number of Reported Probable Cases of Severe
Acute Respiratory Syndrome (SARS) page and edited in a text editor
(specifically, a data point was added for each Sunday, where there was
no data, using the value halfway between the Saturday and Monday
values) to make a time series, *sars*, of the cumulative number of
SARS cases from March 17, 2003, to April 22, 2003. (Note that
differential diagnosis for SARS is not consistent over the world; for
example, see CDC Media Relations - SARS US Case Report for the number
of US cases from the Centers for Disease Control and Prevention, which
is inconsistent with the WHO data.)
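The Sunday fill-in described above was done by hand in a text editor; a minimal sketch of the same midpoint rule, assuming the samples are held as (date, cumulative count) pairs, might look like the following. The function name and the sample numbers are hypothetical and are not the WHO figures.

```python
from datetime import date, timedelta


def fill_missing_days(samples):
    """Insert a midpoint value wherever exactly one day is missing.

    samples: list of (date, cumulative_cases) pairs, sorted by date.
    """
    if not samples:
        return []
    filled = []
    for (d0, v0), (d1, v1) in zip(samples, samples[1:]):
        filled.append((d0, v0))
        if (d1 - d0).days == 2:  # one day (e.g. a Sunday) has no report
            filled.append((d0 + timedelta(days=1), (v0 + v1) / 2))
    filled.append(samples[-1])
    return filled


# Illustrative numbers only, NOT the WHO data.
example = [
    (date(2003, 3, 21), 100),  # Friday
    (date(2003, 3, 22), 120),  # Saturday
    (date(2003, 3, 24), 160),  # Monday (Sunday is missing)
]

for day, cases in fill_missing_days(example):
    print(day.isoformat(), cases)
```

Gaps longer than one day are left alone in this sketch, since the text only describes filling Sundays.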

The logistic function, or S-Curve, was suggested by William S. Jevons, (as cited by Alfred Kleinknecht,) in 1884, but there were earlier works, too. R. Ayres extended the concept, and so did the Russian economist N. D. Kondratieff in 1926. Schumpeter made contributions, also. The logistic function is often used in the analysis of disease epidemics[1], but is difficult to use because of numerical stability issues which are characteristic of all non-linear dynamical systems, (NLDS.) Very small errors in data can create very large errors in the analysis; in point of fact, the analytical errors diverge exponentially, (see the Ljapunov Exponent for particulars.)
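As an illustration of the kind of sensitivity meant here (and not part of the SARS analysis itself), the discrete time logistic map x(n+1) = r x(n) (1 - x(n)) can be iterated from two starting values that differ by one part in ten billion. The choice r = 4, the starting value, and the function names are assumptions made for this sketch.

```python
def logistic_map(x, r=4.0):
    """One step of the discrete time logistic (parabolic) map."""
    return r * x * (1.0 - x)


def divergence(x0, delta=1e-10, steps=60, r=4.0):
    """Track the gap between two trajectories started delta apart."""
    a, b = x0, x0 + delta
    gaps = []
    for _ in range(steps):
        gaps.append(abs(a - b))
        a, b = logistic_map(a, r), logistic_map(b, r)
    return gaps


gaps = divergence(0.2)
for step in (0, 10, 20, 30, 40):
    print(step, gaps[step])
```

For r = 4 the gap grows by roughly a factor of two per step on average until it is as large as the values themselves, which is the exponential divergence the Ljapunov exponent measures.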

The source code to the software used in the analysis is available from the Utilities page on the NdustriX site, and is distributed under License.

*tslsq* -l -p *sars*
8951.016153 / (1 + e^(-(-3.223258 + 0.096215t)))
*tslsq* -e -p *sars*
e^(5.887387 + 0.080080t)

Figure I is a plot of the cumulative number of reported probable cases of Severe Acute Respiratory Syndrome, from March 17, 2003, to April 22, 2003, and the logistic function least-squares-best-fit to the data. For comparison, the exponential least-squares-best-fit to the data is also included. Of interest is the close proximity of the logistic and exponential functions at the beginning of the analysis, (and, since the data and both fits are in close proximity, it means we really don't have enough data; it's not an issue of the number of data points, it's an issue of having data over enough time.)

The least-squares-best-fit curve fitting methodology is the most accurate possible, and shows the projection of infection rate slowing in about two months, with about 9,000 infections, world wide.
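The two fitted expressions printed by *tslsq* can be evaluated directly to see where the "about two months" and "about 9,000" figures come from. The sketch below assumes t is measured in days from the first sample; the text does not state the time origin, so that is an assumption, as are the function names.

```python
import math

# Coefficients copied from the tslsq output above.
CEILING, A, B = 8951.016153, -3.223258, 0.096215


def logistic_fit(t):
    """Logistic least-squares fit: CEILING / (1 + e^(-(A + B t)))."""
    return CEILING / (1.0 + math.exp(-(A + B * t)))


def exponential_fit(t):
    """Exponential least-squares fit: e^(5.887387 + 0.080080 t)."""
    return math.exp(5.887387 + 0.080080 * t)


def time_to_fraction(p):
    """Time at which the logistic fit reaches fraction p of its ceiling."""
    return (math.log(p / (1.0 - p)) - A) / B


# New cases per day peak at the inflection point, where A + B t = 0.
t_inflection = -A / B

print(f"ceiling:            {CEILING:.0f} cases")
print(f"inflection at t =   {t_inflection:.1f}")
print(f"95% of ceiling at t = {time_to_fraction(0.95):.1f}")
for t in (0, 18, 36):
    print(t, round(logistic_fit(t)), round(exponential_fit(t)))
```

Under that assumption the fitted curve passes its inflection a little over a month in and reaches 95% of its ceiling at roughly two months.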

But how accurate is the best possible accuracy?

It turns out that it's not an easy question to answer, but we can
get some *feel* for it.

*tsderivative* *sars* > *sars.derivative*
*tslsq* -p *sars.derivative*
108.558559 + -0.203346t
*tslsq* -l *sars* | *tsderivative* > *LSQ-Logistic.derivative*
*tslsq* -e *sars* | *tsderivative* > *LSQ-Exponential.derivative*

Figure II is a plot of the *derivative* of the cumulative
number of reported probable cases of Severe Acute Respiratory
Syndrome, from March 17, 2003, to April 22, 2003, the derivative of
the logistic function least-squares-best-fit to the data, and the
derivative of the exponential least-squares-best-fit to the data. The
*linear* least-squares-best-fit of the derivative of the data
is included for comparison. The figure plots the number of new cases
of SARS per day. Of interest is the divergence between the exponential
and logistic functions. The logistic fit appears to provide the best
fit, being relatively accurate on day 3, with about a 25% error in the
number of new cases reported on day 33 (the prediction is
high).
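For a daily cumulative series, the number of new cases per day is the difference between consecutive samples. The sketch below applies a first difference to the fitted logistic curve and compares it with the analytic slope; it is an assumption that *tsderivative* works this way, and t is again assumed to be in days from the first sample.

```python
import math

# Coefficients copied from the tslsq logistic fit.
CEILING, A, B = 8951.016153, -3.223258, 0.096215


def logistic_fit(t):
    return CEILING / (1.0 + math.exp(-(A + B * t)))


def logistic_slope(t):
    """Analytic derivative of the logistic fit: CEILING * B * s * (1 - s)."""
    s = 1.0 / (1.0 + math.exp(-(A + B * t)))
    return CEILING * B * s * (1.0 - s)


def first_difference(series):
    """New cases per day: each sample minus the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# 37 daily samples, March 17 through April 22 inclusive.
cumulative = [logistic_fit(t) for t in range(37)]
new_per_day = first_difference(cumulative)

for day in (3, 18, 33):
    # The difference between day and day + 1 approximates the slope midway.
    print(day, round(new_per_day[day], 1), round(logistic_slope(day + 0.5), 1))
```

Differencing amplifies any noise in the reported counts, which is one reason the derivative plot is a harsher test of the fit than the cumulative plot.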

If the mechanism of the spread of SARS has exponential
characteristics, then the deterministic mechanism would be a
*geometric* progression:

*tsfraction* *sars* > *sars.fraction*
*tslsq* -l *sars* | *tsfraction* > *LSQ-Logistic.fraction*
*tslsq* -e *sars* | *tsfraction* > *LSQ-Exponential.fraction*

Figure III is a plot of the *fractional increase* in the
cumulative number of reported probable cases of Severe Acute
Respiratory Syndrome, from March 17, 2003, to April 22, 2003, the
fractional increase of the logistic function least-squares-best-fit to
the data, and the fractional increase of the exponential
least-squares-best-fit to the data. Of interest is the difference
between the logistic fit and exponential fit; if, and only if, the
spread of SARS is a geometric progression, the plot would be a
straight line. The logistic fit is relatively accurate on days 5 and
33.
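The straight-line test can be checked on the two fitted curves. The sketch below takes the fractional increase to be (x(t) - x(t-1)) / x(t-1), which is an assumption about what *tsfraction* computes, with t in days from the first sample.

```python
import math


def logistic_fit(t):
    return 8951.016153 / (1.0 + math.exp(-(-3.223258 + 0.096215 * t)))


def exponential_fit(t):
    return math.exp(5.887387 + 0.080080 * t)


def fractional_increase(series):
    """Day-over-day growth as a fraction of the previous day's value."""
    return [(later - earlier) / earlier
            for earlier, later in zip(series, series[1:])]


days = range(37)
exp_fraction = fractional_increase([exponential_fit(t) for t in days])
log_fraction = fractional_increase([logistic_fit(t) for t in days])

# A geometric progression grows by the same fraction every day.
print("exponential:", round(exp_fraction[0], 5), round(exp_fraction[-1], 5))
# The logistic fraction shrinks as the curve approaches its ceiling.
print("logistic:   ", round(log_fraction[0], 5), round(log_fraction[-1], 5))
```

The exponential fit gives a constant fraction, e^0.080080 - 1, a little over 8% per day, while the logistic fraction falls steadily; the data's position between the two is what Figure III shows.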

The spread of SARS does seem to be a deterministic progression that is not exponential:

*tsdeterministic* *sars* > *sars.deterministic*
*tslsq* -l *sars* | *tsdeterministic* > *LSQ-Logistic.deterministic*
*tslsq* -e *sars* | *tsdeterministic* > *LSQ-Exponential.deterministic*

Figure IV is a plot of the *deterministic mechanism* of the
cumulative number of reported probable cases of Severe Acute
Respiratory Syndrome, from March 17, 2003, to April 22, 2003, the
deterministic mechanism of the logistic function least-squares-best-fit
to the data, and the deterministic mechanism of the exponential
least-squares-best-fit to the data. The figure plots how the spread of
SARS proceeds, one day to the next. The spread of SARS does seem to be
deterministic.

*tslsq* -l -p *sars*
8951.016153 / (1 + e^(-(-3.223258 + 0.096215t)))
*tslsq* -c 99000 -f 12 -l -p *sars*
95718.957560 / (1 + e^(-(-5.580983 + 0.081260t)))
*tslsq* -e -p *sars*
e^(5.887387 + 0.080080t)

Figure V is a plot of the cumulative number of reported probable
cases of Severe Acute Respiratory Syndrome, from March 17, 2003, to
April 22, 2003, the logistic function least-squares-best-fit to the
data, and the logistic function least-squares-best-fit to the data
with its *algorithmic convergence* modified. The algorithmic
convergence was altered to force the *tslsq* program's best fit
mechanism to accommodate slight data errors, producing a *maximal*
least-squares-best-fit to the data; it is the maximal logistic
function solution that can be supported by the data, (it's actually
the solution just before the convergence algorithm goes unstable, and
has to be found by iteration.) For comparison, the exponential
least-squares-best-fit to the data is, again, included. Of interest is
the discrepancy between the logistic least-squares-best-fit and the
logistic maximal least-squares-best-fit; a discrepancy of an order of
magnitude in the final cumulative number of SARS cases, world wide.
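The size of the discrepancy can be read straight off the two sets of coefficients printed by *tslsq*. The sketch below evaluates both logistic solutions side by side, again assuming t is in days from the first sample; the names are hypothetical.

```python
import math


def logistic(t, ceiling, a, b):
    """ceiling / (1 + e^(-(a + b t))), the form printed by tslsq."""
    return ceiling / (1.0 + math.exp(-(a + b * t)))


# (ceiling, a, b) copied from the two tslsq runs above.
BEST_FIT = (8951.016153, -3.223258, 0.096215)
MAXIMAL_FIT = (95718.957560, -5.580983, 0.081260)

ceiling_ratio = MAXIMAL_FIT[0] / BEST_FIT[0]
print(f"ratio of final cumulative cases: {ceiling_ratio:.1f}")

# Near the start of the data the curves are close; later they part company.
for t in (0, 18, 36, 100, 200):
    print(t, round(logistic(t, *BEST_FIT)), round(logistic(t, *MAXIMAL_FIT)))
```

The two solutions start within a few percent of each other, yet their ceilings differ by a factor of more than ten.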

What the analysis says is that we do not have data over a long enough time to make any reliable assessments about the spread of SARS, world wide. However, assuming the data is accurate, (and that's questionable at this time, too,) the best estimate is that the cumulative number of SARS cases, world wide, will begin to slow in about two months, with about 10,000 cumulative cases. However, this estimate may be low by an order of magnitude, (or even more, if the data used in the analysis is unreliable or inaccurate.)

[1] *Predictions*,
Theodore Modis, Simon & Schuster, New York, New York, 1992, ISBN
0-671-75917-5, pp. 97-105.

-- John Conover, john@email.johncon.com, http://www.johncon.com/

Copyright © 2003 John Conover, john@email.johncon.com. All Rights Reserved. Last modified: Mon Jul 29 12:29:27 PDT 2002 $Id: 030423232715.15295.html,v 1.0 2003/11/18 19:18:06 conover Exp $