Title:
Bioinformatic approach to disease diagnosis
United States Patent: 8,005,627
Issued: August 23, 2011
Inventors: Porwancher; Richard (Princeton, NJ)
Appl. No.: 11/852,283
Filed: September 8, 2007



Abstract
A multivariate diagnostic method based on
optimizing diagnostic likelihood ratios through the effective use of
multiple diagnostic tests is disclosed. The Neyman-Pearson Lemma provides
a mathematical basis to produce optimal diagnostic results. The method can
comprise identifying those tests optimal for inclusion in a diagnostic
panel, weighting the result of each component test based on a multivariate
algorithm described below, adjusting the algorithm's performance to
satisfy predetermined specificity criteria, generating a likelihood ratio
for a given patient's test results through said algorithm, providing a
clinical algorithm that estimates the pretest probability of disease based
on individual clinical signs and symptoms, combining the likelihood ratio
and pretest probability of disease through Bayes' Theorem to generate a
posttest probability of disease, interpreting that result as either
positive or negative for disease based on a cutoff value, and treating a
patient for disease if the posttest probability exceeds the cutoff value.
Description of the Invention
BACKGROUND OF THE INVENTION
The present invention relates to methods for constructing multivariate
predictive models to diagnose diseases for which current test methods are
considered inadequate in either sensitivity or specificity. In particular,
the present invention relates to predictive models for diagnosing diseases
with a combination of laboratory tests, generating specificities of at
least 80%.
More particularly, the present invention relates to the construction of a
multivariate predictive model for diagnosing Lyme disease (LD) by choosing
the best tests from among those currently available, utilizing the raw
data produced by these tests instead of the manufacturers' binary test
results, combining the test values into a single score through a special
statistical function, weighting the importance of each component of the
function when producing the score, generating a likelihood ratio from each
patient's score, determining the pretest probability of disease through a
special algorithm utilizing individual clinical signs and symptoms,
combining the likelihood ratio with the pretest probability of disease
through Bayes' Theorem to produce a posttest probability of disease, and
determining a posttest probability cutoff point through a prospective
validation study of the multivariate predictive model, against which
individual patients' test results can be interpreted as indicative of Lyme
disease or not. The present invention also relates to component laboratory
tests identified by the predictive model as critical for diagnosis in the
form of test kits with the test panel components incorporated into a
microtiter plate to be analyzed by a commercial laboratory.
Since the discovery that the spirochete Borrelia burgdorferi was the cause
of LD over 25 years ago, numerous tests have been developed to detect this
organism. Direct cultures of tissue or body fluids are possible, but
suffer from low sensitivity. Direct detection methods involve assays for a
component of B. burgdorferi or the DNA itself. Most PCR tests for B.
burgdorferi DNA in body fluids, such as plasma, serum, whole blood, urine,
and spinal fluid, are insensitive. Although invasive, PCR testing of
specimens obtained by arthrocentesis or skin biopsy often detects DNA in
acute cases, aiding diagnosis. Performing skin biopsies is unnecessary
under most circumstances because a well-trained physician can usually
diagnose the characteristic rash, erythema migrans, by visual inspection
alone.
Patients presenting with neurological symptoms or chronic arthritic
symptoms will usually not benefit from PCR tests for B. burgdorferi DNA.
In these cases, serological tests for antibody to B. burgdorferi are
commonly used. Numerous methods have been employed, including whole-cell
EIA, capture-EIA, peptide-antigen EIA, recombinant protein EIA,
immunofluorescent antibody, immunodot, and immunoblots to detect IgG, IgM,
and IgA antibodies. All serological methods may lead to false-positive
results; however, the most common test for B. burgdorferi antibody, the
whole-cell EIA, is particularly susceptible to false-positive results.
Therefore the CDC has advised a two-step process to confirm antibody:
first test serum by whole-cell EIA or an equivalent method, then use a
highly specific immunoblot to confirm those results that are positive or
indeterminate in the first step.
Most antibody methods are insensitive early in the disease (<4 weeks), but
become more sensitive after the first few weeks have passed. This lack of
sensitivity for early disease and a high rate of false-positive serology
have undermined public confidence in the two-step process. The CDC and NIH
have conducted active research programs for better diagnostic tests. The
most promising of these new tests have been the recombinant and
peptide-antigen EIAs; these tests exhibit sensitivity and specificity
similar to the prior two-step process, but embodied in a single test.
The concept of a single test is the most appealing, and some experts have
advocated using C6 IgG as an alternative to the two-step method. The lack
of sensitivity in early disease persists (at least 40% false-negative
rate) with this new generation of tests (including C6 IgG), leading to
recommendations for alternative interpretive algorithms by some physicians
and Lyme advocacy groups. Western immunoblots using alternative
interpretive algorithms (Donta, Clin. Infect. Dis., 25 (Suppl. 1), S52-56
(1997)) have demonstrated better sensitivity, but much worse specificity
(up to 40% false-positives). This tradeoff between sensitivity and
specificity is a well-recognized limitation in diagnostic testing.
The use of multiple tests in combination is not new. The two-step
algorithm is borrowed from the literature on syphilis and HIV testing: a
sensitive but nonspecific screening test is confirmed by a more specific
test. Implicit in this paradigm is the knowledge that the second,
confirmatory test is at least as sensitive as the screening test. This
analogy breaks down for LD (Trevejo et al., J. Infect. Dis., 179(4), 931-8
(1999)). The Western blot, though specific, is not as sensitive for early
disease as the EIA test. The improved specificity of the two-step method
is offset by limited sensitivity.
Tests are used in combination to gain either sensitivity or specificity;
interpretive rules are usually generated through Boolean operators. If the
"OR" operator is used, then a combination test is positive if either
component is positive. If each component detects a different antigenic
epitope of B. burgdorferi, then a test fashioned using the "OR" operator
will likely be more sensitive than any individual component. However, each
new component also has its own intrinsic rate of false-positive reactions.
Overall false-positive rates increase approximately linearly when using "OR"
operator combinations (Porwancher, J. Clin. Microbiol. 41(6), 2791
(2003)). If the "AND" operator is used, then a test is positive only when
both components are positive; this operator is used to improve the
specificity of a given combination of tests, often at the expense of
sensitivity.
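A minimal sketch of how the two Boolean rules change sensitivity and specificity. It assumes, as a simplification not made in the text, that the component tests are conditionally independent given disease status; the function name `combine` and the 60%/95% component values are hypothetical illustrations, not data from the cited studies.

```python
from math import prod

def combine(tests, rule):
    """Sensitivity and specificity of a Boolean combination of tests.

    tests: list of (sensitivity, specificity) pairs.
    rule: "OR" (positive if any component is positive) or
          "AND" (positive only if every component is positive).
    Assumes conditional independence of components given disease status.
    """
    sens = [se for se, _ in tests]
    spec = [sp for _, sp in tests]
    if rule == "OR":
        # Missed only if every component misses; a control is negative
        # only if every component is negative.
        return 1 - prod(1 - se for se in sens), prod(spec)
    if rule == "AND":
        # Detected only if every component detects; a control is falsely
        # positive only if every component is falsely positive.
        return prod(sens), 1 - prod(1 - sp for sp in spec)
    raise ValueError("rule must be 'OR' or 'AND'")

# Hypothetical panel: three components, each 60% sensitive and 95% specific.
panel = [(0.60, 0.95)] * 3
for rule in ("OR", "AND"):
    se, sp = combine(panel, rule)
    print(f"{rule}: sensitivity={se:.4f}, false-positive rate={1 - sp:.4f}")
```

Under these assumptions the OR rule raises sensitivity to 0.936, but its false-positive rate (1 - 0.95^3, about 0.143) is close to the sum of the three individual rates (0.15); the AND rule nearly eliminates false positives while sensitivity falls to 0.216.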
When using the "AND" operator, a counterintuitive event may occur:
additional antigens can be used to improve specificity without loss of
sensitivity. This effect has been demonstrated for ElpB1 and OspE: when
FlaB and OspC were added to the mix, requiring multiple antibody responses
actually improved specificity from 89% to 98% while maintaining
sensitivity (Porwancher, J. Clin. Microbiol., (2003)). Sensitivity was
maintained because there were 15 new ways for antibody combinations to
form when two new antigens were added; patients with disease tend to have
multiple positive antibody combinations. Specificity improved because
false-positive combinations are rare, even though there are more ways for
these to form.
Bacon et al., J. Infect. Dis., 187, 1187-1199 (2003), evaluated using two
peptide or recombinant antigens together in binary form and assigned equal
importance to antibodies generated by either antigen. The authors used the
Boolean "OR" operator, evaluated several different antibody combinations,
and settled on two pairs of antibodies for diagnosis: either C6 IgG and
pepC10 IgM, or VlsE1 IgG and pepC10 IgM. While the two-tier method using a
VIDAS whole-cell EIA was included, no other recombinant antigens were
evaluated. By limiting the choice of antigens and not weighting the ones
that are included, this method compromises test performance.
Western blots are basically multiple binary test observations: a band is
formed when antibody and antigen mix together in a clear electrophoretic
gel, creating a visible line. Antibody is either observed or not. Of the
10 key antibodies detected by IgG Western blot, we do not know which
antibody results contribute independent information to diagnosis. Nor is
the information weighted according to its level of importance; all
positive components are weighted the same. Failing to weight the
importance of individual bands might have led to requiring an excessive
number of bands to confirm disease, thus limiting sensitivity.
Honegr et al., Epidemiol. Mikrobiol. Immunol., 50(4), 147-156 (2001),
interpreted Western blots using logistic regression analysis. While
directed toward human diagnosis, the study tried to determine which
species of B. burgdorferi are optimal for use in European tests, as well
as to determine interpretive criteria. Band results reported in
binary fashion were used to create a quantitative rule; however, no
likelihood ratios were reported from this regression technique, no partial
ROC areas were maximized using the logistic method [as in McIntosh and
Pepe (2002)], there were no specificity goals for ROC areas, and there was
no attempt to utilize clinical information. While key Western blot bands
were identified and weighted, the failure to use clinical information,
set specificity goals, or maximize likelihood ratios (and therefore
partial ROC areas) raises a question about the validity of the rules that
were derived (according to the Neyman-Pearson Lemma).
Robertson et al., J. Clin. Microbiol., 38(6), 2097-2102 (2000), performed
a study whose purpose was similar to Honegr et al. However Robertson et
al. did not produce a quantitative rule as a consequence of utilizing
multiple Western blot bands. While significant bands were identified
through logistic regression, they utilized this information in a binary
fashion and generated interpretive rules using either two or three of the
bands so identified. There was no attempt to weight the importance of
individual bands. In the end, the purported rules developed by logistic
regression were no better than preexisting interpretive criteria. No
likelihood ratios were generated, no ROC curves, and no clinical
information was utilized. There was no attempt to use the Western blot
with other tests. Their failure to quantify their results severely limited
the usefulness of their rules.
Guerra et al., J. Clin. Microbiol., 38(7), 2628-2632 (2000), studied the
use of log-likelihood analysis of Western blot data in dogs. The emphasis
of her study was to develop a rule to diagnose Lyme disease in dogs that
had received the Lyme disease vaccine (known to interfere with diagnosis).
Guerra did produce a quantitative rule based on likelihood ratios, and she
combined this rule with epidemiological data to generate posttest
probabilities. None of the animals were sick. No ROC analysis was
performed, nor was there an attempt to determine the specificity or
sensitivity of the technique. While a predictive rule could be generated,
its performance was unclear because the epidemiological data were poorly
utilized.
As demonstrated above, the LD field is limited by the lack of a
theoretical basis for test strategy. There has been remarkably little work
done using multivariate analysis and Lyme disease. Multiple tests exist to
diagnose LD, but little is known about which tests are optimal or how to
use tests together to enhance diagnostic power. U.S. Pat. No. 6,665,652
described an algorithm that enabled diagnosis of LD using multiple
simultaneous immunoassays; this method required that the antibody response
to antigens selected for diagnostic use be highly associated with LD (i.e.,
few false-positive results) and conditionally independent among controls.
The disclosure of the above patent, particularly as it relates to LD
diagnosis, is incorporated herein by reference.
Diagnostic methods are usually compared based on misclassification costs
(utility loss), a value tied to the prevalence of LD in the general
population. While the dollar cost of diagnostic tests is one means to
compare outcomes, another and possibly more important goal is to estimate
the loss of productive life (regret) from a given outcome. The two factors
that generate regret are false-negative and false-positive serology.
The cost associated with false-negative results is the difference in
regret between those with false-negative and true-positive serology, for
which the increased personal, economic, and social costs of delaying
disease treatment are factors. The cost associated with false-positive
results is the difference in regret between those with false-positive and
true-negative serology, for which the personal, economic, and social costs
of administering powerful intravenous antibiotics to healthy patients
are all factors.
The foregoing issues also exist for many other infectious and
noninfectious diseases. There remains a need for a predictive model that
enables the selection of the fewest number of tests that contribute
significantly to disease diagnosis, thereby limiting the cost of testing
without sacrificing diagnostic sensitivity.
SUMMARY OF THE INVENTION
This need is met by the present invention. A multivariate diagnostic
method based on optimizing diagnostic likelihood ratios through the
effective use of multiple diagnostic tests is proposed. The Neyman-Pearson
Lemma (Neyman and Pearson, Philosophical Transactions of the Royal Society
of London, Series A, 231, 289-337 (1933)) provides a mathematical basis
for relying on such methods to produce optimal diagnostic results. When
individual diagnostic tests for a disease prove inadequate in terms of
either sensitivity or specificity, the present invention provides a method
for combining existing tests to enhance performance.
The method includes the steps of: identifying those tests optimal for
inclusion in a diagnostic panel, weighting the result of each component
test based on a multivariate algorithm described below, adjusting the
algorithm's performance to satisfy predetermined specificity criteria,
generating a likelihood ratio for a given patient's test results through
said algorithm, providing a clinical algorithm that estimates the pretest
probability of disease based on individual clinical signs and symptoms,
combining the likelihood ratio and pretest probability of disease through
Bayes' Theorem to generate a posttest probability of disease, interpreting
that result as either positive or negative for disease based on a cutoff
value, and treating a patient for disease if the posttest probability
exceeds the cutoff value.
Therefore, according to one aspect of the present invention, a method is
provided for constructing a multivariate predictive model for diagnosing a
disease for which a plurality of test methods are individually inadequate,
wherein the method includes the following steps:
(a) performing a panel of laboratory tests for diagnosing said disease on
a test population including a statistically significant sample of
individuals with at least one objective sign of disease and a
statistically significant control sample of healthy individuals and
persons with cross-reacting medical conditions;
(b) generating a score function from a linear combination of the test
panel results, wherein the linear combination is expressed as $\beta^{T}Y$,
wherein D is the disease; $Y_1, \ldots, Y_K$ is a set of K diagnostic tests
for D; $Y$ is a vector of diagnostic test results $\{Y_1, \ldots, Y_K\}$;
$D'$ = not D; $\beta$ is a vector of coefficients
$\{\beta_1, \ldots, \beta_K\}$ for $Y$; and $\beta^{T}$ is the transpose
of $\beta$;
(c) performing a receiver operating characteristic (ROC) regression or
alternative regression technique of the score function, wherein the test
panel is selected and $\beta$ coefficients are calculated simultaneously to
maximize the area under the curve (AUC) of the empiric ROC as approximated
by the expression given in the original patent (see Original Patent);
(d) calculating for each individual the
pretest odds of disease; generating a diagnostic likelihood ratio of
disease by determining the frequency of each individual's test score in
said diseased population relative to said control population; and
multiplying the pretest odds by the diagnostic likelihood ratio to
determine the posttest odds of disease for each individual;
(e) converting a set of posttest odds into posttest probabilities and
creating an ROC curve by altering the posttest probability cutoff value;
(f) comparing the ROC areas generated by one or more regression techniques
to determine an optimal methodology comprising the tests to be included in
an optimum test panel and the weight to be assigned each test score alone
or in combination;
(g) dichotomizing the optimal methodology by finding that point on the
final ROC graph tangent to a line with a slope of $(1-p)C/(pB)$, where p is
the population prevalence of disease, B is the regret associated with
failing to treat patients with disease and C is the regret associated with
treating a patient without disease, thereby generating a posttest
probability cutoff value; and
(h) displaying the optimum test panel for disease diagnosis, the weight
each individual test score is to be assigned alone or in combination, and
the cutoff value against which positive or negative diagnoses are to be
made.
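Step (g) can be illustrated with a short sketch. Expected regret at an ROC operating point is p(1 - TPR)B + (1 - p)FPR·C, so minimizing it is the same as maximizing TPR - m·FPR with m = (1 - p)C/(pB): the point where a line of slope m touches the ROC curve. Because the ROC slope at a cutoff equals the likelihood ratio there, the posttest odds at that point are [p/(1 - p)]·m = C/B, i.e. a posttest probability cutoff of C/(B + C). The ROC points, prevalence, and regret values below are hypothetical, and the function names are illustrative only.

```python
def expected_regret(fpr, tpr, p, B, C):
    """Expected regret per patient tested at one ROC operating point.

    p: prevalence of disease; B: regret of not treating a diseased patient;
    C: regret of treating a patient without disease.
    """
    return p * (1 - tpr) * B + (1 - p) * fpr * C

def tangent_point(roc, p, B, C):
    """ROC point touched by a line of slope m = (1 - p)C/(pB), as in step (g).

    Maximizing tpr - m * fpr is equivalent to minimizing expected regret.
    """
    m = (1 - p) * C / (p * B)
    return max(roc, key=lambda pt: pt[1] - m * pt[0]), m

def posttest_cutoff(B, C):
    """Posttest probability at which treating and not treating carry equal regret."""
    return C / (B + C)

# Hypothetical empirical ROC points (fpr, tpr), prevalence, and regrets.
roc = [(0.0, 0.0), (0.01, 0.40), (0.05, 0.70), (0.10, 0.80), (0.30, 0.92), (1.0, 1.0)]
p, B, C = 0.10, 10.0, 1.0
best, m = tangent_point(roc, p, B, C)
print(best, m, posttest_cutoff(B, C))
```

With these inputs the tangent slope is 0.9, the regret-minimizing point is (0.10, 0.80), and the posttest probability cutoff is 1/11 (about 0.09).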
When $t_0$ is the maximum false-positive rate desired by a physician
interpreting the tests and is a multiple of $1/n_H$ (where $n_H$ is the
number of individuals without D in the sample), the $\beta$ coefficients
and test panel are chosen simultaneously through partial ROC regression in
order to generate the largest area below the partial ROC curve for the
$(1-t_0)$ quantile of individuals without D, where $\beta^{T}Y_j > c$ and
$S_H(c) = t_0$ ($S_H(c)$ being the survival function of the scores of
patients without disease, evaluated at the score c). When several
predictive models are under consideration, their partial AUCs for the
$(1-t_0)$ quantile are compared with that produced by partial ROC
regression in order to determine the optimal technique (Dodd and Pepe,
2003).
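The partial-area criterion can be computed directly from the scores. A minimal sketch, assuming no tied scores and that t0 is a multiple of 1/n_H, so that exactly k = t0·n_H control scores lie above the cutoff c: each of those k controls contributes the fraction of diseased scores exceeding it. Partial ROC regression would then search over the β coefficients and test panel to maximize this quantity; that search is not shown, and the function name and scores are hypothetical.

```python
def partial_auc(diseased, controls, t0):
    """Empirical area under the ROC curve for false-positive rates in [0, t0].

    diseased, controls: scores beta^T Y for individuals with and without D.
    t0: maximum false-positive rate; must be a multiple of 1/len(controls).
    Assumes no tied scores.
    """
    n_d, n_h = len(diseased), len(controls)
    k = round(t0 * n_h)
    if abs(k - t0 * n_h) > 1e-9:
        raise ValueError("t0 must be a multiple of 1/n_H")
    # The k highest control scores are the false positives allowed at FPR <= t0.
    top_controls = sorted(controls, reverse=True)[:k]
    hits = sum(1 for h in top_controls for d in diseased if d > h)
    return hits / (n_d * n_h)

diseased = [3.0, 4.0, 5.0]
controls = [1.0, 2.0, 3.5, 6.0]
print(partial_auc(diseased, controls, 0.5))  # area for FPR in [0, 0.5]
print(partial_auc(diseased, controls, 1.0))  # t0 = 1 gives the full empirical AUC
```

Here the partial area up to a false-positive rate of 0.5 is 2/12, while the full empirical AUC is 8/12; the partial criterion ignores how the scores rank in the low-specificity region.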
Methods according to the present invention further include the steps of
testing individual patient serum samples using the optimum methodology;
reporting the diagnostic result to each patient's physician and treating
those patients whose posttest probability exceeds the cutoff value for
disease D. When the posttest probability falls below the cutoff value but
the illness is of less than 2 weeks' duration, the test should be repeated
in 14 days in order to look for seroconversion.
Pretest risk can be determined using an individual's clinical signs and
symptoms. In the event that there is insufficient data to determine the
pretest risk that a patient has Lyme disease, then the laboratory may
report the likelihood ratio for that patient's test results directly to
the physician, as well as the cutoff value to distinguish positive from
negative results. A diagnostic cutoff can be determined by observing the
likelihood ratio which results in 99% specificity in a control population
of patients.
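A minimal sketch of how a laboratory might estimate a diagnostic likelihood ratio from score frequencies, as in step (d), and locate the score cutoff that gives 99% specificity in a control population. The fixed-width binning, the function names, and the simulated normal scores are assumptions made for illustration; the patent text does not specify a binning scheme.

```python
import random

def cutoff_for_specificity(controls, specificity=0.99):
    """Score c such that at most (1 - specificity) of control scores exceed c.

    Scores at or below c are reported negative. Assumes no tied scores.
    """
    ranked = sorted(controls)
    allowed = int((1 - specificity) * len(ranked) + 1e-9)  # false positives permitted
    return ranked[len(ranked) - allowed - 1]

def binned_likelihood_ratio(score, diseased, controls, width=0.5):
    """Frequency of the score's bin among diseased divided by that among controls.

    Fixed-width bins are an illustrative assumption, not the patent's scheme.
    """
    b = score // width
    f_d = sum(1 for s in diseased if s // width == b) / len(diseased)
    f_h = sum(1 for s in controls if s // width == b) / len(controls)
    return f_d / f_h if f_h > 0 else float("inf")

# Simulated scores (assumption): controls ~ N(0, 1), diseased ~ N(2, 1).
random.seed(1)
controls = [random.gauss(0, 1) for _ in range(2000)]
diseased = [random.gauss(2, 1) for _ in range(500)]

c = cutoff_for_specificity(controls, 0.99)
specificity = sum(s <= c for s in controls) / len(controls)
lr_at_cutoff = binned_likelihood_ratio(c, diseased, controls)
print(f"cutoff={c:.2f} specificity={specificity:.3f} LR at cutoff={lr_at_cutoff:.1f}")
```

With 2,000 controls, exactly 20 control scores lie above the cutoff; the likelihood ratio observed in the cutoff's bin is the value a laboratory could report alongside the patient's own likelihood ratio.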
The present invention has also identified significant roles for pepC10 IgM,
VlsE1 IgG and C6 IgG antibodies in the diagnosis of LD, in combination
with one another or with different antibodies. The present invention
therefore also includes a test panel comprising a plurality of antibody
tests, as well as kits and methods for the detection of LD, including one
or more of these additional antibodies.
A computer-based method is also provided for diagnosing a disease for
which a plurality of test methods are individually inadequate, which
method includes the steps of combining weighted scores from a panel of
laboratory test results chosen through the multivariate techniques
described above, comparing the combined weighted results to a cutoff
value, and diagnosing and treating a patient for disease D based on
exceeding the cutoff level. The disease D can be Lyme disease.
Computer-based methods include methods evaluating results from a test
panel including at least one antigen test selected from VlsE1 IgG, C6 IgG,
and pepC10 IgM antigen tests.
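The scoring step can be sketched as a weighted sum of the panel's raw results compared against a cutoff. The weights, the cutoff, the `PanelModel` class, and the patient values below are placeholders invented for illustration; the actual weights and cutoff come from the regression and ROC analysis described above and are not stated in this text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PanelModel:
    """Hypothetical linear score beta^T Y over a three-antibody panel."""
    weights: dict  # test name -> beta coefficient (placeholder values)
    cutoff: float  # score above which the result is reported positive (placeholder)

    def score(self, results):
        # A missing test raises KeyError instead of silently counting as zero.
        return sum(beta * results[name] for name, beta in self.weights.items())

    def interpret(self, results):
        return "positive" if self.score(results) > self.cutoff else "negative"

model = PanelModel(
    weights={"VlsE1 IgG": 1.2, "C6 IgG": 0.8, "pepC10 IgM": 0.5},  # placeholders
    cutoff=2.0,  # placeholder
)
patient = {"VlsE1 IgG": 1.1, "C6 IgG": 0.9, "pepC10 IgM": 0.4}
print(model.score(patient), model.interpret(patient))
```

This hypothetical patient scores 1.32 + 0.72 + 0.20 = 2.24, which exceeds the placeholder cutoff of 2.0 and is reported positive.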
The inventive method reduces error because specificity requirements are
satisfied; this requirement is particularly important for LD because of
overdiagnosis and overtreatment of false-positive results. When the
disease is LD, the tests chosen by the proposed method may be employed by
the algorithm described in U.S. Pat. No. 6,665,652 after being
dichotomized. Alternatively, these tests can be directly utilized by new
methodologies for LD prediction.
Alternative multivariate methods, including but not limited to logistic
regression, log-likelihood regression, linear regression, and discriminant
analysis, can learn which features are optimal from ROC regression
methods. The learning process is particularly valuable for diseases where
high specificity is needed. These alternative methods cannot focus their
regression methodology on a portion of the ROC curve. By learning the
optimal test choices, they can rerun the regression analysis using these
specific variables, thus maximizing their predictive power.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
The LD field is limited by the lack of a theoretical basis for test
strategy. Signal detection theory provides a theoretical basis to create
rules to both include and weight the contribution of different tests. The
likelihood ratio for a given set of test results is the probability that
those results will be seen in patients with disease, divided by the
probability that the same set of results will be seen in patients
without disease. The Neyman-Pearson lemma (1933) implies that, for a given
specificity, a rule that calls results positive when their likelihood
ratio exceeds a threshold achieves the highest attainable sensitivity; such
a likelihood-ratio rule is therefore the optimal interpretive algorithm.
This mathematical statement leads us to search for methods that will
maximize the diagnostic likelihood ratio derived from a given set of tests.
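For tests whose results fall into a few discrete patterns, the likelihood ratio of each pattern can be estimated from counts, and the Neyman-Pearson property can be checked by brute force: calling patterns positive in decreasing order of likelihood ratio yields rules that no other rule with the same or lower false-positive rate beats in sensitivity. The pattern labels and counts below are hypothetical.

```python
from itertools import combinations

# Hypothetical counts of each two-test result pattern among diseased and controls.
diseased = {"++": 60, "+-": 25, "-+": 10, "--": 5}
controls = {"++": 2, "+-": 8, "-+": 20, "--": 170}

def rates(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

p_d, p_h = rates(diseased), rates(controls)
lr = {k: p_d[k] / p_h[k] for k in p_d}  # likelihood ratio of each pattern

def sens_fpr(positive):
    """Sensitivity and false-positive rate when the given patterns are called positive."""
    return sum(p_d[k] for k in positive), sum(p_h[k] for k in positive)

# Likelihood-ratio rules: call patterns positive in decreasing order of LR.
order = sorted(lr, key=lr.get, reverse=True)
lr_rules = [order[:i] for i in range(len(order) + 1)]

# Every possible rule, i.e. every subset of patterns called positive.
all_rules = [c for r in range(len(order) + 1) for c in combinations(order, r)]

for rule in lr_rules:
    se, fp = sens_fpr(rule)
    best = max(sens_fpr(r)[0] for r in all_rules if sens_fpr(r)[1] <= fp + 1e-12)
    print(rule, f"sensitivity={se:.2f} fpr={fp:.2f} best possible={best:.2f}")
```

In every row the likelihood-ratio rule attains the best sensitivity available at its false-positive rate.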
ROC regression methods are the optimal methods to maximize likelihood
ratios (Pepe, The Statistical Evaluation of Medical Tests for
Classification and Prediction, First Edition, Oxford, U.K., Oxford
University Press, 2003; Ma and Huang, Regularized ROC Estimation: With
Applications to Disease Classification Using Microarray, University of
Iowa, Department of Statistics and Actuarial Science, Technical Report No.
345, 2005). ROC curves are generated by varying the cutoff applied to the
score that a specific algorithm produces for a given set of tests; each
cutoff yields one sensitivity and specificity pair. An ROC curve quantifies
the tradeoff between sensitivity and specificity. It is not widely
appreciated that the slope (derivative) of the ROC curve at the point
corresponding to a given cutoff value equals the likelihood ratio at that
cutoff value.
Therefore ROC curves are, in essence, reflections of the likelihood ratio
associated with a given set of test results. ROC regression methods
attempt to maximize the ROC curve at each point (maximizing the likelihood
ratio for each test cutoff value). Therefore ROC techniques are able to
produce the optimal rules for any given set of test results.
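The link between the ROC slope and the likelihood ratio can be checked numerically. This sketch assumes a binormal model (control scores N(0, 1), diseased scores N(mu, 1)), an illustrative assumption rather than data from the patent: the finite-difference slope dTPR/dFPR at a cutoff c matches the density ratio f_D(c)/f_H(c), which for this model equals exp(mu·c - mu²/2).

```python
import math

def survival(x, mean):
    """P(score > x) for a unit-variance normal score with the given mean."""
    return 0.5 * math.erfc((x - mean) / math.sqrt(2))

def density(x, mean):
    """Unit-variance normal density."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)

def roc_slope(c, mu, h=1e-5):
    """Central finite-difference slope dTPR/dFPR of the ROC curve at cutoff c."""
    d_tpr = survival(c + h, mu) - survival(c - h, mu)
    d_fpr = survival(c + h, 0.0) - survival(c - h, 0.0)
    return d_tpr / d_fpr

mu = 1.5
for c in (0.0, 1.0, 2.0):
    likelihood_ratio = density(c, mu) / density(c, 0.0)
    print(c, roc_slope(c, mu), likelihood_ratio)
```

Higher cutoffs (the high-specificity end of the curve) correspond to steeper ROC slopes and therefore larger likelihood ratios.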
Regression techniques are approximations; for ROC regression, the
approximation is to the empiric ROC curve. The empiric AUC (area under the
curve) represents the optimal solution for a given set of tests. For large
studies using multiple test results and covariates, the solution to the
empiric ROC requires nearly impossible computational power. Therefore
approximation methods are needed (Ma and Huang, 2005; McIntosh and Pepe,
Biometrics, 58, 657-664 (2002)). One of the best methods is the sigmoid
function approximation to the empiric ROC curve (Ma and Huang, 2005).
Partial ROC regression maximizes the ROC curve within clinically
acceptable limits of specificity (usually 95% to 100%).
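A minimal sketch of the sigmoid approximation idea, assuming the form used in smoothed ROC estimation such as Ma and Huang's: each pairwise indicator I(β^T Y_i > β^T Y_j), for a diseased individual i and a control j, is replaced by a sigmoid 1/(1 + exp(-x/σ)) of the score difference, giving a differentiable objective that approaches the empiric AUC as σ shrinks. The exact expression in the original patent is not reproduced in this text, so the function names, σ values, and data are assumptions; a numerical optimizer over β, and the restriction to controls above the (1 - t0) quantile for the partial version, are not shown.

```python
import math

def score(beta, y):
    """Linear score beta^T Y for one individual's panel results y."""
    return sum(b * v for b, v in zip(beta, y))

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def empirical_auc(beta, diseased, controls):
    """Fraction of (diseased, control) pairs in which the diseased score is higher."""
    hits = sum(score(beta, d) > score(beta, h) for d in diseased for h in controls)
    return hits / (len(diseased) * len(controls))

def sigmoid_auc(beta, diseased, controls, sigma):
    """Smooth surrogate: each pairwise indicator becomes sigmoid(difference / sigma)."""
    total = sum(sigmoid((score(beta, d) - score(beta, h)) / sigma)
                for d in diseased for h in controls)
    return total / (len(diseased) * len(controls))

# Hypothetical two-test panel results and coefficients.
diseased = [(2.0, 1.0), (1.5, 2.5), (3.0, 0.5), (0.8, 0.9)]
controls = [(0.5, 0.2), (1.0, 1.2), (0.2, 0.9), (1.8, 0.1)]
beta = (1.0, 0.5)
print(empirical_auc(beta, diseased, controls))
for sigma in (1.0, 0.1, 0.01):
    print(sigma, sigmoid_auc(beta, diseased, controls, sigma))
```

The empirical AUC here is 14/16 = 0.875; the smooth surrogate is noticeably lower at σ = 1 and converges to 0.875 as σ decreases.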
While logistic regression can attempt to approximate the empiric ROC curve
over the entire ROC space, only partial ROC regression is able to maximize
a portion of the curve; the clinical impact of this nuance is that partial
ROC regression using a sigmoid function is better at choosing tests that
produce high levels of specificity, while maintaining sensitivity.
Penalized likelihood functions may also be employed using the LASSO
technique with an $L_1$ penalty to choose the best tests among highly
correlated methods (Kim and Kim, "Gradient LASSO for feature selection,"
Proceedings of the 21st International Conference on Machine Learning,
Banff, Canada (2004)). By optimizing the number of tests, the specific
tests chosen, and the rules used to combine those tests, it is possible to
maximize the likelihood ratio at each point of the partial ROC curve.
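The cited work uses a gradient LASSO algorithm; as a simpler stand-in, the sketch below fits an L1-penalized logistic regression by proximal gradient descent (ISTA), which shows how the L1 penalty drives the coefficients of uninformative tests to zero. It is not the Kim and Kim algorithm, and the simulated panel (two informative tests, three pure-noise tests), the penalty `lam`, and the iteration count are assumptions.

```python
import math
import random

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def l1_logistic(X, y, lam, iters=500):
    """Minimize mean logistic loss + lam * sum(|beta_j|) by proximal gradient (ISTA).

    No intercept is fitted; the simulated data below are generated without one.
    """
    n, k = len(X), len(X[0])
    # Step 1/L, where L = ||X||_F^2 / (4n) bounds the curvature of the logistic loss.
    step = 4 * n / sum(v * v for row in X for v in row)
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for row, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for j in range(k):
                grad[j] += (p - yi) * row[j] / n
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta

# Simulated panel (assumption): tests 0 and 1 are informative, tests 2-4 are noise.
random.seed(0)
true_beta = [2.0, -2.0, 0.0, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(300)]
y = []
for row in X:
    prob = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(true_beta, row))))
    y.append(1 if random.random() < prob else 0)

beta_hat = l1_logistic(X, y, lam=0.1)
print([round(b, 3) for b in beta_hat])
```

The informative coefficients keep their signs but are shrunk toward zero, while the noise coefficients are typically set exactly to zero; tests with zero coefficients would be dropped from the panel.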
Logistic regression using a log-likelihood method provides a good
approximation to the empiric ROC curve, though imperfect in areas
requiring high specificity (McIntosh and Pepe, 2002); good agreement has
been demonstrated between log-likelihood and ROC methods for the CDC
dataset (Bacon et al. 2003) used to confirm the inventive methodology.
Regardless of the value of logistic methods using small sample sizes,
picking the correct variables for evaluation of large samples is critical
for performance reasons and cost (Pepe and Thompson, Biostatist., 1(2),
123-140 (2000)).
Partial ROC regression is theoretically superior to logistic regression
because of its inherent ability to maximize a portion of the ROC curve.
Because logistic regression methods are computationally easier and because
of the need to compare multiple predictive models, logistic methods were
chosen for the remainder of our analyses. (McIntosh and Pepe 2002).
However, the above theoretical reasons predict that for some data sets,
ROC regression will produce superior results, either by picking better
tests or by using more efficient rules to maximize the critical portion of
the ROC curve.
It is not sufficient to choose other regression methods that might produce
results superior to current twostep techniques. Rather the ability to
choose the best antigens is key, both from a therapeutic and cost
perspective. The present invention helps other regression methods learn
the correct antigens to use to achieve specificity and sensitivity goals,
allowing them to recalibrate more accurately. Both because of
theoretically superior overall performance and the ability to improve
other techniques, partial ROC regression using a sigmoid approximation and
penalized likelihood functions is an optimal means to both choose tests
and produce optimal rules to combine tests. Techniques like logistic
regression can utilize those features (variables) selected by partial ROC
methods to optimize their selection of beta coefficients, thereby enhancing
their predictive power.
Rules based on likelihood ratios produce outputs that can be easily
combined with pretest probability results through Bayes' Theorem. By
multiplying the pretest odds by the likelihood ratio, one generates the
posttest odds, specific to that patient and his or her test results. The
present invention uses an algorithm to determine the pretest probability
of disease based on the signs and symptoms of disease. The method
described in U.S. Pat. No. 6,665,652 and a new literature review helped
formulate the estimates in Table 1 (see Original Patent). For example, the
pretest probabilities listed in Table 1 can be used to optimize prediction
of LD. Similar pretest probabilities and algorithms can be generated for
other diseases without risky experimentation.
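The Bayes' Theorem step reduces to a few lines of arithmetic. In this sketch the pretest probability and likelihood ratio are hypothetical inputs (the Table 1 values are not reproduced here), and the posttest odds are converted back to a probability so they can be compared with a posttest probability cutoff.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Bayes' Theorem in odds form: posttest odds = pretest odds x likelihood ratio."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)

# Hypothetical patient: 20% pretest probability, panel likelihood ratio of 4.
print(posttest_probability(0.20, 4.0))
```

Pretest odds of 0.25 multiplied by a likelihood ratio of 4 give posttest odds of 1, i.e. a posttest probability of 0.5; a likelihood ratio below 1 lowers the probability instead.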
Although it is possible to use a likelihood ratio alone to categorize
patients as having disease or not, combining clinical and laboratory
results has demonstrated even more impressive performance relative to the
CDC's two-tier method. All tests seem to benefit from including information
about the pretest risk of infection, but ROC and logistic regression seem
to produce the best overall results when combined with pretest risk
assessment.
The multivariate method of the present invention is used to select the
optimum test panel for disease diagnosis, weight those results to maximize
sensitivity and specificity, and ultimately choose a cutoff value for the
posttest probability of disease that minimizes the regret associated with
false-positive and false-negative test results. Component laboratory tests
identified by the predictive model as critical for diagnosis can be
manufactured in the form of test kits with the test panel components
incorporated into a microtiter plate to be analyzed by a commercial
laboratory.
The laboratory will utilize reading equipment and software provided by the
present invention to collect and interpret test data, generating a
likelihood ratio for each patient. According to one embodiment of the
present invention, the commercial laboratory will electronically transfer
each patient's likelihood ratio to their physician's office, to be
received by software provided by the present invention for a computer or
personal digital assistant. The physician will then evaluate each
patient's individual signs and symptoms through a clinical algorithm on
the office software to determine the pretest probability of disease.
Should there be insufficient information to generate such a score, then
the physician may choose to accept the laboratory-derived likelihood ratio
for that patient and cutoff value as the final report.
The physician's software will combine the patient's likelihood ratio with
the pretest probability of Lyme disease as determined by the physician,
generating a posttest probability of Lyme disease. The physician's
software will generate a report, including the above results and an
interpretation of posttest probability of disease as it relates to the
cutoff level we provide. Test results exceeding the cutoff level will help
determine whether the patient requires additional tests or treatment for
Lyme disease.
The test kit containing the component tests and interpretive clinical and
laboratory software, plus the test kit reader, will be marketed as a
single test to be FDA approved.
The present invention thus also provides diagnostic software containing
code embodying a computer-based method for scoring results from the
optimum test panels according to the weights assigned each test or
combination thereof and comparing the results against the assigned cutoff
value to render a positive or negative diagnosis. Optimum test panel kits
are also provided, including kits in which the diagnostic software is
included. Methods for diagnosing disease with the test panels and software
are also provided.
The multivariate method of the present invention is performed as a
computer-based method. The input, processor and output hardware and
software other than that expressly described herein are essentially
conventional to one of ordinary skill in the art and require no further
description. The input, processor and output hardware employed by
computer-based methods for diagnosing disease constructed from information
derived by the multivariate method of the present invention are also
essentially conventional to one of ordinary skill in the art and require
no further description.
The foregoing principles are illustrated in the following example in the
context of LD; however, it should be understood that the inventive method
can also be applied to other diseases for which multiple diagnostic tests
exist, such as connective tissue diseases, Rocky Mountain Spotted Fever,
Babesia microti, and Anaplasma granulocytophilia.
Diagnostic testing panels can be developed for each of the foregoing
against a test population according to the methods described herein,
incorporating pretest clinical information to select the optimum test
panel for disease diagnosis, the weight to assign each test or combination
thereof, and cutoff values that minimize regret associated with
false-positive and false-negative results. For example, the inventive
method can be applied to a diagnostic test panel for the diagnosis of
lupus erythematosus, and the ARA diagnostic criteria for lupus
erythematosus can be used to determine the pretest probability of disease.
Claim 1 of 15 Claims
1. A method for constructing a
multivariate predictive model for diagnosing a disease for which a
plurality of test methods are individually inadequate, said method
comprising: (a) performing a panel of laboratory tests for diagnosing said
disease on a test population comprising a statistically significant sample
of individuals with at least one objective sign of disease and a
statistically significant control sample of healthy individuals or persons
with cross-reacting medical conditions; (b) generating, by a computer, a
score function from a linear combination of said test panel results, said
linear combination expressed as $\beta^{T}Y$, wherein D is the disease;
$Y_1, \ldots, Y_K$ is a set of K diagnostic tests for D; $Y$ is a
vector of diagnostic test results $\{Y_1, \ldots, Y_K\}$; $D'$ = not D;
$\beta$ is a vector of coefficients $\{\beta_1, \ldots, \beta_K\}$ for $Y$;
and $\beta^{T}$ is the transpose of $\beta$; (c) performing, by the computer, a receiver
operating characteristic (ROC) regression or alternative regression
technique of the score function, wherein the test panel is selected and
$\beta$ coefficients are calculated simultaneously to maximize the area
under the curve (AUC) of the empiric ROC as approximated by the expression
given in the original patent (see Original Patent); (d) calculating, by the computer, for each individual the
pretest odds of disease; generating a diagnostic likelihood ratio of
disease by determining the frequency of each individual's test score in
said diseased population relative to said control population; and
multiplying said pretest odds by said likelihood ratio to determine the
posttest odds of disease for each individual; (e) converting, by the
computer, a set of posttest odds into posttest probabilities for each
methodology and creating an ROC curve for each methodology by altering its
respective posttest probability cutoff value; (f) comparing, by the
computer, the ROC areas generated by one or more regression techniques to
determine an optimal methodology, comprising the tests to be included in
an optimum test panel and the weight to be assigned each test score alone
or in combination; (g) dichotomizing, by the computer, the optimal
methodology by finding that point on the final ROC graph tangent to a line
with a slope of $(1-p)C/(pB)$, where p is the population prevalence of
disease, B is the regret associated with failing to treat patients with
disease and C is the regret associated with treating a patient without
disease; thereby generating a posttest probability cutoff value; and (h)
displaying, by the computer, the optimum test panel for disease diagnosis,
the weight each individual test score is to be assigned alone or in
combination, and the cutoff value against which positive or negative
diagnoses are to be made, wherein said disease is Lyme Disease.
