(A good backgrounder on theories of causation, relative risk, statistical
significance, and on what epidemiology can and can't tell you.)
January 1994
By Nancy A. Dreyer
THE PROCESS of determining causation and, in fact, the ultimate need to
determine causation is different in law and science. In law, the goal of
fairness seems to be paramount. Decisions are required whether or not the
true causes are known or understood. In contrast, scientists have been described
as "practitioners of a discipline that seeks, but never finds, absolute truth"
and as people who use a "variety of criteria to evaluate data in conditions that
provide less than total certainty."(1)
If lawyers and courts knew how epidemiologists look at causation and were
aware of some of the methods used to provide scientific inferences, perhaps they
would recognize the case for accepting the tentativeness of science and the
scientific process. At a minimum, this knowledge would enhance their ability
to make fair and equitable decisions.
WHAT IS EPIDEMIOLOGY?
Epidemiology is the science of the occurrence of human illness.(2) First and
foremost, epidemiologists' work concerns human beings, not animals, which
immediately relieves one of all the problems of extrapolating among species.
Epidemiology is not the science of determining the cause of disease in an
individual. No method in any science can determine the specific cause of a
specific individual's disease.
Epidemiologists study groups and make causal inferences based on aggregate
data. Their work is generally non-experimental. Much epidemiologic information
comes from studies that utilize some facets of existing information to draw
inferences about cause and effect. For example, to learn about the effects of
residential exposure to commercial nuclear power, epidemiologists might compare
the health of people who reside near a nuclear power plant with the health of
those who live far from the plant. To learn about drug safety, they might
compare side effects among users of a particular drug and users of other drugs
approved for the same indication.
RELATIVE RISK
A commonly used epidemiologic measure of effect is the "relative risk,"
which is the ratio of the risks for exposed and unexposed people. If the
relative risk is equal to one, indicating the same risk for exposed and
unexposed people, then the exposure has no effect. A relative risk of two
would indicate a doubling in risk for exposed individuals relative to unexposed
individuals.
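For concreteness, here is a minimal sketch of the calculation in Python; the
cohort counts are invented for illustration and are not drawn from any study.

```python
# A minimal sketch of the relative-risk calculation described above.
# The cohort counts are invented for illustration only.

def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of disease risk among the exposed to risk among the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 30 cases among 1,000 exposed, 15 among 1,000 unexposed.
rr = relative_risk(30, 1000, 15, 1000)
print(f"relative risk = {rr:.1f}")  # 2.0: a doubling of risk with exposure
```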
The emphasis in epidemiology is on the estimation of effects, and good
studies include an assessment of the likelihood of measurement error and the
potential for bias. Random error, often called "noise," reduces the precision of
an effect estimate, or "signal." Epidemiologists account for this by creating a
confidence interval around the point estimate of effect. Larger studies have
less random error. Systematic errors, also known as non-random errors or
biases, also distort the effect estimates.(3) Good investigators must be
self-critical and examine how non-random error could influence their results.
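As a sketch of how the confidence interval quantifies random error, the
following uses the standard log-transform approximation for a relative risk.
The counts are again invented; the example shows how a larger study narrows
the interval around the same point estimate.

```python
import math

def rr_confidence_interval(a, n1, c, n0, z=1.96):
    """Relative risk with an approximate 95% CI (standard log-transform method)."""
    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)   # SE of log(RR)
    lo = math.exp(math.log(rr) - z * se_log_rr)
    hi = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lo, hi

# Same invented counts as before; quadrupling the study size narrows the
# interval around the same point estimate -- less random error, more precision.
for scale in (1, 4):
    rr, lo, hi = rr_confidence_interval(30 * scale, 1000 * scale,
                                        15 * scale, 1000 * scale)
    print(f"N={2000 * scale}: RR = {rr:.1f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Note that the interval speaks only to random error; no amount of narrowing
addresses bias.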
In epidemiologic terms, a cause is an "act or event or state of nature"--for
simplicity, it may be referred to as an exposure--"which initiates or permits,
alone or in conjunction with other causes, a sequence of events, resulting in an
effect."(4) In most instances, the exposure is neither necessary nor sufficient
by itself to cause the disease. Rather, most causes are components of at least
one sufficient cause.
Causes that inevitably produce the effect are sufficient. The disease may
require some time to become evident, and during this induction period, the
disease might actually be cured or the person might die. Nonetheless, the
exposure will have been sufficient to cause disease. Usually diseases do not
have a single cause; rather, they have a constellation of component causes that,
taken together, become sufficient to cause disease. For example, exposure to the
measles virus will not always cause measles, because one must also have a
certain susceptibility for the disease to develop. Thus, most causes of interest are
components of sufficient causes but not sufficient in themselves.
Professor Kenneth J. Rothman illustrated this with pie charts.(5) Figure 1
is a conceptual scheme for the causes of a hypothetical disease.
[Figure 1 omitted: a conceptual scheme of sufficient causes, each composed of
several component causes, for a hypothetical disease]
It shows several patterns of sufficient cause to produce disease. If we knew
that Cause I accounted for 50 percent of the disease, Cause II 30 percent and
Cause III 20 percent, we would know that the etiologic fraction for component A,
for example, would be 100 percent because it appears in all three models. The
etiologic fraction for component B would be 80 percent because it appears in
models I and II, which respectively cause 50 percent and 30 percent of disease.
Etiologic fractions greater than 50 percent indicate that it is more likely than
not that the exposure caused disease, and relative risks of more than two
indicate etiologic fractions greater than 50 percent.
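The arithmetic behind these fractions, and the standard formula linking
relative risk to the etiologic fraction among the exposed, (RR - 1)/RR (a
formula supplied here by way of background, not spelled out in the text), can
be worked through in a few lines:

```python
# The pie-chart arithmetic, worked through in Python. The shares of disease
# attributed to each sufficient cause (I: 50%, II: 30%, III: 20%) and the
# component memberships follow the hypothetical scheme of Figure 1.

sufficient_cause_share = {"I": 0.50, "II": 0.30, "III": 0.20}
component_appears_in = {"A": ("I", "II", "III"), "B": ("I", "II")}

for component, causes in component_appears_in.items():
    fraction = sum(sufficient_cause_share[c] for c in causes)
    print(f"component {component}: etiologic fraction = {fraction:.0%}")
# component A: 100%, component B: 80%

# The link to relative risk: the fraction of exposed cases attributable to
# the exposure is (RR - 1) / RR, so a relative risk above two implies an
# etiologic fraction above 50 percent.
for rr in (1.5, 2.0, 3.0):
    print(f"RR = {rr}: etiologic fraction among the exposed = {(rr - 1) / rr:.0%}")
```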
These pie charts illustrate that disease prevention can be accomplished by
intervening at any of several points, because disease development depends on
the presence of all the component causes in the constellation.
Take benzene and leukemia as an example. Benzene is accepted as a cause of
leukemia. In Rothman's framework, it would be considered a component of a
sufficient cause. Although leukemia develops in the absence of exposure to
benzene and not all people exposed to benzene develop leukemia, neither of these
facts provides any evidence against a causal role for benzene. Preventing
benzene exposure would prevent those leukemia cases that would have developed in
the presence of benzene exposure.
EPIDEMIOLOGIC METHODS
Within this theoretical framework for defining causes in epidemiologic
terms, what methodology do scientists use to make inferences about causes? Some
of the popular methods currently in use are woefully inadequate, like consensus
panels, appeals to authority, and a variety of tortuous exercises in logic.
A. Spurious Methods
Consider consensus panels, a technique regularly used by the National
Institutes of Health, among others. Consensus panels consist of experts who are
convened to review a body of literature and come to a conclusion, as a group, as
to what causes disease and what does not. It should be obvious that the process
involved in coming to consensus is far from fool-proof, just as having a jury
decide a question of fact does not guarantee a fair and equitable verdict.
The fact that a group agrees does not mean there is necessarily a justification
for the agreement. This is reminiscent of One Hundred Authors Against Einstein, a
collection of essays published during the 1930s by German physicists, which was
intended to discredit the theory of relativity. Einstein is reported to have
responded, "Were my theory wrong, it would have taken but one person to show
it."
Other fallacious methods for determining causality include appeal to
authority and its companion, the ad hominem criticism. With the appeal to
authority, the credentials of the authority are used as the grounds to accept
the evidence, rather than an assessment of the evidence itself. The opposite of
the appeal to authority is the ad hominem criticism, in which the criticism is
based on a person's perceived lack of credentials.
Another fallacy is the tortuous argument post hoc, ergo propter hoc, Latin
for "after this, therefore on account of it." The common example of
this fallacy is the inference that the rooster's crowing causes the sun to rise
each morning. The repetition of the putative cause and effect cycle supposedly
lends credence to the theory and becomes proof of its validity.(6)
The inductive method of determining causality is equally fallacious. This
method follows the reasoning that "If p, then q; now q is true; therefore p is
true." Bertrand Russell illustrated the post hoc fallacy of inductive reasoning:
"If pigs have wings, then some winged animals are good to eat; now some winged
animals are good to eat; therefore pigs have wings."(7) He used this as an
example of how the so-called scientific method can lead to erroneous
conclusions.
Finally, no listing of popular but fallacious methods of causal inference
would be complete without including statistical significance tests. The
motivation for statistical hypothesis testing comes from agriculture, where
statistical significance was a basis for decision making and quality control.
Experiments were designed to answer questions that called for specific actions,
so the results had to be classified into discrete categories.
Litigation may have some similarities to agriculture, since the courts need
to come to resolutions of disputes and do not have the "luxury of awaiting
further scientific studies to approach the truth."(8) But "significance testing
places too high a value on a yes-or-no answer to an oversimplified
question.... The result of using significance testing as the criterion for
decision making is that the focus is changed from the information presented by
the observations themselves to conjecture about the role chance could have
played in bringing about those observations."(9) Significance tests do not
address the role of bias, which is usually the greatest concern in
interpretation.
A p-value of 0.05 or less has come to be accepted as the primary criterion for
meaningfulness; ironically, a "non-significant" result is also supposed to be
meaningful, in that it is taken to imply no effect. While at first blush this
may seem to be scientists quibbling about minutiae, over-reliance on significance
testing can actually be harmful, not just misleading. As an editorial in the
Annals of Internal Medicine noted:
With the focus on statistical significance, if chance seems to be a
plausible explanation, then other theories are too readily discarded,
regardless of how tenable they may be. As a result, effective new
treatments have often been overlooked because their effects were judged
to be "not significant," despite an indication of efficacy in the data.
Conversely, if "significance" seekers find that the results of a study
are calculated as improbable on the basis of chance, then chance is
often rejected as an explanation when alternative explanations are even
less tenable.(10)
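A small sketch in Python makes the editorial's point concrete. Both studies
below are invented for illustration; they estimate the same relative risk, yet
a significance seeker would keep one and discard the other purely on account
of study size.

```python
import math

def summarize(a, n1, c, n0):
    """Relative risk, approximate 95% CI, and a two-sided p-value."""
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)               # SE of log(RR)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    p = math.erfc(abs(math.log(rr) / se) / math.sqrt(2))  # normal approximation
    return rr, lo, hi, p

# Two invented studies with the same estimated effect; only their sizes differ.
for a, n1, c, n0 in [(9, 300, 5, 300), (90, 3000, 50, 3000)]:
    rr, lo, hi, p = summarize(a, n1, c, n0)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"RR = {rr:.1f}, 95% CI ({lo:.2f}, {hi:.2f}), p = {p:.3f} -> {verdict}")
```

The estimates and intervals convey what the data show; the yes-or-no verdicts
merely report how large the studies were.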
The point is that there are no quick and easy tests for determining
causality in science. The inherent appeal of such checklists must be outweighed
by an appreciation of their dangers.
B. More Defensible Methods
Some of the less common but more defensible methods of inference include
subjective probability, the Empirical Bayes approach, and conjecture and
refutation. Each of these methods, however, while an advance in thinking, is
still short of the mark.
1. Subjective Probability and Empirical Bayes
Subjective probability is a method in which the likelihood of causation is
described in quantitative terms as a person's degree of conviction. For
example, a scientist may conclude that he is 90 percent certain that exposure to
a particular substance causes a disease. The probability he assigns to the
likelihood of causation is arbitrary and not necessarily calculated using any
established methodology. It is an estimate derived from belief. The main
criticism of this approach is that beliefs can be completely subjective.
A method that is akin to subjective probability, but which is based on data,
is called Empirical Bayes. In this approach, the degree of conviction is
replaced with a probability that is less subjective because it is based on
observation.(11)
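Greenland and Robins's adjustments are more elaborate, but the core idea can
be sketched in its simplest normal-normal form; every number below, including
the common standard error, is an assumption made for illustration.

```python
import statistics

# A minimal sketch of the empirical-Bayes idea in its simplest (normal-normal)
# form -- not the specific adjustments of Greenland and Robins. Each group's
# observed log relative risk is pulled toward the overall mean by a weight
# that the data themselves supply. All numbers are invented for illustration.

observed = [0.9, 0.1, 0.5, 1.3, -0.2]  # hypothetical log relative risks
se = 0.4                               # common standard error, for simplicity

grand_mean = statistics.mean(observed)
# Method-of-moments estimate of the true between-group variance tau^2:
tau2 = max(statistics.pvariance(observed) - se ** 2, 0.0)
weight = tau2 / (tau2 + se ** 2)       # weight given to each group's own data

for y in observed:
    shrunk = grand_mean + weight * (y - grand_mean)
    print(f"observed {y:+.2f} -> shrunk toward the mean: {shrunk:+.2f}")
```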
2. Conjecture and Refutation
Karl Popper and other scientists tell us that causation is established
through a process of conjecture and refutation, and that science advances only
by disproofs.(12) This notion of testing and refutation also has been espoused
by the legal profession. In evaluating whether "creationism" was indeed a
science that should be taught in the schools, Judge William R. Overton offered
five criteria to define the essential characteristics of science:(13)
1. Science is guided by natural law.
2. It has to be explanatory by reference to natural law.
3. It is testable against the empirical world.
4. Its conclusions are tentative--that is, are not necessarily the final
word.
5. It is falsifiable.
"Conjecture and refutation." This phrase of Popper's represents the
cornerstone of the scientific process. If you wring the rooster's neck and the
sun still rises the next day, it follows that the rooster's crowing did not
cause the sun to rise. Only by gathering observations that refute one or more of
the competing theories can causes be distinguished from non-causes. Causality
may be inferred only "when a causal theory tenaciously defies our determined
efforts to challenge it in empirical competition with non-causal theories."(14)
ALL METHODS ARE FALLIBLE
The important message is that all legitimate methods of scientific inference
are fallible. The decision-making mold, as in statistical significance testing
and courtroom verdicts, forces choices that proper scientific inference avoids.
Just because both courtrooms and significance testing aim toward decision
making, it doesn't follow that significance testing is a good method for the
courts.
Indeed, rational decision making can actually be premised on causal
relations that are believed to be false. It may be prudent to take actions that
promote safeguards against certain risks, even if their causality is in doubt,
if the actions are prudent in themselves and the safeguards are not overly burdensome.
For example, consider the current controversy over the health risks from
exposure to electromagnetic fields. Some scientists believe that exposure to
low-level electrical and magnetic fields from appliances such as electric
blankets and electric razors is carcinogenic. The body of evidence is hardly
persuasive at this point, but it could be considered prudent to use warm
blankets and old-fashioned razor blades just in case there might be some truth
to the hypothesis.
The law should not look to science for hard and fast rules or checklists of
items that will enable lawyers and judges to identify causation with confidence.
These quick and easy schemes don't work. There is no recipe to determine
causation in science other than the careful process of pursuing hypotheses by
evaluating competing theories and continuing to refine the process until no
explanation other than causality remains tenable.
(1.)Brief Amici Curiae of Professors Kenneth Rothman, Noel Weiss, James
Robins and Raymond Neutra, 61 U.S.L.W. 3284 (1992) [hereinafter Amici Brief], in
Daubert v. Merrell Dow Pharmaceuticals Inc., 113 S.Ct. 2786 (1993).
(2.)KENNETH J. ROTHMAN, MODERN EPIDEMIOLOGY (Boston: Little, Brown & Co.,
1986).
(3.)Bernard C.K. Choi & A. Lynn Noseworthy, Classification, Direction, and
Prevention of Bias in Epidemiologic Research, 34(3) JOM 265 (1992).
(4.)Kenneth J. Rothman, Causes, 104(6) AM. J. EPIDEM. 588 (1976).
(5.)Id. at 589.
(6.)Kenneth J. Rothman, Inferring Causal Connections--Habit, Faith or Logic?
in CAUSAL INFERENCE 7 (Rothman, ed., Epidemiology Resources Inc., 1988).
(7.)Bertrand Russell, Dewey's New "Logic," in THE PHILOSOPHY OF JOHN DEWEY
(P.A. Schilpp, ed., 1939), reprinted in THE BASIC WRITINGS OF BERTRAND RUSSELL
620-27 (R.E. Egner & L.E. Denonn, eds., New York: Simon and Schuster, 1961).
(8.)Tom Christoffel & Stephen P. Teret, Epidemiology and the Law: Courts and
Confidence Intervals, 81 AM. J. PUB. HEALTH 1661, 1665 (1991).
(9.)Amici Brief, supra note 1.
(10.)Kenneth J. Rothman, Significance Questing, 105 ANN. INTERN. MED. 445
(1986).
(11.)Sander Greenland & James M. Robins, Empirical-Bayes Adjustments for
Multiple Comparisons Are Sometimes Useful, 2 EPIDEMIOLOGY 244 (1991).
(12.)John R. Platt, Strong Inference, 146 SCIENCE 347 (1964).
(13.)William R. Overton, Creationism in Schools: The Decision in McLean
versus the Arkansas Board of Education, 215 SCIENCE 934 (1982).
(14.)Stephan F. Lanes, The Logic of Causal Inference, in CAUSAL INFERENCE,
supra note 6, at 72.