Epidemiology Article | FRONTLINE

(A good backgrounder on theories of causation, relative risk, and statistical significance, and on what epidemiology can and can't tell you.)
January 1994
By Nancy A. Dreyer

THE PROCESS of determining causation, and in fact the ultimate need to determine causation at all, differ between law and science. In law, the goal of fairness seems to be paramount: decisions are required whether or not the true causes are known or understood. In contrast, scientists have been described as "practitioners of a discipline that seeks, but never finds, absolute truth" and as people who use a "variety of criteria to evaluate data in conditions that provide less than total certainty."(1)
If lawyers and courts knew how epidemiologists look at causation and were aware of some of the methods used to draw scientific inferences, perhaps they would see the case for accepting the tentativeness of science and the scientific process. At a minimum, this knowledge would enhance their ability to make fair and equitable decisions.

WHAT IS EPIDEMIOLOGY?
Epidemiology is the science of the occurrence of human illness.(2) First and foremost, epidemiologists study human beings, not animals, which spares them all the problems of extrapolating between species. Epidemiology is not the science of determining the cause of disease in an individual. No method in any science can determine the specific cause of a specific individual's disease.
Epidemiologists study groups and make causal inferences based on aggregate data. Their work is generally non-experimental: much epidemiologic information comes from studies that use existing information to draw inferences about cause and effect. For example, to learn about the effects of residential exposure to commercial nuclear power, epidemiologists might compare the health of people who live near a nuclear power plant with the health of those who live far from it. To learn about drug safety, they might compare side effects among users of a particular drug and users of other drugs approved for the same indication.

RELATIVE RISK
A commonly used epidemiologic measure of effect is the "relative risk," the ratio of the risk of disease among exposed people to the risk among unexposed people. A relative risk of one, indicating the same risk for exposed and unexposed people, means the exposure has no effect. A relative risk of two indicates a doubling of risk for exposed individuals relative to unexposed individuals.
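As a minimal sketch of the arithmetic, assuming a simple cohort in which everyone is followed for the same period, each group's risk is its number of cases divided by its number of people, and the relative risk divides the exposed group's risk by the unexposed group's. The function name and the counts are hypothetical, chosen only for illustration.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk among exposed people divided by risk among unexposed people."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 20 cases among 1,000 exposed people, 10 among 1,000 unexposed.
print(relative_risk(20, 1000, 10, 1000))  # 2.0: risk is doubled among the exposed
# Equal risks in both groups give a relative risk of one: no effect.
print(relative_risk(10, 1000, 10, 1000))  # 1.0
```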
The emphasis in epidemiology is on the estimation of effects, and good studies include an assessment of the likelihood of measurement error and the potential for bias. Random error, often called "noise," reduces the precision of an effect estimate, or "signal." Epidemiologists account for it by placing a confidence interval around the point estimate of effect; larger studies have less random error and therefore narrower intervals. Systematic error, also known as non-random error or bias, also distorts effect estimates.(3) Unlike random error, it does not diminish as a study grows larger. Good investigators must be self-critical and examine how non-random error could influence their results.
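The confidence interval can be sketched with the common log-scale (Katz) approximation for a ratio of two proportions; that choice of method, the function name, and the counts are assumptions for illustration. Both hypothetical studies estimate the same doubling of risk, but the larger one has less random error and so a narrower interval.

```python
import math


def rr_with_ci(a, n1, c, n0, z=1.96):
    """Relative risk with an approximate 95% confidence interval (log method).

    a of n1 exposed people and c of n0 unexposed people develop the disease.
    The interval reflects random error only; it says nothing about bias.
    """
    rr = (a / n1) / (c / n0)
    # Standard error of ln(RR) for cumulative-incidence (risk) data.
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


# The same hypothetical doubling of risk, seen in a small and a ten-times-larger study.
small = rr_with_ci(20, 1000, 10, 1000)
large = rr_with_ci(200, 10000, 100, 10000)
for label, (rr, lower, upper) in [("small", small), ("large", large)]:
    print(f"{label}: RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# small: RR 2.00, 95% CI 0.94 to 4.25
# large: RR 2.00, 95% CI 1.58 to 2.54
```

Enlarging a study does nothing about systematic error: a biased study of any size simply yields a tight interval around the wrong value.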
In epidemiologic terms, a cause is an "act or event or state of nature"--for simplicity, it may be referred to as an exposure--"which initiates or permits, alone or in conjunction with other causes, a sequence of events, resulting in an effect."(4) In most instances, the exposure is neither necessary nor sufficient by itself to cause the disease. Rather, most causes are components of at least one sufficient cause.
Causes that inevitably produce the effect are sufficient. The disease may require some time to become evident, and during this induction period the disease might actually be cured or the person might die; nonetheless, the exposure will have been sufficient to cause disease. Usually diseases do not have a single cause; rather, they have a constellation of component causes that, taken together, become sufficient to cause disease. For example, exposure to the measles virus will not always cause measles, because a person must also be susceptible in order to actually develop the disease. Thus, most causes of interest are components of sufficient causes but are not sufficient in themselves. Professor Kenneth J. Rothman illustrated this with pie charts.(5) Figure 1 is a conceptual scheme for the causes of a hypothetical disease.

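[Figure 1 omitted: Rothman's pie-chart scheme of three sufficient causes (I, II, and III) of a hypothetical disease, each made up of several component causes.]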
Figure 1 shows several patterns of sufficient cause that produce disease. If we knew that sufficient cause I accounted for 50 percent of the disease, cause II for 30 percent and cause III for 20 percent, we would know that the etiologic fraction for component A, for example, would be 100 percent, because A appears in all three sufficient causes. The etiologic fraction for component B would be 80 percent, because B appears in causes I and II, which together account for 50 + 30 = 80 percent of the disease. Etiologic fractions greater than 50 percent indicate that it is more likely than not that the exposure caused disease, and relative risks of more than two indicate etiologic fractions greater than 50 percent.
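Both calculations in this paragraph can be sketched in a few lines. The first sums the shares of the sufficient causes that contain a component, using only the percentages and the memberships of A and B given above (the rest of Figure 1 is not reproduced). The second uses the standard formula for the fraction of exposed cases attributable to exposure, (RR - 1)/RR, which the text does not spell out but which is the usual basis for linking a relative risk of two to a 50 percent fraction; it assumes the relative risk is free of bias. The function names are hypothetical.

```python
# Share of all cases (in percent) produced by each sufficient cause, from the text's example.
CAUSE_SHARES = {"I": 50, "II": 30, "III": 20}

# Sufficient causes that contain each component; only A and B are described in the text.
MEMBERSHIP = {"A": {"I", "II", "III"}, "B": {"I", "II"}}


def etiologic_fraction(component):
    """Percent of all cases whose sufficient cause includes the component."""
    return sum(CAUSE_SHARES[cause] for cause in MEMBERSHIP[component])


def fraction_from_relative_risk(rr):
    """Percent of exposed cases attributable to exposure: 100 * (RR - 1) / RR."""
    return 100 * (rr - 1) / rr


print(etiologic_fraction("A"), etiologic_fraction("B"))  # 100 80
for rr in (1.5, 2.0, 3.0):
    # Prints 33.3, 50.0 and 66.7: only relative risks above two exceed 50 percent.
    print(rr, round(fraction_from_relative_risk(rr), 1))
```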
These pie charts illustrate that disease can be prevented by intervening at several points: because disease develops only when every component of a sufficient cause is present, removing any one component blocks that pathway. Take benzene and leukemia as an example. Benzene is accepted as a cause of leukemia; in Rothman's framework, it would be considered a component of a sufficient cause. Leukemia develops in the absence of benzene exposure, and not all people exposed to benzene develop leukemia, but neither fact provides any evidence against a causal role for benzene. Preventing benzene exposure would prevent those leukemia cases whose sufficient cause included benzene exposure.
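A toy version of Rothman's sufficient-component model can be written with sets: a person develops the disease when every component of at least one sufficient cause is present. The two sufficient causes, the unnamed components U1 to U3, and the four people below are all hypothetical; they are not leukemia data. The output reproduces the three points of the benzene paragraph: an exposed person without disease (p2), a case without exposure (p3), and a case (p1) that disappears when the exposure is removed.

```python
# Hypothetical sufficient causes, each a set of component causes. "benzene" stands in
# for the exposure; U1, U2 and U3 are unnamed other components (all illustrative).
SUFFICIENT_CAUSES = [
    frozenset({"benzene", "U1"}),  # a pathway that needs the exposure
    frozenset({"U2", "U3"}),       # a pathway that does not involve it
]


def develops_disease(components):
    """Disease occurs when every component of at least one sufficient cause is present."""
    return any(cause <= components for cause in SUFFICIENT_CAUSES)


people = {
    "p1": {"benzene", "U1"},  # exposed; completes the benzene pathway
    "p2": {"benzene"},        # exposed, but U1 is missing: no disease
    "p3": {"U2", "U3"},       # unexposed, yet diseased via the other pathway
    "p4": {"U1"},             # neither pathway complete
}

cases = {name for name, parts in people.items() if develops_disease(parts)}
# Remove the exposure from everyone and see which cases disappear.
cases_without_exposure = {
    name for name, parts in people.items() if develops_disease(parts - {"benzene"})
}
prevented = cases - cases_without_exposure

print(sorted(cases))      # ['p1', 'p3']
print(sorted(prevented))  # ['p1']
```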

EPIDEMIOLOGIC METHODS
Within this theoretical framework for defining causes in epidemiologic terms, what methods do scientists use to make inferences about causes? Some popular methods currently in use, such as consensus panels, appeals to authority, and a variety of tortuous exercises in logic, are woefully inadequate.
A. Spurious Methods
Consider consensus panels, a technique regularly used by the National Institutes of Health, among others. Consensus panels consist of experts who are convened to review a body of literature and come to a conclusion, as a group, about what causes disease and what does not. The process of reaching consensus is far from foolproof, just as having a jury decide a question of fact does not guarantee a fair and equitable verdict.
That a group agrees does not necessarily mean the agreement is justified. The point is reminiscent of One Hundred Authors Against Einstein, a collection of essays by critics of relativity, published in Germany in 1931 and intended to discredit the theory. Einstein is reported to have responded, "Were my theory wrong, it would have taken but one person to show it."
Other fallacious methods for determining causality include the appeal to authority and its companion, the ad hominem criticism. In an appeal to authority, the credentials of the authority are used as grounds for accepting the evidence, rather than an assessment of the evidence itself. The ad hominem criticism is its mirror image: evidence is rejected because of its source's perceived lack of credentials.
Another tortuous argument is post hoc, ergo propter hoc, Latin for "after this, therefore on account of it." A common example of this fallacy is the inference that the rooster's crowing causes the sun to rise each morning. Each repetition of the putative cause-and-effect cycle supposedly lends credence to the theory, until the repetition itself is taken as proof of its validity.(6)
The inductive method of determining causality is equally fallacious. This method follows the reasoning "If p, then q; now q is true; therefore p is true," a form of argument known as affirming the consequent. Bertrand Russell illustrated the fallacy: "If pigs have wings, then some winged animals are good to eat; now some winged animals are good to eat; therefore pigs have wings."(7) He used this example to show how the so-called scientific method can lead to erroneous conclusions.
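The invalidity of this form can be checked mechanically: enumerate every combination of truth values for p and q and look for a counterexample, an assignment in which the premises are true and the conclusion false. The sketch below, with hypothetical helper names, does that, and for contrast also checks modus tollens ("if p, then q; q is false; therefore p is false"), the valid form on which refutation rests.

```python
from itertools import product


def implies(p, q):
    """Material conditional: "if p, then q" is false only when p is true and q is false."""
    return (not p) or q


def counterexamples(premises, conclusion):
    """Truth assignments (p, q) in which every premise holds but the conclusion fails."""
    return [
        (p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises) and not conclusion(p, q)
    ]


# Affirming the consequent: if p, then q; q is true; therefore p is true.
affirming = counterexamples([implies, lambda p, q: q], lambda p, q: p)
# Modus tollens: if p, then q; q is false; therefore p is false.
tollens = counterexamples([implies, lambda p, q: not q], lambda p, q: not p)

print(affirming)  # [(False, True)]: premises true, conclusion false, so the form is invalid
print(tollens)    # []: no counterexample, so the form is valid
```

In Russell's example p is "pigs have wings" and q is "some winged animals are good to eat," so the counterexample found, p false and q true, is exactly the actual state of affairs.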
Finally, no listing of popular but fallacious methods of causal inference would be complete without including statistical significance tests. The motivation for statistical hypothesis testing comes from agriculture, where statistical significance was a basis for decision making and quality control. Experiments were designed to answer questions that called for specific actions, so the results had to be classified into discrete categories.
Litigation may have some similarities to agriculture, since courts must resolve disputes and do not have the "luxury of awaiting further scientific studies to approach the truth."(8) But "significance testing places too high a value on a yes-or-no answer to an oversimplified question.... The result of using significance testing as the criterion for decision making is that the focus is changed from the information presented by the observations themselves to conjecture about the role chance could have played in bringing about those observations."(9) Moreover, significance tests do not address the role of bias, which is usually the greatest concern in interpretation.
Ironically, a "non-significant" result is also taken to be meaningful, as though it implied no effect, while a p-value of 0.05 or less has come to be accepted as a primary criterion for meaningfulness. At first blush this may seem to be scientists quibbling about minutiae, but over-reliance on significance testing can actually be harmful, not just misleading. As an editorial in the Annals of Internal Medicine noted:
With the focus on statistical significance, if chance seems to be a plausible explanation, then other theories are too readily discarded, regardless of how tenable they may be. As a result, effective new treatments have often been overlooked because their effects were judged to be "not significant," despite an indication of efficacy in the data. Conversely, if "significance" seekers find that the results of a study are calculated as improbable on the basis of chance, then chance is often rejected as an explanation when alternative explanations are even less tenable.(10)
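The editorial's point can be seen in a small sketch, assuming a Wald test on the log relative risk (one common choice, not the only one) and two hypothetical studies that observe exactly the same doubling of risk. The function name and counts are illustrative. Only the sample size differs, yet the 0.05 rule labels one result "not significant" and the other "significant."

```python
import math


def rr_p_value(a, n1, c, n0):
    """Relative risk and two-sided p-value from a Wald test of ln(RR) = 0."""
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    z = math.log(rr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail probability
    return rr, p


# Two hypothetical studies with the same estimated doubling of risk.
for label, counts in [("small", (20, 1000, 10, 1000)), ("large", (200, 10000, 100, 10000))]:
    rr, p = rr_p_value(*counts)
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"{label}: RR {rr:.2f}, p = {p:.3g} ({verdict})")
# The small study's p is about 0.07, so it is labeled "not significant."
```

Read as a yes-or-no verdict, the small study seems to show nothing, even though its best estimate of the effect is identical to that of the large study.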
The point is that there are no quick and easy tests for determining causality in science. The inherent appeal of such tests and checklists must be outweighed by an appreciation of their dangers.
B. More Defensible Methods
Some less common but more defensible methods of inference include subjective probability, the Empirical Bayes approach, and conjecture and refutation. Each of these methods is an advance in thinking, but each still falls short of the mark.
1. Subjective Probability and Empirical Bayes
Subjective probability is a method in which the likelihood of causation is expressed in quantitative terms as a person's degree of conviction. For example, a scientist may conclude that he is 90 percent certain that exposure to a particular substance causes a disease. The probability he assigns is arbitrary and not necessarily calculated using any established methodology; it is an estimate derived from belief. The main criticism of this approach is that beliefs can be completely subjective.
A method that is akin to subjective probability, but which is based on data, is called Empirical Bayes. In this approach, the degree of conviction is replaced with a probability that is less subjective because it is based on observation.(11)
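One simple empirical-Bayes scheme, sketched here under a normal-normal model with a method-of-moments prior, estimates the "prior" from the spread of several related estimates and then pulls each estimate toward their common mean, more strongly when the estimates are noisy relative to that spread. This illustrates the general idea and is not necessarily the procedure in the cited paper; the function name and the data are hypothetical.

```python
import statistics


def empirical_bayes_shrink(estimates, variances):
    """Shrink several effect estimates toward their common mean (normal-normal model).

    The prior mean and prior variance are estimated from the estimates themselves,
    which is what makes the procedure "empirical" rather than purely a matter of belief.
    """
    prior_mean = statistics.fmean(estimates)
    # Method of moments: observed spread minus the average sampling variance, floored at 0.
    prior_var = max(0.0, statistics.variance(estimates) - statistics.fmean(variances))
    shrunk = []
    for estimate, variance in zip(estimates, variances):
        weight = prior_var / (prior_var + variance)  # how much to trust this estimate
        shrunk.append(prior_mean + weight * (estimate - prior_mean))
    return prior_mean, prior_var, shrunk


# Hypothetical log relative risks for five exposures, each with standard error 0.3.
log_rrs = [0.9, 0.1, -0.2, 0.4, 0.0]
mean, tau2, adjusted = empirical_bayes_shrink(log_rrs, [0.09] * 5)
print([round(x, 2) for x in adjusted])  # [0.58, 0.17, 0.02, 0.32, 0.12]
```

The most extreme estimates move the most, which is why such adjustments are useful when many exposures are examined at once and some look large by chance alone.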
2. Conjecture and Refutation
Karl Popper and others tell us that causation is established through a process of conjecture and refutation, and that science advances only by disproof.(12) This notion of testing and refutation has also been adopted in law. In evaluating whether "creationism" was indeed a science that should be taught in the public schools, Judge William R. Overton offered five criteria defining the essential characteristics of science:(13)
1. Science is guided by natural law.
2. It has to be explanatory by reference to natural law.
3. It is testable against the empirical world.
4. Its conclusions are tentative--that is, are not necessarily the final word.
5. It is falsifiable.
"Conjecture and refutation." This phrase of Popper's represents the cornerstone of the scientific process. If you wring the rooster's neck and the sun still rises the next day, it follows that the rooster's crowing did not cause the sun to rise. Only by gathering observations that refute one or more of the competing theories can causes be distinguished from non-causes. Causality may be inferred only "when a causal theory tenaciously defies our determined efforts to challenge it in empirical competition with non-causal theories."(14)

ALL METHODS ARE FALLIBLE
The important message is that all legitimate methods of scientific inference are fallible. The decision-making mold, as in statistical significance testing and courtroom verdicts, forces choices that proper scientific inference avoids. That courtrooms and significance testing both aim at decisions does not make significance testing a good method for the courts.
Indeed, rational decision making can actually be premised on causal relations that are believed to be false. It may be wise to take actions that safeguard against certain risks, even if their causality is in doubt, as long as the actions are prudent in their own right and the safeguards are not overly burdensome. For example, consider the current controversy over the health risks from exposure to electromagnetic fields. Some scientists believe that exposure to low-level electric and magnetic fields from appliances such as electric blankets and electric razors is carcinogenic. The body of evidence is hardly persuasive at this point, but it could be considered prudent to use ordinary warm blankets and old-fashioned razor blades in case there is some truth to the hypothesis.
The law should not look to science for hard-and-fast rules or checklists that will enable lawyers and judges to identify causation with confidence. Such quick and easy schemes don't work. There is no recipe for determining causation in science other than the careful process of pursuing hypotheses by evaluating competing theories, and continuing to refine the process until no explanation other than causality remains tenable.
(1.)Brief amici curiae of Professors Kenneth Rothman, Noel Weiss, James Robins and Raymond Neutra, 61 U.S.L.W. 3284 (1992) [hereinafter amici brief], in Daubert v. Merrell Dow Pharmaceuticals, Inc., 113 S. Ct. 2786 (1993).
(2.)KENNETH J. ROTHMAN, MODERN EPIDEMIOLOGY (Boston: Little, Brown & Co., 1986).
(3.)Bernard C.K. Choi & A. Lynn Noseworthy, Classification, Direction, and Prevention of Bias in Epidemiologic Research, 34(3) JOM 265 (1992).
(4.)Kenneth J. Rothman, Causes, 104(6) AM. J. EPIDEM. 588 (1976).
(5.)Id. at 589.
(6.)Kenneth J. Rothman, Inferring Causal Connections--Habit, Faith or Logic? in CAUSAL INFERENCE 7 (Rothman, ed., Epidemiology Resources Inc., 1988).
(7.)Bertrand Russell, Dewey's New "Logic," in THE PHILOSOPHY OF JOHN DEWEY (P.A. Schilpp, ed., 1939), reprinted in THE BASIC WRITINGS OF BERTRAND RUSSELL 620-27 (R.E. Egner & L.E. Denonn, eds., New York: Simon and Schuster, 1961).
(8.)Tom Christoffel & Stephen P. Teret, Epidemiology and the Law: Courts and Confidence Intervals, 81 AM. J. PUB. HEALTH 1661, 1665 (1991).
(9.)Amici brief, supra note 1.
(10.)Kenneth J. Rothman, Significance Questing, 105 ANN. INTERN. MED. 445 (1986).
(11.)Sander Greenland & James M. Robins, Empirical-Bayes Adjustments for Multiple Comparisons Are Sometimes Useful, 2 EPIDEMIOLOGY 244 (1991).
(12.)John R. Platt, Strong Inference, 146 SCIENCE 347 (1964).
(13.)William R. Overton, Creationism in Schools: The Decision in McLean versus the Arkansas Board of Education, 215 SCIENCE 934 (1982).
(14.)Stephan F. Lanes, The Logic of Causal Inference, in CAUSAL INFERENCE, supra note 6, at 72.

FRONTLINE / WGBH Educational Foundation / www.wgbh.org
web site copyright WGBH educational foundation
