Simpson's paradox is a well-known problem for statisticians. The paradox is based on the observation that findings that apply to each of two data sets may be reversed when the two data sets are combined.
One of the most famous examples of Simpson's paradox was demonstrated in the 1973 Berkeley gender bias study. A preliminary review of admissions data indicated that women had a lower admissions rate than men:
Men
  Number of applicants ............ 8,442
  Percent applicants admitted ..... 44%
Women
  Number of applicants ............ 4,321
  Percent applicants admitted ..... 35%

A difference of nearly 10 percentage points is highly significant, but what does it mean? Was the admissions office guilty of gender bias?
A closer look at admissions, department by department, told a very different story. Women were being admitted at higher rates than men in almost every department. The department-by-department data seemed incompatible with the combined data.
The explanation was simple. Women tended to apply to the most popular and oversubscribed departments, such as English and History, which rejected a high proportion of their applicants. Men tended to apply to departments that the women of 1973 avoided, such as mathematics, engineering and physics, and tended not to apply to the oversubscribed departments that women preferred. Though women were on at least an equal footing with men within each department, the large number of women rejected by the big, high-rejection departments accounted for an overall lower acceptance rate for women at Berkeley.
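The arithmetic behind this reversal can be sketched with a small example. The counts below are hypothetical, chosen only to make the effect easy to see; they are not the 1973 Berkeley figures, and the two department labels are invented for illustration.

```python
# Hypothetical applicant counts, invented for illustration only.
# They are NOT the 1973 Berkeley data.
# department -> group -> (applicants, admitted)
data = {
    "low_rejection":  {"men": (800, 480), "women": (200, 130)},
    "high_rejection": {"men": (200, 40),  "women": (800, 200)},
}

# Admission rate for each group within each department.
per_department = {
    dept: {group: admitted / applicants
           for group, (applicants, admitted) in groups.items()}
    for dept, groups in data.items()
}

# Admission rate for each group after pooling both departments.
combined = {}
for group in ("men", "women"):
    applicants = sum(data[dept][group][0] for dept in data)
    admitted = sum(data[dept][group][1] for dept in data)
    combined[group] = admitted / applicants

for dept, rates in per_department.items():
    print(f"{dept}: men {rates['men']:.0%}, women {rates['women']:.0%}")
print(f"combined: men {combined['men']:.0%}, women {combined['women']:.0%}")
```

In this made-up data set, women are admitted at a higher rate than men in both departments (65% versus 60%, and 25% versus 20%), yet the pooled rate is 33% for women and 52% for men, because most of the women applied to the department that rejects most of its applicants.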
Simpson's paradox demonstrates that data is not additive: a relationship that holds in each subset need not hold when the subsets are combined. It also shows us that data is not transitive; you cannot chain together inferences drawn from comparisons on different subsets. For example, in randomized drug trials, you cannot assume that if drug A tests better than drug B in one trial, and drug B tests better than drug C in another, then drug A will test better than drug C (1). Even in well-designed trials, the subjects are drawn from a population specific to that trial. When you compare results from different trials, you can never be sure whether the different sets of subjects are comparable. Each set may contain individuals whose responses to a third drug are unpredictable. Transitive inferences (i.e., if A is better than B, and B is better than C, then A is better than C) are unreliable.
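A minimal sketch of how the transitive inference can fail follows. It assumes two patient subtypes, `s1` and `s2`, that respond differently to each drug, and two trials that enroll those subtypes in different proportions. The response rates, subtype names, and enrollment mixes are all invented for illustration and are not taken from any real trial.

```python
# Hypothetical response rates per patient subtype; invented for illustration.
response = {
    "A": {"s1": 0.90, "s2": 0.20},
    "B": {"s1": 0.60, "s2": 0.50},
    "C": {"s1": 0.30, "s2": 0.45},
}

# Hypothetical share of each subtype enrolled in each trial.
trial_1_mix = {"s1": 0.8, "s2": 0.2}  # trial comparing A with B
trial_2_mix = {"s1": 0.1, "s2": 0.9}  # trial comparing B with C


def expected_response(drug, mix):
    """Overall response rate for a drug in a population with the given mix."""
    return sum(share * response[drug][subtype] for subtype, share in mix.items())


# Trial 1: A tests better than B.
print(expected_response("A", trial_1_mix) > expected_response("B", trial_1_mix))
# Trial 2: B tests better than C.
print(expected_response("B", trial_2_mix) > expected_response("C", trial_2_mix))
# Yet in trial 2's population, C would test better than A.
print(expected_response("C", trial_2_mix) > expected_response("A", trial_2_mix))
```

All three comparisons print `True`. Within either population taken alone the ordering of the drugs is consistent; the false conclusion that A is better than C arises only when results from the two differently composed trials are chained together.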
- Jules Berman (copyrighted material)
key words: data science, irreproducible results, complexity, classification, ontology, ontologies, classifications, data simplification, jules j berman
(1) Baker SG, Kramer BS. The transitive fallacy for randomized trials: If A bests B and B bests C in separate trials, is A better than C? BMC Medical Research Methodology 2:13, 2002.