In This Article

  • Introduction
  • History of Simpson’s Paradox
  • The Role of Causality in Avoiding Simpson’s Paradox
  • Simpson’s Paradox and Its Relationship with Other Paradoxes
  • Simpson’s Paradox and Nonergodicity
  • Ways to Identify and Avoid Simpson’s Paradox
  • Some Real-Life Examples of Simpson’s Paradox
  • Simpson’s Paradox in Psychology
  • Simpson’s Paradox in Other Fields
  • Simpson’s Paradox in Meta-Analysis Studies
  • Simpson’s Paradox in Clinical Trials
  • Simpson’s Paradox Can Affect Policy Decisions

Simpson's Paradox in Psychology
by
Madhur Mangalam
  • LAST REVIEWED: 29 November 2022
  • LAST MODIFIED: 29 November 2022
  • DOI: 10.1093/obo/9780199828340-0301

Introduction

Simpson’s paradox, also called the reversal paradox or the amalgamation paradox, is a statistical phenomenon in which data aggregated at the group level (or across a set of groups) support a conclusion that is absent from, or opposite to, the conclusion suggested by the same data before aggregation, at the individual level (or at the level of the separate groups). The paradox is resolved when the statistical model stratifies the data by group. An intuitive example is the correlation between typing speed and typos. At the group level, the correlation is negative: experienced typists type faster and make fewer typos. At the individual level, however, the correlation is positive: the faster an individual types, the more typos that individual makes. It would therefore be fallacious to conclude that the relationship between typing speed and typos observed at the group level holds at the individual level. Simpson’s paradox is especially problematic in the physical and social sciences, where statistical trends in point data observed at the group level are often fallaciously used to draw inferences about individuals or, less often, the other way round. Hence, equivalence at the group and individual levels must be explicitly tested.
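The typing example can be made concrete with a small numeric sketch. The data below are invented purely for illustration (three hypothetical typists, three sessions each) and are not taken from any study. Within each typist, faster sessions have more typos, yet pooling all sessions yields a negative correlation because the more skilled typists are both faster and more accurate.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented data: (speeds in words per minute, typos per page) for each typist.
typists = {
    "novice": ([25, 30, 35], [9, 10, 11]),
    "intermediate": ([45, 50, 55], [5, 6, 7]),
    "expert": ([65, 70, 75], [1, 2, 3]),
}

# Individual level: one correlation per typist.
within = {name: pearson(speeds, typos) for name, (speeds, typos) in typists.items()}

# Group level: all sessions pooled, ignoring who produced them.
pooled_speeds = [s for speeds, _ in typists.values() for s in speeds]
pooled_typos = [t for _, typos in typists.values() for t in typos]
pooled = pearson(pooled_speeds, pooled_typos)

for name, r in within.items():
    print(f"{name}: within-person r = {r:+.2f}")
print(f"pooled across typists: r = {pooled:+.2f}")
```

Every within-person correlation is positive while the pooled correlation is negative, which is why the group-level trend cannot be read as a statement about any individual typist.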

History of Simpson’s Paradox

Simpson 1951 first addressed this phenomenon, showing how combining contingency tables can yield paradoxical conclusions (specifically, reporting associations that disappeared upon aggregation), although the earlier works Pearson, et al. 1899 and Yule 1903 had noticed a similar phenomenon. Simpson observed that, depending on the story behind the data, the “sensible interpretation” is sometimes compatible with the aggregate population and sometimes with the disaggregated subpopulations. Some twenty years later, Blyth 1972 found that aggregation can even reverse the sign of a statistical relationship and named the phenomenon Simpson’s paradox in honor of Simpson, although sign reversal had first been noted by Cohen and Nagel 1934. Lindley and Novick 1981 amplified Simpson’s paradox by showing that no statistical criterion can warn against drawing wrong conclusions or indicate whether the aggregated or the disaggregated data support the correct conclusion. Critically, Lindley and Novick highlighted that when distinct contexts compel distinct conclusions from the same data, the choice of conclusion must be driven not by statistical considerations but by additional information extracted from the context; that is, it must invoke some form of causality. Interested readers can refer to several excellent resources that discuss statistical methods to prevent, diagnose, and treat Simpson’s paradox in statistical point estimates, such as Adolf, et al. 2014; Fisher, et al. 2018; and Kievit, et al. 2013.

  • Adolf, J., N. K. Schuurman, P. Borkenau, D. Borsboom, and C. V. Dolan. 2014. Measurement invariance within and between individuals: A distinct problem in testing the equivalence of intra- and inter-individual model structures. Frontiers in Psychology 5: 883.

    DOI: 10.3389/fpsyg.2014.00883

    Addresses the equivalence between results obtained at intra-individual and inter-individual levels of psychometric analysis in the context of a linear state-space model, i.e., a time series model with latent variables. Considers invariance constraints under which results can be generalized (i) over time within subjects, (ii) over subjects within occasions, and (iii) over time and subjects simultaneously. Relates problems of time- and subject-equivalence to problems of nonergodicity.

  • Blyth, C. R. 1972. On Simpson’s paradox and the sure-thing principle. Journal of the American Statistical Association 67.338: 364–366.

    DOI: 10.1080/01621459.1972.10482387

    Presents a primer on Simpson’s paradox in the mathematical language of probability.

  • Cohen, M., and E. Nagel. 1934. An introduction to logic and the scientific method. New York: Harcourt, Brace.

    Reports sign reversal of the relationship between two variables upon aggregation.

  • Fisher, A. J., J. D. Medaglia, and B. F. Jeronimus. 2018. Lack of group-to-individual generalizability is a threat to human subjects research. Proceedings of the National Academy of Sciences 115.27: E6106–E6115.

    DOI: 10.1073/pnas.1711978115

    An influential article stating that “statistical findings at the inter-individual (group) level generalize to the intra-individual (person) level only if the process is ergodic,” i.e., the effects remain homogeneous across individuals and stable over time. Shows that ergodicity does not hold in multiple published datasets, threatening human subjects research. Emphasizes that researchers must explicitly test for equivalence of processes at the group and individual levels in social and medical sciences.

  • Kievit, R., W. Frankenhuis, L. Waldorp, and D. Borsboom. 2013. Simpson’s paradox in psychological science: A practical guide. Frontiers in Psychology 4: 513.

    DOI: 10.3389/fpsyg.2013.00513

    Reviews findings from multiple disciplines to argue that Simpson’s paradox is common and typically results in incorrect interpretations with potentially harmful consequences. Shows that Simpson’s paradox is most likely to occur when drawing inferences across different levels of analysis (e.g., from populations to subgroups, or subgroups to individuals). Proposes statistical markers indicative of Simpson’s paradox and offers psychometric solutions for dealing with it, including a toolbox in R for detecting Simpson’s paradox.

  • Lindley, D. V., and M. R. Novick. 1981. The role of exchangeability in inference. The Annals of Statistics 9.1: 45–58.

    DOI: 10.1214/aos/1176345331

    Shows that no statistical criterion would warn the investigator against drawing the wrong conclusions due to Simpson’s paradox or indicate which (subset of) data would lead to the correct conclusion.

  • Pearson, K., A. Lee, and L. Bramley-Moore. 1899. Genetic (reproductive) selection: Inheritance of fertility in man, and of fecundity in thoroughbred racehorses. Philosophical Transactions of the Royal Society: Series A 192: 257–330.

    DOI: 10.1098/rsta.1899.0006

    Presents an extensive discussion of spurious correlations in the case of continuous variables. Shows that pooling two separate records, for each of which the correlation is zero, necessarily creates a spurious correlation unless the mean of at least one of the variables is the same in the two cases.

  • Simpson, E. H. 1951. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B (Methodological) 13.2: 238–241.

    DOI: 10.1111/j.2517-6161.1951.tb00088.x

    The most influential article on Simpson’s paradox and the one that led to the paradox being named after Simpson. Discusses how, in a 2×2×2 contingency table, there may exist associations or interactions of given attributes in pairs and a second-order interaction when considering all three pairs together.

  • Yule, G. U. 1903. Notes on the theory of association of attributes in statistics. Biometrika 2.2: 121–134.

    DOI: 10.1093/biomet/2.2.121

    Probably the earliest documented discussion of Simpson’s paradox, predating its name. Uses, among numerous other examples, the hypothetical case of an anti-toxin that could appear to be a ‘cure’ because of a sex-related difference in mortality rates.
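The sign reversal upon pooling contingency tables that several of these sources discuss can be illustrated with a minimal sketch. The counts below are invented for illustration and do not come from Simpson 1951, Blyth 1972, or any other cited work; the stratum and arm names are likewise hypothetical. Within each stratum the treated arm recovers less often than the untreated arm, yet the pooled table shows the opposite, because most treated cases fall in the stratum with the higher baseline recovery rate.

```python
from fractions import Fraction

# Invented counts: (recovered, total) for each arm within each stratum.
counts = {
    "stratum_1": {"treated": (160, 200), "untreated": (45, 50)},
    "stratum_2": {"treated": (10, 50), "untreated": (60, 200)},
}

def pooled_rate(arm):
    """Recovery rate for one arm after summing counts over all strata."""
    recovered = sum(cells[arm][0] for cells in counts.values())
    total = sum(cells[arm][1] for cells in counts.values())
    return Fraction(recovered, total)

# Disaggregated view: recovery rate per arm within each stratum.
stratum_rates = {
    name: {arm: Fraction(rec, tot) for arm, (rec, tot) in cells.items()}
    for name, cells in counts.items()
}

# Aggregated view: recovery rate per arm in the pooled table.
pooled = {arm: pooled_rate(arm) for arm in ("treated", "untreated")}

for name, rates in stratum_rates.items():
    print(name, {arm: float(r) for arm, r in rates.items()})
print("pooled", {arm: float(r) for arm, r in pooled.items()})
```

Which of the two views supports the correct conclusion cannot be read off the numbers alone; as Lindley and Novick 1981 argue, that choice depends on information about the context that generated the data.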
