Significance testing introduction and overview

Document Type


Date of Original Version



What if there were no significance testing? This was the question faced by early researchers at the turn of the 20th century, and it has resurfaced several times over the years (see chapter 5 of this volume by Mulaik, Raju, & Harshman, as well as Cowles & Davis, 1982; Huberty, 1993; and Keren & Lewis, 1993, for more on the history of significance testing). As early as 1901, Karl Pearson laid the preliminary groundwork for assessing a scientific hypothesis with sample data. Later, Ronald Fisher (1925/1941) formally proposed a set of methods that, with contributions from Jerzy Neyman and Egon Pearson (1928) on power and on Type I and Type II errors, evolved into the practice of null hypothesis significance testing (NHST). NHST was intended to provide a procedure for deciding whether the probability of getting sample results as extreme as, or more extreme than, those observed, given the null hypothesized value, was small enough that the results were unlikely to be attributable to mere chance. Then, through the replication of such findings, along with the practice of parameter estimation, the recognition of sampling error (Fisher, 1925/1941), and the quantification of the effect with a correlation ratio or eta (Fisher, 1925/1941; see also Kirk, 1996), greater trust could be placed in the practical significance of the scientific hypothesis (see Kirk, 1996, for an excellent delineation of statistical vs. practical significance). NHST was intended to provide a method for ruling out chance, thus helping to build strong evidence in favor of one or more alternative hypotheses, rather than to provide an indication of the proof or probability of these hypotheses (see chapters 11-12 by Pruzek and Rindskopf for discussion of how Bayesian methods can address these probabilities). Neyman and Pearson’s work helped to fine-tune the NHST method by recognizing the errors that could be made, depending on whether the null hypothesis was rejected or retained. They also introduced the concept of power, the probability of correctly rejecting the null hypothesis, foreseeing the need for quality control in the NHST procedure (see chapter 2, by Cohen, and chapter 7, by Rossi, for more discussion of the importance of attending to the power of our studies).
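The two quantities described above, the probability of results at least as extreme as those observed under the null hypothesis and the power of a test, can be illustrated with a minimal sketch. It is not drawn from the chapter itself. It assumes the simplest textbook case of a two-sided z test on a mean with a known population standard deviation, and the function names and numeric values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: two-sided z test on a mean, population sigma known.
# All names and numbers here are illustrative, not from the source.

def two_sided_p_value(sample_mean, null_mean, sigma, n):
    """Probability, under the null, of a sample mean at least this extreme."""
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def power_two_sided(true_mean, null_mean, sigma, n, alpha=0.05):
    """Probability of rejecting the null when the mean is really true_mean."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)  # rejection cutoff at level alpha
    shift = (true_mean - null_mean) / (sigma / sqrt(n))
    # Reject when the z statistic falls in either tail beyond the cutoff.
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)

if __name__ == "__main__":
    p = two_sided_p_value(sample_mean=104.9, null_mean=100, sigma=15, n=36)
    pw = power_two_sided(true_mean=105, null_mean=100, sigma=15, n=36)
    print(f"p value: {p:.4f}")
    print(f"power:   {pw:.4f}")
```

When the true mean equals the null value, the second function returns alpha, the Type I error rate; one minus the power at any other true mean is the Type II error rate for that alternative.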

Publication Title

What If There Were No Significance Tests?: Classic Edition