Fisher did not take Neyman and Pearson’s criticism well. In response, he called their methods “childish” and “absurdly academic.” In particular, Fisher objected to the idea of deciding between two hypotheses, rather than calculating the “significance” of the available evidence, as he had proposed. Whereas a decision is final, his significance tests gave only a provisional opinion, which could be revised later. Even so, Fisher’s appeal for an open scientific mind was somewhat undermined by his insistence that researchers should use a 5% cutoff for a “significant” p-value, and by his claim that he would “ignore entirely all results which fail to reach this level.”
The acrimony would give way to decades of ambiguity, as textbooks gradually muddled Fisher’s null hypothesis testing together with Neyman and Pearson’s decision-based approach. A nuanced debate over how to interpret evidence, with its discussion of statistical reasoning and experimental design, instead became a set of fixed rules for students to follow.
Mainstream scientific research would come to rely on simplistic p-value thresholds and true-or-false decisions about hypotheses. In this rote-learned world, experimental effects were either present or they were not. A drug either worked or it didn’t. It would not be until the 1980s that major medical journals finally began to break free of these habits.
Ironically, much of the shift can be traced back to an idea Neyman developed in the early 1930s. With economies struggling through the Great Depression, he noticed a growing demand for statistical insight into the lives of populations. Unfortunately, governments had limited resources to study these issues. Politicians wanted results in months or even weeks, and there was not enough time or money for a comprehensive study. As a result, statisticians had to rely on sampling a small subset of the population. This was an opportunity to develop some new statistical ideas.
Suppose we want to estimate a specific value, such as the proportion of the population who have children. If we randomly sample 100 adults and none of them are parents, what does this suggest about the country as a whole? We cannot say definitively that nobody has children, because if we sampled a different group of 100 adults, we might find some parents. We therefore need a way of measuring how confident we should be in our estimate. This is where Neyman’s innovation came in. He showed that we can calculate a “confidence interval” for a sample: a range constructed so that, across repeated samples, it would contain the true population value a stated proportion of the time.
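To make the survey example concrete, here is a minimal sketch of one way to put an interval around "0 parents out of 100 adults." It uses the Wilson score interval, which is a common modern choice assumed here for illustration and not necessarily the construction Neyman himself used. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a proportion (one common method among several)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical survey from the text: 100 adults sampled, none are parents.
lower, upper = wilson_interval(0, 100)
print(f"95% interval for the proportion with children: {lower:.3f} to {upper:.3f}")
```

Under these assumptions the interval runs from 0 to roughly 3.7%, so a sample with no parents is still compatible with a small but nonzero share of parents in the wider population.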
Confidence intervals can be a slippery concept, because they require us to interpret tangible real-life data by imagining that many other hypothetical samples have been collected. Like Type I and Type II errors, Neyman’s confidence intervals address an important question, but in a way that often perplexes students and researchers. Despite these conceptual hurdles, there is value in having a measure that can capture the uncertainty in a study. It is often tempting, particularly in media and politics, to focus on a single average value. A single value may feel more confident and precise, but ultimately it is an illusory conclusion. In some of our public-facing epidemiological analyses, my colleagues and I have therefore chosen to report only confidence intervals, to avoid misplaced attention falling on specific values.
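The "many hypothetical samples" reading of a confidence interval can be sketched with a small simulation. The true proportion, sample size, and number of repeated surveys below are all made-up values for illustration, and the interval is the simple normal-approximation one, assumed here only because it is short to write.

```python
import random
from math import sqrt
from statistics import NormalDist

# All values below are hypothetical, chosen only to illustrate the idea.
TRUE_PROPORTION = 0.3   # the "real" share of parents, unknown in practice
SAMPLE_SIZE = 100       # adults per imagined survey
N_SURVEYS = 2000        # how many imagined surveys to run

rng = random.Random(1)  # fixed seed so the run is repeatable
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

covered = 0
for _ in range(N_SURVEYS):
    # Draw one imagined survey and count the parents in it.
    parents = sum(rng.random() < TRUE_PROPORTION for _ in range(SAMPLE_SIZE))
    p_hat = parents / SAMPLE_SIZE
    # Normal-approximation 95% interval around this survey's estimate.
    half_width = z * sqrt(p_hat * (1 - p_hat) / SAMPLE_SIZE)
    if p_hat - half_width <= TRUE_PROPORTION <= p_hat + half_width:
        covered += 1

coverage = covered / N_SURVEYS
print(f"Share of intervals containing the true value: {coverage:.3f}")
```

Each imagined survey yields a different interval. The 95% refers to how often such intervals capture the true value over many repetitions, not to any single interval taken alone.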
Since the 1980s, medical journals have put more focus on confidence intervals than on standalone true-or-false claims. However, habits can be hard to break. The relationship between confidence intervals and p-values has not helped. Suppose our null hypothesis is that a treatment has zero effect. If the estimated 95% confidence interval for the effect does not contain zero, then the p-value will be less than 5%, and under Fisher’s approach we will reject the null hypothesis. As a result, medical papers are often less interested in the uncertainty interval itself, and more interested in the values it does, or does not, contain. Medicine may be trying to move beyond Fisher, but the influence of his arbitrary 5% cutoff remains.
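The link between the interval and the p-value can be checked numerically. The sketch below assumes a treatment-effect estimate with a known standard error and a normal approximation. The function name `interval_and_p_value` and the example numbers are hypothetical.

```python
from statistics import NormalDist


def interval_and_p_value(estimate, std_error, level=0.95):
    """Normal-approximation interval and two-sided p-value against zero effect."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    z = estimate / std_error  # distance from zero in standard errors
    p_value = 2 * (1 - normal.cdf(abs(z)))
    return lower, upper, p_value


# Two made-up treatment effects with the same standard error.
for estimate, std_error in [(2.0, 0.8), (1.0, 0.8)]:
    lower, upper, p_value = interval_and_p_value(estimate, std_error)
    excludes_zero = lower > 0 or upper < 0
    print(f"estimate={estimate}: interval=({lower:.2f}, {upper:.2f}), "
          f"p={p_value:.3f}, excludes zero={excludes_zero}")
```

Under this approximation the two checks always agree: the 95% interval excludes zero exactly when the p-value falls below 5%, which is why an interval can end up being read as just another significance test.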
Excerpt adapted from Proof: The Uncertain Science of Certainty, by Adam Kucharski. Published by Profile Books in the UK on March 20, 2025.