Thomas E. Love, Ph.D., Email: Thomas dot Love at case dot edu
- Professor of Medicine, Population & Quantitative Health Sciences, Case Western Reserve University
- Director of Biostatistics and Evaluation, Center for Health Care Research and Policy
- Chief Data Scientist, Better Health Partnership
- @ThomasELove, and THOMASELOVE on GitHub
- ASA 2016 Statement (pdf)
- ASA 2019 Statement (pdf)
- pdf of Jeff Leek and Roger Peng "P values are just the tip of the iceberg" in Nature 2015
- Jeff Leek at FiveThirtyEight Finally, A Formula for Decoding Health News
- Christie Aschwanden at FiveThirtyEight Not Even Scientists Can Easily Explain P-values
Most of the material from the 2018-03-16 version of the talk was incorporated into the materials I used to lead a discussion on many related topics, called "Statistical Significance and Clinical Research" for KL2 scholars on 2018-10-10.
- Here is a PDF of the slides from the 2018-10-10 discussion, although I expect to use about half of them in the actual conversation.
- The recommended readings in advance of that discussion were:
- Thomas LE, Pencina MJ (2016) "Do Not Over (P) Value Your Research Article" JAMA Cardiology Editorial.
- Ronald L. Wasserstein & Nicole A. Lazar (2016) The ASA's Statement on p-Values: Context, Process, and Purpose, The American Statistician, 70:2, 129-133, DOI: 10.1080/00031305.2016.1154108.
- The Science magazine piece by Kelly Servick entitled "It will be much harder to call new findings ‘significant’ if this team gets its way"
In addition to the readings below, you might want to look at the following items related to the hot topic of the Brian Wansink story.
- Bettina Elias Siegel, New Study Casts More Doubt on "Smarter Lunchrooms" Data, The Lunch Tray, 2017-08-16.
- Anahad O'Connor, More Evidence That Nutrition Studies Don’t Always Add Up in the New York Times, 2018-09-29.
- Andrew Gelman, Everyone Is Missing the Point About Brian Wansink and P-Hacking at Slate, 2018-10-08.
- Andrew Gelman, Concerns about Brian Wansink’s claims and research methods have been known for years, blog post, 2018-03-03.
- "... Again, what’s stunning about this whole story is that so many of the problems were hiding in plain sight, just sitting out there for all to see, who chose to look. The strategy of affable denial worked so well for so long."
- Gelman in this post also refers to Stephanie M. Lee's article at Buzzfeed on 2018-02-23: This Controversial Ivy League Scientist Left His Kickstarter Donors High And Dry
The slides from the 2018-03-16 version are available in PDF and were built using R Markdown.
- The video I showed at the start is "Not Even Scientists Can Easily Explain P-values" from FiveThirtyEight.com
- Another video that you might enjoy is part of Rethinking Science's Magic Number from NOVA, in particular the video subtitled Science's most important (and controversial) number has its origins in a British experiment involving milk and tea.
- The p-hacking interactive game is part of Science Isn't Broken, also at FiveThirtyEight.com
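The point of that p-hacking game can be sketched in a few lines of simulation: when several null outcomes are measured and only the smallest p value is reported, the chance of a "significant" finding climbs well above the nominal 5%. This is an illustrative sketch of my own (the function name and defaults are not from the talk):

```python
import numpy as np
from scipy import stats

def phacking_false_positive_rate(n_sims=2000, n=30, n_outcomes=5,
                                 alpha=0.05, seed=42):
    """Estimate the chance of at least one p < alpha among several
    null outcomes, i.e., the error rate inflated by cherry-picking."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        # Two groups, n_outcomes independent outcomes, all true nulls
        a = rng.normal(size=(n, n_outcomes))
        b = rng.normal(size=(n, n_outcomes))
        pvals = stats.ttest_ind(a, b).pvalue  # one t-test per outcome
        if pvals.min() < alpha:  # report only the "significant" one
            hits += 1
    return hits / n_sims
```

With five independent null outcomes the chance of at least one p < 0.05 is about 1 − 0.95⁵ ≈ 23%, and the simulation lands near that, far above the advertised 5%.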
- Null hypothesis significance testing is here to stay.
- Learn how to present your p value so it looks like everyone else's
- Think about "statistically detectable" rather than "statistically significant"
- Don't accept a null hypothesis, just retain it.
- Use point and interval estimates
- Try to get your statements about confidence intervals right (right = just like I said it)
- Use Bayesian approaches when they seem appropriate
- But look elsewhere for people to teach/do that stuff
- Use simulation to help you understand non-standard designs
- But, again, look elsewhere for examples
- Power is basically a hurdle to overcome in a grant application
- Retrospective power calculations are a waste of time and effort
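One way to get statements about confidence intervals right, as urged above, is to remember that the "95%" describes the procedure over repeated samples, not any single computed interval. A quick simulation (the function name and defaults are my own, offered only as a sketch) makes the point:

```python
import numpy as np
from scipy import stats

def ci_coverage(n_sims=5000, n=25, mu=10.0, sigma=3.0, conf=0.95, seed=7):
    """Fraction of repeated samples whose t-based interval covers the
    true mean; the 'confidence' belongs to the procedure, not to any
    one interval."""
    rng = np.random.default_rng(seed)
    tcrit = stats.t.ppf(0.5 + conf / 2, df=n - 1)  # two-sided critical value
    covered = 0
    for _ in range(n_sims):
        x = rng.normal(mu, sigma, size=n)
        half = tcrit * x.std(ddof=1) / np.sqrt(n)  # half-width of interval
        if x.mean() - half <= mu <= x.mean() + half:
            covered += 1
    return covered / n_sims
```

Roughly 95% of the intervals cover the true mean; any one interval you actually compute either covers it or does not, which is why "there is a 95% chance the parameter is in this interval" is the wrong sentence.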
- Null hypothesis significance testing is much harder than I thought.
- The null hypothesis is almost never a real thing.
- Rather than rejiggering the cutoff, I would mostly abandon the p value as a summary
- Replication is far more useful than I thought it was.
- Some hills aren't worth dying on.
- Think about uncertainty intervals more than confidence or credible intervals
- Retrospective calculations about Type S (sign) and Type M (magnitude) errors can help me illustrate ideas.
- Which method to use is far less important than finding better data
- The biggest mistake I make regularly is throwing away useful data
- I'm not the only one with this problem.
- The best thing I do most days is communicate more clearly.
- When stuck in a design, I think about how to get better data.
- When stuck in an analysis, I try to turn a table into a graph.
- I have A LOT to learn.
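The Type S / Type M idea mentioned above (from Gelman and Carlin) is easy to illustrate by simulation: posit a true effect and a standard error, keep only the "statistically significant" estimates, and see how often they get the sign wrong and by how much they exaggerate. A sketch under my own illustrative defaults, not a definitive implementation:

```python
import numpy as np
from scipy import stats

def type_s_m(true_effect=0.5, se=1.0, alpha=0.05, n_sims=100_000, seed=1):
    """Among 'significant' estimates drawn from Normal(true_effect, se),
    return (Type S rate: share with the wrong sign,
            Type M factor: average exaggeration of |true_effect|)."""
    rng = np.random.default_rng(seed)
    est = rng.normal(true_effect, se, size=n_sims)
    z = stats.norm.ppf(1 - alpha / 2)          # two-sided critical value
    sig = np.abs(est) > z * se                 # keep only significant results
    type_s = np.mean(est[sig] * np.sign(true_effect) < 0)
    type_m = np.mean(np.abs(est[sig])) / abs(true_effect)
    return type_s, type_m
```

With a true effect half the size of its standard error, the significant estimates exaggerate the truth severalfold and occasionally carry the wrong sign, which is exactly Gelman and Carlin's warning about underpowered studies.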
- The ASA's Statement on p-Values: Context, Process, and Purpose. ASA = American Statistical Association
- Full Citation: Ronald L. Wasserstein & Nicole A. Lazar (2016) The ASA's Statement on p-Values: Context, Process, and Purpose, The American Statistician, 70:2, 129-133, DOI: 10.1080/00031305.2016.1154108
- The paper begins with a presentation of George Cobb's argument about why p values deserve re-evaluation.
- The many supplemental materials can be found at this link.
- Benjamin D et al. (2017) Redefine Statistical Significance
- It will be much harder to call findings "significant" if this team gets its way in Science by Kelly Servick
- Lakens D et al. Justify Your Alpha: A Response to "Redefine Statistical Significance"
- Blakeley B. McShane, David Gal, et al. Abandon Statistical Significance
- "The Value of a p-valueless Paper" by Jason T. Connor, Amer J Gastroenterology 2004, 99(9): 1638-1640.
- Andrew Gelman (2013) Too good to be true Slate July 24. Subtitle: Statistics may say that women wear red when they’re fertile... but you can't always trust statistics.
- Andrew Gelman and John Carlin Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors Perspectives on Psychological Science 2014, 9(6): 641-651.
- Andrew Gelman and Eric Loken (2013) The garden of forking paths: Why multiple comparisons can be a problem, even when there is no "fishing expedition" or "p-hacking" and the research hypothesis was posited ahead of time.
- Andrew Gelman and David Weakliem Of Beauty, Sex and Power American Scientist 97, 310-316.
- "Why Most Published Research Findings Are False" by John P. A. Ioannidis, PLOS Medicine August 2005: https://doi.org/10.1371/journal.pmed.0020124
- Kass RE et al. (2016) Ten Simple Rules for Effective Statistical Practice
- Understanding the Role of P Values and Hypothesis Tests in Clinical Research by Daniel B. Mark, Kerry L. Lee and Frank E. Harrell Jr. (December 2016) JAMA Cardiology 1(9):1048-1054. doi:10.1001/jamacardio.2016.3312
- Simmons JP et al. (2011) False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant Psychological Science 22(11): 1359-1366.
- Do Not Over (P) Value Your Research Article by Laine E. Thomas, Michael J. Pencina (December 2016) editorial in JAMA Cardiology 1(9): 1055. doi:10.1001/jamacardio.2016.3827
- "Not Even Scientists Can Easily Explain p Values" article and video by Christie Aschwanden
- "Statisticians Found One Thing They Can Agree On: It’s Time To Stop Misusing P-Values" discussion of the American Statistical Association's 2016 Statement on P values, by Christie Aschwanden.
- Science isn't broken by Christie Aschwanden, with p-hacking graphic by Ritchie King
- Finally, A Formula for Decoding Health News by Jeff Leek
- Psychology journal bans P values, on the banning of null hypothesis significance testing by Basic and Applied Social Psychology, by Chris Woolston, 2015-03-09.
- Scientific method: Statistical errors column on p values by Regina Nuzzo, 2014-02-12. "the P value, which is neither as reliable nor as objective as most scientists assume"
- Rethinking Science's Magic Number from Tiffany Dill at NOVA Next. 2018-02-28.
- NOVA Season 45 Episode 6 Prediction by the Numbers 2018-02-28.
- NPR's On the Media with a skeptic’s guide to health news/diet fads. July 2015.
- "Instead of 'confidence interval,' let’s say 'uncertainty interval'" 2010-12-21
- "Researcher Degrees of Freedom" 2012-11-01
- "How can statisticians help psychologists do their research better?" 2013-05-17
- "The Fallacy of Placing Confidence in Confidence Intervals" 2014-12-11
- "The feather, the bathroom scale, and the kangaroo" 2015-04-21
- "Statistics is like basketball or knitting" 2016-03-11.
- "More on my paper with John Carlin on Type M and Type S errors" 2016-10-13.
- "Marginally Significant Effects as Evidence for Hypotheses: Changing Attitudes Over Four Decades" 2016-10-15.
- "How not to analyze noisy data: A case study" 2016-10-25.
- "Abandon statistical significance" 2017-09-26
- "My favorite definition of statistical significance" 2017-10-28.
- A few things that would reduce stress around reproducibility/replicability in science by Jeff Leek at Simply Statistics, 2017-11-21.
- Jeff Leek's books on LeanPub:
- Jeff Leek's tidypvals package in R, on GitHub
- Sad p value bear meme discussion from Jeff Leek at Simply Statistics
- Karl Broman has all kinds of great stuff, including the "I'm really sorry you did all the work" slide in this talk
- Karl Broman's Initial Steps Towards Reproducible Research page
- Harrell FE, Scott T (2012) Reproducible Research Tutorial from Vanderbilt U.
- Donoho DL (2010) An invitation to reproducible computational research
- XKCD cartoons include:
- Jenny Bryan Why is this the hill that statisticians choose to die on?
- Jenny Bryan Decision making under uncertainty struggles
- J. P. de Ruiter Specific null, arbitrary alternative...
- Michael Donohoe Comprehensive map of all countries in the world that use the MMDDYYYY format
- Joran Elias A confidence interval is ...
- Frank Harrell It's important to get this right
- John Holbein Nothing screams GRADUATE STUDENT ...
- Jeff Leek 2.5 million p values
- Thomas Leeper An interval drawn such that ...
- Michael Lopez analyzes some of those p values
- Chelsea Pelleriti What's your 280 character definition of a confidence interval?
- Randy Sweis p value of 0.06 trending...
- Sean J. Taylor, Facebook Core Statistics Team