Too Good to be False: Non-Significant Results Revisited

Abstract

In recent years, significant research results in psychology have sometimes been considered “too good to be true” and possibly false positive. We investigate the converse: whether nonsignificant results are sometimes “too good to be false” and therefore false negative. To this end, we examined 54,595 nonsignificant test results from eight flagship psychology journals published from 1985 to 2013. We propose two ways of testing for possible false negatives, one across papers and one within a paper. All inspected journals showed evidence of false negatives, albeit to differing degrees, with 66.7% of the papers that report nonsignificant results possibly containing false negatives. Across the entire set of results, the false negative effect was estimated at r ≈ .2 in each year from 1985 to 2013. The false negative rate was estimated at 37-45% and was likewise stable over this period, while the proportion of nonsignificant results reported in the psychological literature increased. Sample sizes remained similar throughout 1985-2013. Taken together, these findings indicate that the problem of false negatives is far from resolved and that concern about false negatives in psychological science is warranted.

Keywords: nonsignificant, underpowered, effect size, Fisher method
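
To give a rough sense of how a Fisher-type test could flag possible false negatives among a set of nonsignificant results, here is a minimal Python sketch. It is not taken from the abstract: the function name `fisher_combined_p`, the threshold `alpha = 0.05`, and above all the step that rescales each nonsignificant p-value from (alpha, 1] onto (0, 1] before combining are assumptions made for illustration. The combination step itself is the classical Fisher method, in which -2 Σ ln(p) follows a chi-square distribution with 2k degrees of freedom under the null hypothesis.

```python
import math


def fisher_combined_p(p_values, alpha=0.05):
    """Combine nonsignificant p-values with Fisher's method.

    Returns (chi2, combined_p). A small combined_p suggests that at
    least one of the nonsignificant results may be a false negative.
    """
    # Assumption (not stated in the abstract): keep only nonsignificant
    # results and rescale them from (alpha, 1] onto (0, 1].
    rescaled = [(p - alpha) / (1 - alpha) for p in p_values if p > alpha]
    k = len(rescaled)
    if k == 0:
        raise ValueError("need at least one nonsignificant p-value")

    # Fisher's statistic: chi-square with 2k degrees of freedom under H0.
    chi2 = -2.0 * sum(math.log(p) for p in rescaled)

    # Exact chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = chi2 / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return chi2, math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical p-values from one paper, all just above alpha.
    chi2, p = fisher_combined_p([0.06, 0.07, 0.08])
    print(f"chi2 = {chi2:.2f}, combined p = {p:.4g}")
```

Under these assumptions, a cluster of p-values just above the threshold yields a small combined p-value, whereas p-values spread evenly over the nonsignificant range do not. Only the standard library is used, which is possible because the chi-square tail has a closed form when the degrees of freedom are even.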

noutland/2014tgtbf
