"A manifesto for reproducible science"

Discussion of

Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., … Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1, 0021. https://doi.org/10.1038/s41562-016-0021

Weaknesses in the scientific method

Steps in the scientific method

  • Generate and specify hypothesis
  • Design study
  • Conduct study and collect data
  • Analyze data and test hypothesis
  • Interpret results
  • Publish and/or conduct next study

Failure to control for bias

Low statistical power

  • Statistical power: the probability that a study will detect an effect when one actually exists. Depends on:
    • How big is the effect?
    • How large and how variable is the sample?
    • How strict is the significance threshold (α)?
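To make the dependence of power on effect size, sample size and α concrete, here is a minimal sketch assuming a two-group comparison of means with a standardized effect size (Cohen's d) and equal group sizes. It uses the normal approximation to the two-sample t-test, which slightly overstates power at very small n; `approx_power` is a hypothetical helper, not taken from any of the cited papers.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test of means.

    Normal approximation (assumption): when the true standardized effect is d,
    the test statistic is centered at d * sqrt(n_per_group / 2).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    delta = d * sqrt(n_per_group / 2)      # expected value of the statistic
    # Probability that the statistic falls in either rejection region.
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)

# Power for small, medium and large effects at several per-group sample sizes.
for d in (0.2, 0.5, 0.8):
    print(d, [round(approx_power(d, n), 2) for n in (20, 64, 200)])
```

With d = 0.5 and 64 participants per group the approximation gives roughly 0.80, the conventional target; with d = 0.2 and 20 per group it falls to about 0.09.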

Is psychological science underpowered?

  • Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475
  • Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLOS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124

  • Szucs, D., & Ioannidis, J. P. A. (2016). Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. bioRxiv, 071530. https://doi.org/10.1101/071530
  • "We have empirically assessed the distribution of published effect sizes and estimated power by extracting more than 100,000 statistical records from about 10,000 cognitive neuroscience and psychology papers published during the past 5 years."

Szucs & Ioannidis 2016

  • "Median power to detect small, medium and large effects was 0.12, 0.44 and 0.73."
  • "False report probability is likely to exceed 50% for the whole literature."
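The link between low power and a high false report probability can be sketched with the positive predictive value formula from Ioannidis (2005), ignoring his bias term: with power 1 − β, significance level α and prior odds R of a tested relationship being true, FRP = α / (α + (1 − β)R). The R values below are illustrative assumptions, not the estimates Szucs & Ioannidis used, and `false_report_probability` is a hypothetical helper.

```python
def false_report_probability(power: float, prior_odds: float, alpha: float = 0.05) -> float:
    """Share of significant findings that are false positives (no bias term).

    prior_odds is R, the ratio of true to null relationships among those tested.
    """
    true_positives = power * prior_odds   # (1 - beta) * R
    false_positives = alpha               # alpha * 1
    return false_positives / (false_positives + true_positives)

for power in (0.12, 0.44, 0.73):   # median power values quoted above
    for r in (1.0, 0.25, 0.1):     # illustrative prior odds (assumed)
        frp = false_report_probability(power, r)
        print(f"power={power:.2f}  R={r:<5} FRP={frp:.2f}")
```

At power 0.12 and prior odds of 1:4, for example, 0.05 / (0.05 + 0.03) = 62.5% of significant results would be false, which shows how the claim above can hold once prior odds are modest.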

Poor quality control

  • Goodman, S. N., Fanelli, D., & Ioannidis, J. P. A. (2016). What does research reproducibility mean? Science Translational Medicine, 8(341), 341ps12. https://doi.org/10.1126/scitranslmed.aaf5027
  • Methods reproducibility
    • "…the ability to implement, as exactly as possible, the experimental and computational procedures, with the same data and tools, to obtain the same results."


HARKing: hypothesizing after the results are known

  • Kerr, N. L. (1998). HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4
  • Find an effect during data analysis
  • Present the effect as if it had been hypothesized in advance
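One reason HARKing misleads is that searching many outcomes almost guarantees a "finding." The sketch below assumes k independent tests in which every null hypothesis is true (so each p-value is uniform on [0, 1]) and computes the chance that at least one reaches p < α, both analytically, 1 − (1 − α)^k, and by simulation. Real outcome variables are often correlated, which lowers this rate somewhat; the function names are hypothetical.

```python
import random

def familywise_error(k: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) among k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k

def simulate_familywise_error(k: int, alpha: float = 0.05,
                              reps: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check: under a true null, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(k))  # any "significant" result?
        for _ in range(reps)
    )
    return hits / reps

for k in (1, 5, 20):
    print(k, round(familywise_error(k), 3), simulate_familywise_error(k))
```

With 20 outcomes and no true effects, the chance of at least one p < .05 is about 64%; presenting that result as the hypothesis all along hides the search that produced it.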

Publication bias

  • Significant results favored over null findings
  • Novel results favored over replications
  • Counter-intuitive findings favored over expected ones
  • File drawer effect
    • How many unpublished failures to replicate sit in file drawers?
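One classical, and widely criticized, way to put a number on the file drawer question is Rosenthal's (1979) fail-safe N; it is a swapped-in technique, not one proposed in the manifesto. It asks how many unpublished studies averaging z = 0 would have to exist for a Stouffer combination of the k published z-scores, Σz / √(k + N), to drop to the one-tailed critical value z_α, giving N = (Σz)² / z_α² − k. The sketch assumes one-tailed z-scores; `fail_safe_n` is a hypothetical helper.

```python
from statistics import NormalDist

def fail_safe_n(z_scores: list[float], alpha: float = 0.05) -> float:
    """Rosenthal's fail-safe N for a Stouffer combined (one-tailed) test.

    Returns how many additional studies averaging z = 0 would make the
    combined result non-significant at level alpha.
    """
    k = len(z_scores)
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.645 for alpha = .05
    return sum(z_scores) ** 2 / z_alpha ** 2 - k

# Ten published studies, each with z = 2.0 (illustrative values).
print(round(fail_safe_n([2.0] * 10), 1))
```

Ten studies at z = 2.0 give a fail-safe N of roughly 138. The method assumes the hidden studies average exactly null results and ignores effect sizes, which is why funnel-plot-based methods are often preferred.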

Overcoming these weaknesses

Performing research

  • Protecting against cognitive biases
  • Improving methodological training
  • Implementing independent methodological support
  • Encouraging collaboration and team science

Reporting on research

  • Franco, A., Malhotra, N., & Simonovits, G. (2016). Underreporting in Psychology Experiments: Evidence From a Study Registry. Social Psychological and Personality Science, 7(1), 8–12. https://doi.org/10.1177/1948550615598377
  • "We find that about 40% of studies fail to fully report all experimental conditions and about 70% of studies do not report all outcome variables included in the questionnaire. Reported effect sizes are about twice as large as unreported effect sizes and are about 3 times more likely to be statistically significant."

Verifying research

  • Promoting transparency and open science

Evaluating research

  • Diversifying peer review

Changing incentives

Higginson, A. D., & Munafò, M. R. (2016). Current Incentives for Scientists Lead to Underpowered Studies with Erroneous Conclusions. PLOS Biology, 14(11), e2000995. https://doi.org/10.1371/journal.pbio.2000995

  • Badges acknowledging open practices (e.g., open data, open materials, preregistration)

Status report/recommendations by stakeholder group

