The Reproducibility Crisis in Psychology & Neuroscience

Rick O. Gilmore

2017-07-11 07:30:12





  • There’s something happening here; what it is ain’t exactly clear…
  • Reproducibility in Psychology
  • Reproducibility in Neuroscience
  • Toward a better future

Pop quiz

Is there a reproducibility crisis?

  • Yes, a significant crisis.
  • Yes, a slight crisis.
  • No, there is no crisis.
  • Don’t know.

Have you ever failed to reproduce an experiment?

  • Someone else’s
  • My own

Have you ever tried to publish a reproduction attempt?

  • Published
  • Failed to publish

What factors contribute to irreproducible research?

Solvable problems

  • Selective reporting
  • Pressure to publish
  • Low power
  • Insufficient replication in original labs
  • Poor mentoring/oversight
  • Methods, code, data unavailable
  • Poor design
  • Fraud
  • Insufficient peer review
  • Variability of reagents
  • Bad luck

Reproducibility in Psychology

Psychology is harder than physics

Behavior has multiple, nested dimensions

Adolph, K., Tamis-LeMonda, C. & Gilmore, R.O. (2016). PLAY Project: Webinar discussions on protocol and coding. Databrary. Retrieved February 17, 2017 from

Data are sensitive, hard(er) to share

  • Protect participants’ identities
  • Protect from harm/embarrassment

Sampling is biased

Even on Mechanical Turk

The sin of unreliability

The sin of data hoarding

  • Data sharing is not a universal practice
  • Even after a publication has gone to press
  • Despite implicit agreement to share when publishing in certain journals (e.g., those of the American Psychological Association)

& hasn’t improved since 2006

The sin of corruptibility

  • Diederik Stapel was Dean of the School of Social and Behavioral Sciences at Tilburg University and taught a course on scientific ethics
  • Fraud investigation launched when 3 grad students noticed anomalies: duplicate entries in data tables
  • Stapel confessed, lost his position, gave up his Ph.D., and wrote a book

Self-reported data fabrication, falsification, and alteration

Self-reports of questionable research practices

Mistakes, e.g., flexible “stopping” rules
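Flexible stopping rules inflate false positives even when no effect exists. Below is a minimal Python simulation, assuming a one-sample z-test with known SD = 1 and a researcher who peeks after every 10 observations (up to a hypothetical maximum of 100) and stops as soon as p < .05. The function names and parameter values are illustrative choices, not taken from any of the cited studies.

```python
import math
import random


def z_test_p(sample):
    """Two-sided p-value for H0: mean == 0, assuming a known SD of 1 (z-test)."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek_every, max_n, n_sims=5000, alpha=0.05, seed=1):
    """Fraction of null experiments that end up declared 'significant'.

    The simulated researcher tests after every `peek_every` new observations
    and stops as soon as p < alpha. Setting peek_every == max_n means a
    single, pre-planned test at the end of data collection.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = []
        while len(sample) < max_n:
            # H0 is true: every observation comes from a mean-0 distribution
            sample.extend(rng.gauss(0, 1) for _ in range(peek_every))
            if z_test_p(sample) < alpha:
                hits += 1
                break
    return hits / n_sims


fixed = false_positive_rate(peek_every=100, max_n=100)    # one test at n = 100
flexible = false_positive_rate(peek_every=10, max_n=100)  # up to 10 peeks
print(f"fixed n:       {fixed:.3f}")
print(f"optional stop: {flexible:.3f}")
```

With a single pre-planned test the false-positive rate should sit near the nominal .05; with ten peeks it should come out several times higher (in the neighborhood of .2), which is the kind of undisclosed flexibility Simmons, Nelson, and Simonsohn (2011) warn about.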

Errors/omissions in data

The sin of bias…

*“This article reports 9 experiments, involving more than 1,000 participants, that test for retroactive influence by ‘time-reversing’ well-established psychological effects so that the individual’s responses are obtained before the putatively causal stimulus events occur.”*

(Bem 2011)

“We argue that in order to convince a skeptical audience of a controversial claim, one needs to conduct strictly confirmatory studies and analyze the results with statistical tests that are conservative rather than liberal. We conclude that Bem’s p values do not indicate evidence in favor of precognition; instead, they indicate that experimental psychologists need to change the way they conduct their experiments and analyze their data.”

(Wagenmakers et al. 2011)

Reproducibility “Crisis”

Results reported by the Open Science Collaboration (2015)

  • 39/98 (39.8%) replication attempts were successful
  • 97% of original studies reported statistically significant results vs. 36% of replications

So, did the studies replicate?

  • Samples not equal
    • Differences due to sampling error alone predict < 100% reproducibility
  • Protocols not identical
    • Only 69% of original PIs “endorsed” replication protocol. Replication rate 4x higher (59.7% vs. 15.4%) in studies with endorsed protocol.
  • Critics argued that (Collaboration 2015) “…seriously underestimated reproducibility of psychological science.”
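Why does sampling error alone predict less than 100% reproducibility? Even if every original effect is real, an exact replication reaches significance only as often as its statistical power allows. The following is a minimal Monte Carlo sketch, assuming a two-sample z-test with known SD = 1 and a hypothetical true effect of d = 0.5 with 64 participants per group; none of these values come from (Collaboration 2015).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def replication_success_rate(d, n_per_group, n_sims=4000, alpha=0.05, seed=2):
    """Share of simulated exact replications of a real effect of size d that
    are significant (two-sided alpha) in the original direction.

    Uses a two-sample z-test and assumes both groups have a known SD of 1.
    """
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    successes = 0
    for _ in range(n_sims):
        treatment = [rng.gauss(d, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        mean_diff = sum(treatment) / n_per_group - sum(control) / n_per_group
        z = mean_diff / math.sqrt(2 / n_per_group)  # SE of a difference in means
        if z > z_crit:  # significant and in the same direction as the original
            successes += 1
    return successes / n_sims


d, n = 0.5, 64  # hypothetical true effect and replication sample size
rate = replication_success_rate(d, n)
analytic = STD_NORMAL.cdf(d * math.sqrt(n / 2) - STD_NORMAL.inv_cdf(0.975))
print(f"simulated success rate: {rate:.3f}")
print(f"analytic power:         {analytic:.3f}")
```

The simulated success rate should land close to the analytic power (about .8), so roughly one in five exact replications of a genuinely real effect would "fail" by chance under these assumptions.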

Reproducibility in Neuroscience

Underpowered studies
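A quick way to see what “underpowered” means is to compute power directly. This sketch assumes a two-sided, two-sample comparison and uses the normal approximation to the t-test (so it slightly overstates power for small samples); the effect size and sample sizes in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = d * math.sqrt(n_per_group / 2)  # expected z statistic if the effect is real
    # Probability of rejecting in either tail
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)


def n_per_group_for_power(d, power=0.8, alpha=0.05):
    """Smallest n per group reaching the target power (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / d) ** 2)


print(round(power_two_sample(0.5, 20), 2))  # 0.35
print(n_per_group_for_power(0.5))           # 63
```

With d = 0.5 and 20 participants per group, power is only about .35; reaching 80% power needs about 63 per group under this approximation (an exact t-test calculation gives 64).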

Risks of false positives
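Low power also lowers the chance that a significant result is a true positive. Button et al. (2013) express this as the positive predictive value, PPV = ((1 − β) × R) / ((1 − β) × R + α), where 1 − β is power and R is the pre-study odds that the probed effect is real. The sketch below simply evaluates that formula; the prior odds of 0.25 are a hypothetical value chosen for illustration.

```python
def positive_predictive_value(power, alpha, prior_odds):
    """Probability that a 'significant' finding reflects a true effect.

    PPV = (power * R) / (power * R + alpha), with power = 1 - beta and
    R = pre-study odds that the tested effect is real.
    """
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


# Hypothetical prior odds of 1:4 that a tested effect is real
for power in (0.2, 0.5, 0.8):
    ppv = positive_predictive_value(power, alpha=0.05, prior_odds=0.25)
    print(f"power={power:.1f}  PPV={ppv:.2f}")
```

Under these assumptions, PPV rises from .50 at 20% power to .71 at 50% power and .80 at 80% power: in a field of underpowered studies, half of the "discoveries" could be false.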

Multiple (> 69K) computational pathways
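The number of possible analysis pipelines grows multiplicatively: every independent choice multiplies the count by its number of options. This sketch assumes five hypothetical analysis steps with made-up option lists (they are not the steps behind the > 69K figure) and enumerates every combination.

```python
from itertools import product
from math import prod

# Hypothetical analysis steps and options, for illustration only.
PIPELINE_OPTIONS = {
    "slice_timing": ["none", "before_realignment", "after_realignment"],
    "motion_regressors": [6, 24],
    "smoothing_fwhm_mm": [0, 4, 6, 8],
    "temporal_filter": ["none", "high_pass"],
    "threshold": ["voxelwise_fwe", "cluster_fwe", "fdr"],
}

# Counting: multiply the number of options available at each step.
n_pipelines = prod(len(options) for options in PIPELINE_OPTIONS.values())

# Enumerating: one option per step defines one distinct pipeline.
pipelines = [dict(zip(PIPELINE_OPTIONS, combo))
             for combo in product(*PIPELINE_OPTIONS.values())]

print(n_pipelines, len(pipelines))  # 144 144
print(pipelines[0])
```

These five modest choices already yield 144 distinct pipelines; adding one more two-option step would double that to 288, which is how realistic neuroimaging workflows reach tens of thousands.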

Toward a better future

Tools for openness and transparency

Changing journal, funder practices

  • Data, materials, code citation
  • Data transparency
  • Analytic methods (code) transparency
  • Design and analysis transparency
  • Preregistration of studies
  • Preregistration of analysis plans
  • Replication

Large-scale replication efforts

Improving methodology training

Data publication

  • Databrary specializes in storing and sharing video
  • Video captures behavior in ways other methods cannot, but it identifies participants
  • Policy framework for sharing identifiable data
    • Permission to share builds on informed consent
    • Restricted access for (institutionally) authorized researchers
  • Datavyu: a free, open-source video coding tool

Toward a databservatory…

The Human Project

Open Humans

Social Data Explorer


Allen Brain Atlas

A vision of our open science future…

  • All data, materials, code shared
    • when paper goes to press or at end of grant period
  • Shared in repositories that encourage data linkage (w/ permission)
    • People, places, times, tasks, behaviors, …

  • Commonplace citations of data, materials, code, findings
  • Ecosystems for new discovery

What’s your vision?

Thank you & good luck this week


This talk was produced on 2017-07-11 in RStudio 1.0.143 using R Markdown and the reveal.js framework. A BibTeX format reference file can be found in bib/psu-repro.bib. The code and materials used to generate the slides may be found at

Information about the R session that produced the code is as follows:

## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.5
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## other attached packages:
## [1] dplyr_0.5.0   ggplot2_2.2.1
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.10     knitr_1.16.4     magrittr_1.5     munsell_0.4.3   
##  [5] colorspace_1.3-2 R6_2.2.0         stringr_1.2.0    plyr_1.8.4      
##  [9] tools_3.4.0      revealjs_0.9     grid_3.4.0       gtable_0.2.0    
## [13] DBI_0.6-1        htmltools_0.3.6  yaml_2.1.14      lazyeval_0.2.0  
## [17] rprojroot_1.2    digest_0.6.12    assertthat_0.2.0 tibble_1.3.0    
## [21] evaluate_0.10    rmarkdown_1.5    labeling_0.3     stringi_1.1.5   
## [25] compiler_3.4.0   scales_0.4.1     backports_1.0.5


Baker, Monya. 2016. “1,500 Scientists Lift the Lid on Reproducibility.” Nature News 533 (7604): 452. doi:10.1038/533452a.

Baker, Monya, and Elie Dolgin. 2017. “Cancer Reproducibility Project Releases First Results.” Nature 541 (7637): 269–70. doi:10.1038/541269a.

Bem, Daryl J. 2011. “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect.” J. Pers. Soc. Psychol. 100 (3): 407–25. doi:10.1037/a0021524.

Benjamin, Daniel, David R Mandel, and Jonathan Kimmelman. 2017. “Can Cancer Researchers Accurately Judge Whether Preclinical Reports Will Reproduce?” PLoS Biol. 15 (6): e2002212. doi:10.1371/journal.pbio.2002212.

Button, Katherine S, John P A Ioannidis, Claire Mokrysz, Brian A Nosek, Jonathan Flint, Emma S J Robinson, and Marcus R Munafò. 2013. “Power Failure: Why Small Sample Size Undermines the Reliability of Neuroscience.” Nat. Rev. Neurosci. 14 (5): 365–76. doi:10.1038/nrn3475.

Collaboration, Open Science. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716. doi:10.1126/science.aac4716.

Fanelli, Daniele. 2009. “How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data.” PLOS ONE 4 (5): e5738. doi:10.1371/journal.pone.0005738.

Gilmore, Rick O, and Karen E Adolph. 2017. “Video Can Make Behavioural Research More Reproducible.” Nature Human Behaviour 1 (June). doi:10.1038/s41562-017-0128.

Gilmore, Rick O, Michele T Diaz, Brad A Wyble, and Tal Yarkoni. 2017. “Progress Toward Openness, Transparency, and Reproducibility in Cognitive Neuroscience.” Ann. N. Y. Acad. Sci., 2 May. doi:10.1111/nyas.13325.

Grinvald, Amiram, and Rina Hildesheim. 2004. “VSDI: A New Era in Functional Imaging of Cortical Dynamics.” Nat. Rev. Neurosci. 5 (11): 874–85. doi:10.1038/nrn1536.

Henrich, Joseph, Steven J. Heine, and Ara Norenzayan. 2010. “The Weirdest People in the World?” The Behavioral and Brain Sciences 33 (2-3): 61–83; discussion 83–135. doi:10.1017/S0140525X0999152X.

LaCour, Michael J., and Donald P. Green. 2014. “When Contact Changes Minds: An Experiment on Transmission of Support for Gay Equality.” Science 346 (6215): 1366–9. doi:10.1126/science.1256151.

Maxwell, Scott E. 2004. “The Persistence of Underpowered Studies in Psychological Research: Causes, Consequences, and Remedies.” Psychological Methods 9 (2): 147–63. doi:10.1037/1082-989X.9.2.147.

Munafò, Marcus R., Brian A. Nosek, Dorothy V. M. Bishop, Katherine S. Button, Christopher D. Chambers, Nathalie Percie du Sert, Uri Simonsohn, Eric-Jan Wagenmakers, Jennifer J. Ware, and John P. A. Ioannidis. 2017. “A Manifesto for Reproducible Science.” Nature Human Behaviour 1 (January): 0021. doi:10.1038/s41562-016-0021.

Nosek, B. A., G. Alter, G. C. Banks, D. Borsboom, S. D. Bowman, S. J. Breckler, S. Buck, et al. 2015. “Promoting an Open Research Culture.” Science 348 (6242): 1422–5. doi:10.1126/science.aab2374.

Poldrack, Russell A, Chris I Baker, Joke Durnez, Krzysztof J Gorgolewski, Paul M Matthews, Marcus R Munafò, Thomas E Nichols, Jean-Baptiste Poline, Edward Vul, and Tal Yarkoni. 2017. “Scanning the Horizon: Towards Transparent and Reproducible Neuroimaging Research.” Nat. Rev. Neurosci. advance online publication (5 January). doi:10.1038/nrn.2016.167.

Simmons, Joseph P., Leif D. Nelson, and Uri Simonsohn. 2011. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22 (11): 1359–66. doi:10.1177/0956797611417632.

Szucs, Denes, and John PA Ioannidis. 2016. “Empirical Assessment of Published Effect Sizes and Power in the Recent Cognitive Neuroscience and Psychology Literature.” BioRxiv, August, 071530. doi:10.1101/071530.

Vanpaemel, Wolf, Maarten Vermorgen, Leen Deriemaecker, and Gert Storms. 2015. “Are We Wasting a Good Crisis? The Availability of Psychological Research Data After the Storm.” Collabra: Psychology 1 (1). doi:10.1525/collabra.13.

Wagenmakers, Eric-Jan, Ruud Wetzels, Denny Borsboom, and Han L J van der Maas. 2011. “Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi: Comment on Bem (2011).” J. Pers. Soc. Psychol. 100 (3): 426–32. doi:10.1037/a0022790.

Wicherts, Jelte M., Denny Borsboom, Judith Kats, and Dylan Molenaar. 2006. “The Poor Availability of Psychological Research Data for Reanalysis.” American Psychologist 61 (7): 726–28. doi:10.1037/0003-066X.61.7.726.