2017-02-21 12:02:28

"A Databservatory for Human Behavior"

Rick O. Gilmore

Support: NSF BCS-1147440, NSF BCS-1238599, NICHD U01-HD-076595


Psychology is harder than physics

Adolph, K., Tamis-LeMonda, C. & Gilmore, R.O. (2016). PLAY Project: Webinar discussions on protocol and coding. Databrary. Retrieved February 17, 2017 from https://nyu.databrary.org/volume/232


  • We've got problems…
  • Small-scale solutions…
  • Bigger, I mean HYUUGE, solutions

Problems facing behavioral science


Michael LaCour

Diederik Stapel

Self-reported data fabrication, falsification, and alteration

Self-reports of questionable research practices


Marc Hauser

Mistakes, e.g., flexible "stopping" rules

Errors/omissions in data

Reproducibility "Crisis"

Results published in (Open Science Collaboration 2015)

  • 39/98 (39.8%) replication attempts were successful
  • 97% of original studies reported statistically significant results vs. 36% of replications

So, did the studies replicate? (Gilbert et al. 2016)

  • Samples not equal
    • Differences in sampling error alone predict < 100% reproducibility

  • Protocols not identical
    • Only 69% of original PIs "endorsed" the replication protocol. The replication rate was nearly 4x higher (59.7% vs. 15.4%) in studies with an endorsed protocol.
    • What is the CI of expected effect sizes given sample/methodological variability? See the Many Labs project.
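
Why sampling error alone keeps reproducibility below 100%: even when an effect is real, a replication reaches significance only with probability equal to its statistical power. The sketch below is illustrative only. It assumes a two-sided, two-sample comparison with a normal approximation; the function name `replication_power` and the sample size of 64 per group are hypothetical and do not come from the studies cited above.

```python
from statistics import NormalDist


def replication_power(d, n_per_group, alpha=0.05):
    """Approximate probability that a replication of a TRUE effect of size d
    (Cohen's d) reaches two-sided significance at level alpha.

    Assumption: two independent groups of equal size, normal approximation
    to the test statistic.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the true effect is d
    ncp = d * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)


# Hypothetical replication with 64 participants per group
for d in (0.2, 0.5, 0.8):
    p = replication_power(d, n_per_group=64)
    print(f"true d = {d}: P(significant replication) ~ {p:.2f}")
```

Under these assumptions, only large true effects replicate nearly every time, so a replication rate below 100% is expected even if every original finding were true.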

Not just social & behavioral science

Is there a reproducibility crisis?

  • Yes, a significant crisis.
  • Yes, a slight crisis.
  • No, there is no crisis.
  • Don't know.

Lack of clarity/agreement about reproducibility

  • Methods reproducibility refers to "…the ability to implement, as exactly as possible, the experimental and computational procedures, with the same data and tools, to obtain the same results."
  • Results reproducibility "(previously described as replicability) refers to obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible."
  • Inferential reproducibility "…refers to the drawing of qualitatively similar conclusions from either an independent replication of a study or a reanalysis of the original study"

Sampling challenges

Studies are underpowered

(Szucs and Ioannidis 2016)

"We have empirically assessed the distribution of published effect sizes and estimated power by extracting more than 100,000 statistical records from about 10,000 cognitive neuroscience and psychology papers published during the past 5 years. The reported median effect size was d=0.93 (inter-quartile range: 0.64-1.46) for nominally statistically significant results and d=0.24 (0.11-0.42) for non-significant results. Median power to detect small, medium and large effects was 0.12, 0.44 and 0.73, reflecting no improvement through the past half-century. Power was lowest for cognitive neuroscience journals. 14% of papers reported some statistically significant results, although the respective F statistic and degrees of freedom proved that these were non-significant; p value errors positively correlated with journal impact factors. False report probability is likely to exceed 50% for the whole literature. In light of our findings the recently reported low replication success in psychology is realistic and worse performance may be expected for cognitive neuroscience."
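
How a false report probability above 50% can arise from low power: the sketch below uses the simplest textbook form, which ignores bias and is not necessarily the exact calculation used by Szucs and Ioannidis. The median power of 0.44 for medium effects comes from the quoted abstract; the function name `false_report_probability` and the prior probabilities that a tested hypothesis is true are hypothetical values chosen for illustration.

```python
def false_report_probability(alpha, power, prior_true):
    """Share of significant results that are false positives.

    Simplified form with no bias term (an assumption):
    FRP = alpha * P(H0) / (alpha * P(H0) + power * P(H1))
    """
    false_positives = alpha * (1 - prior_true)
    true_positives = power * prior_true
    return false_positives / (false_positives + true_positives)


# Median power for medium effects, taken from the quoted abstract
median_power = 0.44

# Hypothetical priors: half of tested hypotheses true vs. one in eleven
for prior in (0.5, 1 / 11):
    frp = false_report_probability(0.05, median_power, prior)
    print(f"P(H1) = {prior:.2f}: FRP ~ {frp:.2f}")
```

With these illustrative inputs, the false report probability crosses 50% only when most tested hypotheses are false, which is why the assumed prior matters as much as power.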

Rich man…

  • New data sources
    • Wearables
    • Electronic education records
    • Electronic medical records
    • Social media

Poor man

Limited data, materials sharing

Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726–728. https://doi.org/10.1037/0003-066X.61.7.726

Little improvement over time…

What's at stake?

  • A great deal
  • Validity of our evidence
  • Credibility of our arguments

In our defense…

  • Science is hard
  • Life is short
  • Funding is competitive
  • Incentive structure undermines transparency, openness, sharing

