2017-01-31 15:58:00

An -ome of our own: Toward a more reproducible, robust, and insightful science of human behavior

Rick O. Gilmore

Support: NSF BCS-1147440, NSF BCS-1238599, NICHD U01-HD-076595

  • Associate Professor of Psychology
  • Founding Director of Human Imaging at Penn State's SLEIC
  • Co-founder and Co-Director of the Databrary.org digital library
  • A.B., Cognitive Science, Brown; M.S. & Ph.D., Cognitive Neuroscience, Carnegie Mellon University
  • Folk music, theatre, poetry, cycling, hiking, paddling, amateur radio (K3ROG)



Oh give me an -ome…


  • We've got problems…right here in River City
  • Bites at the apple
  • Thinking big, REALLY BIG, about solutions

Problems facing social and behavioral science


Marc Hauser

Diederik Stapel

Reproducibility Project: Psychology

  • Attempt to replicate 100 experimental and correlational studies published in three psychology journals in 2008 using high-powered designs and original materials when available.
  • Materials, data, protocols, analysis code shared via the Open Science Framework (OSF) from the Center for Open Science (COS)

Results published in (Open Science Collaboration 2015)

  • 39/98 (39.8%) of replication attempts were successful
  • 97% of original studies reported statistically significant results vs. 36% of replications
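As a back-of-the-envelope gloss on that headline rate, a Wilson score interval for 39 successes in 98 attempts shows how wide the uncertainty is (the `wilson_ci` helper and the 95% level are illustrative choices, not part of the original report):

```python
from math import sqrt

def wilson_ci(successes, n, z=1.959964):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

lo, hi = wilson_ci(39, 98)  # roughly (0.31, 0.50)
```

Even taking the replication count at face value, the interval spans roughly 31% to 50%, so the point estimate alone should be read with caution.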

So, did the studies replicate?

  • Protocols not identical
    • Only 69% of original PIs "endorsed" the replication protocol. The replication rate was about 4x higher (59.7% vs. 15.4%) in studies with an endorsed protocol.
    • What is the confidence interval of expected effect sizes given sampling and methodological variability? See the Many Labs project.
  • Gilbert et al. (2016) argue that (Open Science Collaboration 2015) "…seriously underestimated reproducibility of psychological science."

Not just social & behavioral science

Questions about reproducibility

  • Methods reproducibility refers to "…the ability to implement, as exactly as possible, the experimental and computational procedures, with the same data and tools, to obtain the same results."
  • Results reproducibility "(previously described as replicability) refers to obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible."
  • Inferential reproducibility "…refers to the drawing of qualitatively similar conclusions from either an independent replication of a study or a reanalysis of the original study"

(Goodman, Fanelli, and Ioannidis 2016)

Sampling challenges

(Szucs and Ioannidis 2016)

"We have empirically assessed the distribution of published effect sizes and estimated power by extracting more than 100,000 statistical records from about 10,000 cognitive neuroscience and psychology papers published during the past 5 years. The reported median effect size was d=0.93 (inter-quartile range: 0.64-1.46) for nominally statistically significant results and d=0.24 (0.11-0.42) for non-significant results. Median power to detect small, medium and large effects was 0.12, 0.44 and 0.73, reflecting no improvement through the past half-century. Power was lowest for cognitive neuroscience journals. 14% of papers reported some statistically significant results, although the respective F statistic and degrees of freedom proved that these were non-significant; p value errors positively correlated with journal impact factors. False report probability is likely to exceed 50% for the whole literature. In light of our findings the recently reported low replication success in psychology is realistic and worse performance may be expected for cognitive neuroscience."

Rich man…

  • New data sources
    • Wearables
    • Electronic education records
    • Electronic medical records
    • Social media

Poor man

What's at stake?

  • A great deal
  • Validity of our evidence
  • Credibility of our arguments

Science denialism/marginalization

21st c. problems

  • Require scientific solutions
    • technical
    • social/political/economic
    • behavioral
    • in public AND private spheres
  • Fraud/reproducibility undermine public trust
  • Address challenges before opponents do
  • Maintain a robust, non-commercial science of human behavior

Small-scale solutions

Transparency and openness promotion (TOP) guidelines

  • Citation
  • Data transparency
  • Analytic methods (code) transparency
  • Design and analysis transparency
  • Preregistration of studies
  • Preregistration of analysis plans
  • Replication

Who's signed on and who hasn't?

  • List of TOP signatories
  • Declines
    • (Lash 2015)
    • Implementation would run counter to efforts to "…maintain an editorial policy that encourages creativity and novelty, resists regimentation of research practices to the extent practicable, and invites challenges to current scientific habits and conventions through innovation in epidemiologic theory and practice."

On the other hand…

(Munafò et al. 2017) manifesto

Better methodology training

Data (materials and code) should be



  • Databrary specializes in storing and sharing video
  • Video captures behavior unlike other methods, but is identifiable
  • Policy framework for sharing identifiable data
    • Permission to share -> builds on informed consent
    • Restricted access for (institutionally) authorized researchers

Video essential for reproducibility

Videos of empirical procedures can and should be viewed as the gold standard of documentation across the behavioral sciences. Indeed, were the use of video for this purpose more widespread, many disagreements about whether empirical replications truly reproduced the original experimental conditions would be moot (Collaboration, 2015; Gilbert et al., 2016). The power of video to document procedures should also be an attractive solution for scientists in fields that do not commonly collect or analyze video.

Gilmore & Adolph (in press)

Changing practices of researchers, institutions, journals, and funders

Your thoughts?

REALLY BIG solutions

Thinking big…

Imagine a 'Databservatory' for human behavior


Shonkoff, J. P., & Phillips, D. A. (Eds.). (2000). From neurons to neighborhoods: The science of early childhood development. National Academies Press.

What would this micro/macro/telescope look like?

  • Recruiting – larger, more diverse samples
  • Data collection – more data types, allow linkage across levels
  • Data curation/management – easy/automatic, standardized formats
  • Data sharing – PI controls when, permission levels

  • Data mining, visualization, linking
  • Search, filter by participant characteristics, tasks/measures, geo/temporal factors
  • Analysis in the "cloud"
  • Automatic versioning, history
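The search-and-filter idea above can be sketched as a simple query over shared session metadata. The records, field names, and `search` helper are hypothetical illustrations, not an actual Databrary schema or API:

```python
# Hypothetical session metadata for shared "databservatory" records;
# field names are invented for illustration.
sessions = [
    {"id": 1, "age_months": 14, "task": "locomotion", "region": "PA"},
    {"id": 2, "age_months": 30, "task": "locomotion", "region": "NY"},
    {"id": 3, "age_months": 16, "task": "visual_attention", "region": "PA"},
]

def search(records, **criteria):
    """Filter records by exact-match participant/task criteria."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

matches = search(sessions, task="locomotion", region="PA")
```

A real service would add range filters (e.g., age bands), geo/temporal queries, and access control, but the core operation is this kind of metadata filtering.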

The front end

  • App/web service
  • Linking researchers with participants (or parents)
  • Participants own/control their data, determine level of sharing (like datawallet.io)
  • Lab, computer/smart-phone based, survey tasks
  • Data visualizations, dashboard
  • 1,000+ psych pool/semester, 500K PSU alumni, 1M friends


The middle

Analytic/visualization/data publication engine


  • Threaded comments from submission to pre-publication review to post-publication commentary
  • Analysis, data, visualizations on the fly (e.g., Shiny)
  • Impact/quality factors computed based on multiple criteria

Problems to solve

Is data


  • Depends on who's asking
  • Let's keep academic social and behavioral science
    • relevant
    • in the public domain
    • serving the general public good

Problems to solve

Data harmonization and preparation

  • Astro, geo, bio-sciences have common frames of reference, linking variables
  • Essential data linkages in social/behavioral science
    • People
    • Locations
      • Lat/Lon or Census Block Group/Tract
    • Dates and times
    • Tasks/behaviors
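A minimal sketch of that kind of linkage joins hypothetical behavioral observations to neighborhood context on a shared (Census tract, date) key; every field name and value here is invented for illustration:

```python
# Two hypothetical record sets, linkable only because both carry
# the same keys: a Census tract identifier and a date.
observations = [
    {"person": "p01", "tract": "42027_0101", "date": "2017-01-10", "task": "visual_search"},
    {"person": "p02", "tract": "42027_0202", "date": "2017-01-10", "task": "visual_search"},
]
context = [
    {"tract": "42027_0101", "date": "2017-01-10", "median_income": 52000},
    {"tract": "42027_0202", "date": "2017-01-10", "median_income": 38000},
]

# Index context by the shared (tract, date) key, then link each
# observation to its neighborhood record.
ctx_index = {(c["tract"], c["date"]): c for c in context}
linked = [
    {**obs, **ctx_index.get((obs["tract"], obs["date"]), {})}
    for obs in observations
]
```

Without agreed-upon linking variables like these, records collected by different labs cannot be combined, which is exactly the harmonization problem the astro, geo, and bio sciences have already solved with common frames of reference.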

Respecting privacy

  • Asking participants EXPLICIT permission to share with researchers for research purposes

When asked, most participants say yes

Respecting privacy

  • Give participants meaningful ownership over their data
  • Give participants value for contributing

Clarifying the value of participating in research

  • 'Free' service (email, calendar, search, communications platform) vs.

  • Contributions to public good
  • Aid discovery
  • Feed curiosity
  • Help institution, community, society

Summing up

  • "Oh, give me a 'ome, where social scientists roam…"
  • Recruiting + data collection + harmonization + mining + sharing + open publication
  • Shall we build it?
  • Here and now?
  • After all…


This talk was produced in RStudio version 1.0.136 on 2017-01-31. The code used to generate the slides can be found at http://github.com/gilmore-lab/soda-2017-01-31/. Information about the R Session that produced the code is as follows:

## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X El Capitan 10.11.6
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## loaded via a namespace (and not attached):
##  [1] backports_1.0.4 magrittr_1.5    rprojroot_1.1   tools_3.3.2    
##  [5] htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.8     stringi_1.1.2  
##  [9] rmarkdown_1.3   knitr_1.15.1    stringr_1.1.0   digest_0.6.11  
## [13] evaluate_0.10


Baker, Monya. 2016. “1,500 Scientists Lift the Lid on Reproducibility.” Nature News 533 (7604): 452. doi:10.1038/533452a.

Begley, C. Glenn, and Lee M. Ellis. 2012. “Drug Development: Raise Standards for Preclinical Cancer Research.” Nature 483 (7391): 531–33. doi:10.1038/483531a.

Collaboration, Open Science. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716. doi:10.1126/science.aac4716.

Gilbert, Daniel T., Gary King, Stephen Pettigrew, and Timothy D. Wilson. 2016. “Comment on ‘Estimating the Reproducibility of Psychological Science’.” Science 351 (6277): 1037–7. doi:10.1126/science.aad7243.

Goodman, Steven N., Daniele Fanelli, and John P. A. Ioannidis. 2016. “What Does Research Reproducibility Mean?” Science Translational Medicine 8 (341): 341ps12–341ps12. doi:10.1126/scitranslmed.aaf5027.

Henrich, Joseph, Steven J. Heine, and Ara Norenzayan. 2010. “The Weirdest People in the World?” The Behavioral and Brain Sciences 33 (2-3): 61–83; discussion 83–135. doi:10.1017/S0140525X0999152X.

Lash, Timothy L. 2015. “Declining the Transparency and Openness Promotion Guidelines.” Epidemiology 26 (6). LWW: 779–80. http://journals.lww.com/epidem/Fulltext/2015/11000/Declining_the_Transparency_and_Openness_Promotion.1.aspx.

Maxwell, Scott E. 2004. “The Persistence of Underpowered Studies in Psychological Research: Causes, Consequences, and Remedies.” Psychological Methods 9 (2): 147–63. doi:10.1037/1082-989X.9.2.147.

Munafò, Marcus R., Brian A. Nosek, Dorothy V. M. Bishop, Katherine S. Button, Christopher D. Chambers, Nathalie Percie du Sert, Uri Simonsohn, Eric-Jan Wagenmakers, Jennifer J. Ware, and John P. A. Ioannidis. 2017. “A Manifesto for Reproducible Science.” Nature Human Behaviour 1 (January): 0021. doi:10.1038/s41562-016-0021.

Nosek, B. A., G. Alter, G. C. Banks, D. Borsboom, S. D. Bowman, S. J. Breckler, S. Buck, et al. 2015. “Promoting an Open Research Culture.” Science 348 (6242): 1422–5. doi:10.1126/science.aab2374.

Prinz, Florian, Thomas Schlange, and Khusru Asadullah. 2011. “Believe It or Not: How Much Can We Rely on Published Data on Potential Drug Targets?” Nature Reviews Drug Discovery 10 (9): 712–12. doi:10.1038/nrd3439-c1.

Szucs, Denes, and John P. A. Ioannidis. 2016. “Empirical Assessment of Published Effect Sizes and Power in the Recent Cognitive Neuroscience and Psychology Literature.” BioRxiv, August, 071530. doi:10.1101/071530.