An open science of human health & behavior




Agenda

  • Prelude
  • Some questions to ponder
  • The “ethos” of science
  • Issues, ideas, tools, & practices
  • An open science future…

https://www.youtube.com/embed/66oNv_DJuPc

Questions to ponder

What proportion of findings in the published scientific literature (in the fields you care about) are actually true?

  • 100%
  • 90%
  • 70%
  • 50%
  • 30%

How do we define what “actually true” means?

Is there a reproducibility crisis in science?

  • Yes, a significant crisis
  • Yes, a slight crisis
  • No crisis
  • Don’t know

Have you failed to reproduce an analysis from your lab or someone else’s?

Does this surprise you? Why or why not?

The ‘Ethos’ of Science

Robert Merton

  • universalism: scientific validity is independent of sociopolitical status/personal attributes of its participants
  • communalism: common ownership of scientific goods (intellectual property)
  • disinterestedness: scientific institutions benefit a common scientific enterprise, not specific individuals
  • organized skepticism: claims should be exposed to critical scrutiny before being accepted

Are these norms at risk? How or when?

…psychologists tend to treat other peoples’ theories like toothbrushes; no self-respecting individual wants to use anyone else’s.

Mischel, 2009

The toothbrush culture undermines the building of a genuinely cumulative science, encouraging more parallel play and solo game playing, rather than building on each other’s directly relevant best work.

Mischel, 2009

Issues, ideas, tools, & practices

  • What is reproducibility?
  • Where/how to share data?
  • What’s a reproducible workflow?
  • Tools for reproducible workflows
  • What is version control?
  • What’s preregistration about?
  • What are these big replication studies about?

What is reproducibility?

Methods reproducibility

  • Enough details about materials & methods recorded (& reported)
  • Same results with same materials & methods

Goodman et al., 2016

Results reproducibility

  • Same results from independent study

Goodman et al., 2016

Inferential reproducibility

  • Same inferences from one or more studies or reanalyses

Goodman et al., 2016

Where/how to share data?

  • Lab website vs.
  • Supplemental information with journal article

Data repository

Gilmore et al., 2018

What’s a reproducible workflow?

  • Data in interoperable formats (.txt or .csv)
  • Scripted, automated == minimize human-dependent steps
  • Well-documented
  • Kind to your future (forgetful) self
  • Transparent to me & colleagues == transparent to others

# Import/gather data

# Clean data

# Visualize data

# Analyze data

# Report findings

# Import data
my_data <- read.csv("path/2/data_file.csv")

# Clean data
my_data$gender <- tolower(my_data$gender) # make lower case
...

# Import data
source("R/Import_data.R") # source() runs scripts, loads functions

# Clean data
source("R/Clean_data.R")

# Visualize data
source("R/Visualize_data.R")
...
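
As a rough illustration of what one of the sourced scripts might contain, here is a minimal sketch of a cleaning step that ends by saving the data in an interoperable format. The function name `clean_data`, the column names, the made-up data frame, and the output file name are all assumptions for illustration; they are not taken from the original scripts.

```r
# Hypothetical contents of a cleaning script such as R/Clean_data.R.
# Column names and values are made up for illustration.
clean_data <- function(df) {
  df$gender <- tolower(trimws(df$gender))  # normalize case and whitespace
  df
}

# Small made-up data frame standing in for an imported file.
raw_data <- data.frame(
  id = 1:3,
  gender = c("Female", " male", "FEMALE"),
  stringsAsFactors = FALSE
)

cleaned <- clean_data(raw_data)

# Save in an interoperable, plain-text format (temporary folder used here).
out_file <- file.path(tempdir(), "cleaned_data.csv")
write.csv(cleaned, out_file, row.names = FALSE)

# Reading the file back recovers the same values.
round_trip <- read.csv(out_file, stringsAsFactors = FALSE)
print(round_trip)
```

Wrapping each step in a function keeps the master script short and makes each step easy to rerun or check on its own.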

Working examples

Tools for reproducible workflows

But my SPSS syntax file already does this

  • Great! How are you sharing these files?
  • (And how much would SPSS cost you if you had to buy it yourself?)

But I prefer {Python, Julia, Ruby, Matlab, …}

Reproducible research with R Markdown

  • Add-on package to R, developed by the RStudio team
  • Combine text, code, images, video, equations into one document
  • Render into PDF, MS Word, HTML (web page or site, slides, a blog, or even a book)

The mean is 0.2141312, the range is [-2.9430222, 3.4238146].
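
A sentence like the one above is usually generated from computed values rather than typed by hand. A minimal sketch, assuming the numbers come from a simulated standard-normal sample; the variable names `x` and `report`, the seed, and the sample size are illustrative and will not reproduce the exact values shown above.

```r
# Hypothetical data: 1,000 draws from a standard normal distribution.
set.seed(1)          # fix the seed so the numbers are the same on every run
x <- rnorm(1000)

# Build the sentence from computed values instead of typing numbers by hand.
report <- sprintf(
  "The mean is %.7f, the range is [%.7f, %.7f].",
  mean(x), min(x), max(x)
)
cat(report, "\n")
```

In an R Markdown document the same values can be embedded directly in the text with inline code such as `` `r mean(x)` ``, so the prose updates whenever the data change.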

Ways to use R Markdown

What is version control?

  • thesis_new.docx
  • thesis_new.new.docx
  • thesis_new.new.final.docx

vs.

  • thesis_2019-01-15v01.docx
  • thesis_2019-01-15v02.docx
  • thesis_2019-01-16v01.docx

Version control systems

  • Used in large-scale software engineering
  • Systems: svn, git
  • Hosting services: GitHub, Bitbucket

How I use GitHub

  • Every project gets a repository
  • Work locally, commit (save & increment version), push to GitHub
  • Talks, classes, software, analyses, web sites

What are registered reports and preregistration about?

https://cos.io/rr/

Why preregister?

  • Nosek: “Don’t fool yourself”
  • Separate confirmatory from exploratory analyses
  • Confirmatory (hypothesis-driven): p-hacking matters
  • Exploratory: p-values hard(er) to interpret
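
Why flexible analysis matters for confirmatory claims can be illustrated with a small simulation. The sketch below assumes two groups with no true difference and a hypothetical analyst who measures several independent outcomes and reports a "finding" if any one of them reaches p < 0.05. The function name `any_significant`, the group size, and the number of simulations are illustrative choices.

```r
# Simulate the false-positive rate when an analyst tests several outcomes
# and reports a "finding" if ANY of them reaches p < alpha.
# All data are generated under the null (no true group difference).
any_significant <- function(n_outcomes, n_per_group = 20, alpha = 0.05) {
  p_values <- replicate(n_outcomes, {
    group_a <- rnorm(n_per_group)
    group_b <- rnorm(n_per_group)
    t.test(group_a, group_b)$p.value
  })
  any(p_values < alpha)
}

set.seed(2019)
n_sims <- 2000

# One outcome fixed in advance vs. five outcomes to choose from afterwards.
rate_one  <- mean(replicate(n_sims, any_significant(n_outcomes = 1)))
rate_five <- mean(replicate(n_sims, any_significant(n_outcomes = 5)))

cat("False-positive rate, 1 outcome: ", rate_one, "\n")
cat("False-positive rate, 5 outcomes:", rate_five, "\n")
```

With one outcome the rate should sit near the nominal 0.05. With five independent outcomes the expected rate is 1 - 0.95^5, or about 0.23, which is why an analysis plan fixed in advance keeps confirmatory p-values interpretable.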

How/where

[Image: img/as-predicted-1.jpg]

Skeptics and converts

What are these big replication studies about?

Studies are underpowered

Assuming a realistic range of prior probabilities for null hypotheses, false report probability is likely to exceed 50% for the whole literature.

Szucs & Ioannidis, 2017
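
The link between low power and a high false report probability can be worked through with Bayes' rule. This is a minimal sketch under simplifying assumptions: it ignores bias and questionable research practices, and the function name and the example inputs are illustrative rather than taken from the cited paper.

```r
# False report probability (FRP): the share of statistically significant
# results that are false positives, by Bayes' rule.
#   prior_h0: prior probability that the null hypothesis is true
#   alpha:    significance threshold (type I error rate)
#   power:    probability of detecting a true effect
false_report_probability <- function(prior_h0, alpha = 0.05, power = 0.8) {
  false_pos <- alpha * prior_h0          # significant although H0 is true
  true_pos  <- power * (1 - prior_h0)    # significant and H1 is true
  false_pos / (false_pos + true_pos)
}

# Illustrative inputs only: low power and a high prior for the null.
frp <- false_report_probability(prior_h0 = 0.8, alpha = 0.05, power = 0.2)
print(frp)  # 0.5
```

With these inputs, half of all significant results are false positives. Raising power to 0.8 while keeping the other inputs fixed lowers the figure to 0.2.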

Many Labs

Klein et al., 2014

Reproducibility Project: Psychology (RPP)

…The mean effect size (r) of the replication effects…was half the magnitude of the mean effect size of the original effects…

Open Science Collaboration, 2015

…39% of effects were subjectively rated to have replicated the original result…

Open Science Collaboration, 2015

Camerer et al., 2018

If it’s too good to be true, it probably isn’t

https://80000hours.org/psychology-replication-quiz/

An open science future…

The advancement of detailed and diverse knowledge about the development of the world’s children is essential for improving the health and well-being of humanity…

SRCD Task Force on Scientific Integrity and Openness

We regard scientific integrity, transparency, and openness as essential for the conduct of research and its application to practice and policy…

SRCD Task Force on Scientific Integrity and Openness

…the principles of human subject research require an analysis of both risks and benefits…such an analysis suggests that researchers may have a positive duty to share data in order to maximize the contribution that individual participants have made.

Brakewood & Poldrack, 2013

https://gilmore-lab.github.io

https://gilmore-lab.github.io/2019-01-15-open-science-psu-hhd/

Stack

This talk was produced on 2019-12-11 in RStudio using R Markdown and the reveal.js framework. The code and materials used to generate the slides may be found at https://github.com/gilmore-lab/2019-01-15-open-science-psu-hhd/. Information about the R session that produced the slides is as follows:

## R version 3.5.3 (2019-03-11)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.5.3   magrittr_1.5     rsconnect_0.8.13 htmltools_0.3.6 
##  [5] tools_3.5.3      revealjs_0.9     yaml_2.2.0       Rcpp_1.0.1      
##  [9] stringi_1.4.3    rmarkdown_1.13   highr_0.8        knitr_1.23      
## [13] stringr_1.4.0    xfun_0.8         digest_0.6.19    packrat_0.5.0   
## [17] evaluate_0.14
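
A listing like the one above comes from R's `sessionInfo()`. A minimal sketch of saving it next to a project's results so that package versions travel with the analysis; the file name and the temporary folder are assumptions for illustration.

```r
# Capture the session details as text so they can be saved alongside results.
info <- sessionInfo()
info_lines <- capture.output(print(info))

# Hypothetical output location; any project path would do.
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(info_lines, info_file)

# The first line typically reports the R version.
cat(info_lines[1], "\n")
```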