2020-09-09 12:44:23

Preliminaries

Support




Agenda

  • An open science manifesto
  • Databrary.org
  • Let’s strengthen developmental neuroscience

An open science manifesto

The first principle is that you must not fool yourself, and you are the easiest one to fool.

– Richard P. Feynman

Open science accelerates discovery

  • Shows your work
  • Data + analysis code + tasks
  • What’s the effect size for…

Closed science slows discovery

  • Wastes energy, time, money
  • Hasn’t somebody tried X before?
  • How big is the ‘file drawer effect’?

Open science strengthens inference

  • Improves reproducibility
  • Reveals errors faster
  • Permits verification, re-analysis; strengthens meta-analysis
  • Boosts sample sizes & increases power
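
The last bullet can be illustrated with base R's power.t.test(). This is a rough sketch, assuming a two-sample t-test and a standardized effect size of 0.5; the effect size and the per-group sample sizes are illustrative choices, not values from this talk.

```r
# Illustrative per-group sample sizes (assumed, not from the talk).
n_per_group <- c(20, 40, 80, 160)

# Power of a two-sample t-test at alpha = 0.05 for an assumed effect of d = 0.5.
pwr <- sapply(n_per_group, function(n) {
  power.t.test(n = n, delta = 0.5, sd = 1, sig.level = 0.05)$power
})

# One row per sample size; power rises as n grows.
print(data.frame(n_per_group, power = round(pwr, 2)))
```

Under these assumptions, pooling shared datasets moves a study toward the larger rows of the table.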

Closed science weakens inference

But will emphasizing transparency and openness in science…

…yield more robust and reliable findings that others can readily build upon?

(SRCD, 2019)

Is open sharing of research data and materials…

…essential for the conduct of research and its application to practice and policy?

(SRCD, 2019)

The advancement of detailed and diverse knowledge about the development of the world’s children is essential for improving the health and well-being of humanity.

(SRCD, 2019)

Yes, but…

  • Sharing difficult, time-consuming
  • Openness not yet rewarded or highly valued
  • Collecting new data better (for me) than cleaning up finished studies to share them
  • I’ll share with X but not with Y

  • I’ll change when…
    • the field does/I get tenure/I’m finished with my data…
  • I can’t share because…
    • I don’t have permission from IRB/participants/collaborators
  • I own my data…
    • don’t I?

Databrary.org

  • World’s only data library specialized for storing and sharing video and audio + related data from research on humans
  • Hosted at New York University
  • Opened 2014
  • 588 institutions; 1746 researchers; 58,893 hours of video + other data; 577 shared projects

Video and audio data pose special risks

  • Faces & voices
  • Names, personal locations
  • Behaviors

Video data have unique research potential

How Databrary protects personal data

Share openly (but with restricted audiences)

  • Researchers require institutional authorization
  • Formal access agreement
  • Site-wide access, not dataset-specific
  • Generic uses
  • Fosters data re-use and contribution

Virtues

  • Restricted data sharing has a long track record (e.g., ICPSR)
  • Consistent sharing permission clarifies nature of risk
  • Empowers participants
  • Researchers & institutions determine what to share & when

  • Open, but not public, sharing
  • More secure than public data and materials services or journal web pages
  • Researchers, institutions need not reinvent wheels
  • More discoverable than personal websites or institutional repositories
  • Allows smaller-scale programs of research to contribute

  • Consistent curation (and sharing permissions) make reuse easier
  • Works for data beyond video
  • Secure data interaction via API

databraryapi::get_db_stats()
##                  date investigators affiliates
## 1 2020-09-09 12:44:24          1191        555
##   institutions datasets_total datasets_shared n_files
## 1          588           1138             577  125663
##      hours       TB
## 1 58893.01 26.19616

(R. Gilmore, 2020)
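
The headline counts on the Databrary.org slide can be recomputed from this output. This is a minimal sketch, assuming the researcher count is the sum of investigators and affiliates (the numbers agree). The values are copied by hand from the output above rather than fetched live, and `db_stats`, `researchers`, and `summary_line` are illustrative names.

```r
# Values copied from the get_db_stats() output shown above (not fetched live).
db_stats <- data.frame(
  investigators = 1191L,
  affiliates = 555L,
  institutions = 588L,
  datasets_shared = 577L,
  hours = 58893.01
)

# Assumption: "researchers" = investigators + affiliates.
researchers <- db_stats$investigators + db_stats$affiliates

# Rebuild the summary bullet; hours are rounded down and comma-formatted.
summary_line <- sprintf(
  "%d institutions; %d researchers; %s hours of video + other data; %d shared projects",
  db_stats$institutions, researchers,
  format(floor(db_stats$hours), big.mark = ","), db_stats$datasets_shared
)
cat(summary_line, "\n")
```

Replacing the hand-copied data frame with a live call to databraryapi::get_db_stats() would keep the bullet current, provided the returned columns match those printed above.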

Databrary 2.0

  • Updated policy framework
  • Rewriting in Node.js, Hasura/GraphQL, Vue.js/Quasar

Let’s open (& strengthen) developmental neuroscience

Embrace secondary re-use

  • ABCD
  • HBCD
  • NDA
  • Make productive use of large public investments

Use video

  • to document and share procedures
  • to capture and describe a broader range of participants’ behaviors

Share our own data and materials

Write and share reproducible analysis & display code

Change the scientific culture…

We can do it!

Make our science open!

rog1@psu.edu
gilmore-lab.github.io

Resources

Software

This talk was produced on 2020-09-09 in RStudio using R Markdown. The code and materials used to generate the slides may be found at https://github.com/gilmore-lab/2020-09-02-flux. Information about the R session that produced the slides is as follows:

## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets 
## [6] methods   base     
## 
## other attached packages:
## [1] qrcode_0.1.1
## 
## loaded via a namespace (and not attached):
##  [1] png_0.1-7          assertthat_0.2.1  
##  [3] packrat_0.5.0      digest_0.6.25     
##  [5] R.methodsS3_1.8.0  R6_2.4.1          
##  [7] jsonlite_1.7.0     magrittr_1.5      
##  [9] evaluate_0.14      highr_0.8         
## [11] httr_1.4.2         rlang_0.4.7       
## [13] stringi_1.4.6      curl_4.3          
## [15] R.oo_1.23.0        R.utils_2.9.2     
## [17] keyring_1.1.0      databraryapi_0.2.3
## [19] rmarkdown_2.3      tools_3.6.2       
## [21] stringr_1.4.0      xfun_0.16         
## [23] yaml_2.2.1         compiler_3.6.2    
## [25] htmltools_0.5.0    knitr_1.29

References

Adolph, K. E., Gilmore, R. O., & Kennedy, J. L. (2017). Video as data and documentation will improve psychological science. Retrieved from https://www.apa.org/science/about/psa/2017/10/video-data

Allaire, J., Xie, Y., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., … Iannone, R. (2020). rmarkdown: Dynamic documents for R. Retrieved from https://github.com/rstudio/rmarkdown

Gilmore, R. (2020). databraryapi: Control the Databrary API. Retrieved from http://github.com/PLAY-behaviorome/databraryapi

Gilmore, R. O., & Adolph, K. E. (2017). Video can make behavioural science more reproducible. Nature Human Behaviour, 1. https://doi.org/10.1038/s41562-017-0128

Gilmore, R. O., Cole, P. M., Verma, S., van Aken, M. A. G., & Worthman, C. M. (2020). Advancing scientific integrity, transparency, and openness in child development research: Challenges and possible solutions. Child Development Perspectives, 14(1), 9–14. https://doi.org/10.1111/cdep.12360

Mota-Mena, N., & Scherf, K. S. (2016). Pubertal development shapes perception of complex facial expressions. Databrary. https://doi.org/10.17910/B7.272

R Core Team. (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641. https://doi.org/10.1037/0033-2909.86.3.638

SRCD. (2019). Policy on scientific integrity, transparency, and openness. Society for Research in Child Development. Retrieved from https://www.srcd.org/policy-scientific-integrity-transparency-and-openness

Szucs, D., & Ioannidis, J. P. A. (2017). Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biology, 15(3), e2000797. https://doi.org/10.1371/journal.pbio.2000797

Xie, Y., Allaire, J. J., & Grolemund, G. (2018). R markdown: The definitive guide. Boca Raton, Florida: Chapman and Hall/CRC. Retrieved from https://bookdown.org/yihui/rmarkdown