2020-04-15 13:18:04

Preliminaries



Agenda

  • The reproducibility crisis in science
  • Databrary.org
  • Questions to discuss

The reproducibility crisis in science

What proportion of findings in the published scientific literature (in the fields you care about) are actually true?

  • 100%
  • 90%
  • 70%
  • 50%
  • 30%

How do we define what “actually true” means?

Is there a reproducibility crisis in science?

  • Yes, a significant crisis
  • Yes, a slight crisis
  • No crisis
  • Don’t know

Have you failed to reproduce an analysis from your lab or someone else’s?

Will emphasizing transparency and openness in science…

yield more robust and reliable findings that others can readily build upon

(SRCD, 2019)

Is open sharing of research data and materials…

essential for the conduct of research and its application to practice and policy

(SRCD, 2019)

Databrary.org

Data about people requires protection

  • Breaches of privacy
  • Breaches of confidentiality
  • How are data collected?
  • How are data stored and shared?

Video and audio data pose special risks

  • Faces & voices
  • Names, personal locations
  • Behaviors

Video data have unique research potential

How to protect against risk & realize potential?

  • World’s only data library specialized for storing and sharing video and audio
  • Hosted at New York University
  • Opened 2014
  • 563 institutions; 1665 researchers; 53,222 hours of video + other data; 523 shared projects

How Databrary protects personal data

Open sharing (but with restricted audiences)

  • Researchers require institutional authorization
  • Formal access agreement
  • Site-wide access, not dataset-specific
  • Data use and contribution

Virtues

  • Restricted data sharing has a long track record
  • Meaningful sharing permission; clarifies nature of risk
  • Empowers participants
  • Researchers & institutions determine what to share & when

  • Open, but not public, sharing
  • Researchers and institutions need not reinvent the wheel
  • More discoverable than personal websites or institutional repositories
  • More secure than public data and materials services or journal web pages

  • Consistent curation makes reuse easier
  • Works for data beyond video
  • Secure data interaction via API

databraryapi::get_db_stats()
##                  date investigators affiliates institutions
## 1 2020-04-15 13:18:04          1122        543          563
##   datasets_total datasets_shared n_files    hours       TB
## 1           1076             523  119570 53222.37 25.10662

https://github.com/PLAY-behaviorome/databraryapi
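
The headline figures on the earlier Databrary slide appear to be derived from this output. As a minimal sketch, the snippet below copies the relevant values from the output shown above into a local data frame (so it runs without network access or the databraryapi package) and recomputes two summary numbers. Treating "researchers" as investigators plus affiliates is an assumption inferred from the matching totals, not something the package documents here.

```r
# Values copied by hand from the get_db_stats() output above (2020-04-15).
# This is an offline stand-in for the data frame the call returned.
stats <- data.frame(
  investigators   = 1122,
  affiliates      = 543,
  institutions    = 563,
  datasets_total  = 1076,
  datasets_shared = 523
)

# Assumption: "researchers" on the slide = investigators + affiliates.
researchers <- stats$investigators + stats$affiliates

# Percentage of all datasets that have been shared, to one decimal place.
pct_shared <- round(100 * stats$datasets_shared / stats$datasets_total, 1)

cat(researchers, "researchers;", pct_shared, "% of datasets shared\n")
```

The sum reproduces the 1665 researchers quoted on the slide, and the ratio shows that a little under half of the datasets on the site were shared at the time.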

Databrary 2.0

  • Updated policy framework
  • Rewriting in Node.js, Hasura/GraphQL, Vue.js/Quasar

Discussion

Where do researchers in your field share their data and materials?

If sharing data and materials is not commonplace, why?

What barriers must be overcome to make it commonplace?

Can’t sharing data in repositories make reproducible workflows easier?

Who owns data? Who should?

Does de-identification offer sufficient protection to participants?

Shouldn’t most (all?) human data be shared via restricted means?

Resources

Software

This talk was produced on 2020-04-15 in RStudio using R Markdown. The code and materials used to generate the slides may be found at https://github.com/gilmore-lab/2020-04-15-data-studies-group. Information about the R session that produced the slides is as follows:

## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets 
## [6] methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.4         digest_0.6.25     
##  [3] R6_2.4.1           jsonlite_1.6.1    
##  [5] magrittr_1.5       evaluate_0.14     
##  [7] highr_0.8          httr_1.4.1        
##  [9] rlang_0.4.5        stringi_1.4.6     
## [11] curl_4.3           databraryapi_0.1.9
## [13] rmarkdown_2.1      tools_3.6.2       
## [15] stringr_1.4.0      xfun_0.12         
## [17] yaml_2.2.1         compiler_3.6.2    
## [19] htmltools_0.4.0    knitr_1.28

References

Adolph, K. E., Gilmore, R. O., & Kennedy, J. L. (2017). Video as data and documentation will improve psychological science. Retrieved from https://www.apa.org/science/about/psa/2017/10/video-data

Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature News, 533(7604), 452. https://doi.org/10.1038/533452a

Gilmore, R. O., & Adolph, K. E. (2017). Video can make behavioural science more reproducible. Nature Human Behaviour, 1. https://doi.org/10.1038/s41562-017-0128

SRCD. (2019). Policy on scientific integrity, transparency, and openness. Society for Research in Child Development. Retrieved from https://www.srcd.org/policy-scientific-integrity-transparency-and-openness