2017-07-30 09:26:36

IRBs & Data Sharing

Acknowledgements

Vision

  • Make data from psychological research as widely available as possible
    • Increase reuse potential
    • Reduce bias
    • Make published analyses as transparent as possible
  • Avoid harming research participants

Roadmap

  • Ethical challenges in sharing data
  • Sharing de-identified data
  • Sharing identifiable data

Ethical challenges in sharing data

Belmont principles

  • Beneficence
    • Data sharing increases value (good)
    • Data sharing may pose risk of loss of privacy & confidentiality (bad)
  • Autonomy
    • Data sharing may pose risk of unintended use of data
    • Participants should participate in decision-making

  • Justice
    • Benefits (and costs) of research participation should be equitable

Meeting the challenges

  • Tension between protecting participants and advancing discovery
  • Tension between requirements, expectations, and desires to share and practical, regulatory/legal, and ethical constraints

What data are you collecting?

  • Personally identifying or sensitive data?
  • What risks does data sharing pose?
  • How should data be protected?

Who will (& should) have access?

  • Public
  • Community of authorized individuals (researchers)
  • Individuals selected by data owner or repository

What have participants been told, approved, and understood?

  • What data collected, what will be shared
  • Who will have access
  • Where stored, how accessed
  • Purposes of use, types of questions

Are your data subject to statutory, regulatory, or contractual restrictions?

Sharing de-identified data

What is personally identifying information (PII)?

  • PII definitions vary by use case, context
  • Likelihood of identification depends on uniqueness in target and reference populations
  • Health Insurance Portability and Accountability Act (HIPAA) identifiers

HIPAA identifiers

  • Name
  • Address (all geographic subdivisions smaller than state, including street address, city, county, and zip code)
  • All elements (except years) of dates related to an individual (including birthdate, admission date, discharge date, date of death, and exact age if over 89)

  • Telephone
  • Fax numbers
  • Email address
  • Social Security Number

  • Medical record number
  • Health plan beneficiary number
  • Account number
  • Certificate or license number
  • Vehicle identifiers (including license plates) and device serial numbers
  • Web URL
  • Internet Protocol (IP) Address

  • Finger or voice print
  • Photographic images (not limited to images of the face)
  • Any other characteristic that could uniquely identify the individual
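As a sketch of how the identifier list above translates into practice, the hedged example below drops assumed direct-identifier fields and coarsens dates and ZIP codes. The field names and rules are illustrative simplifications, not a compliance procedure.

```python
# Illustrative sketch of Safe Harbor-style de-identification.
# Field names and rules are simplified assumptions, not a compliance tool.

# Assumed direct-identifier fields to drop entirely
DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "mrn", "health_plan_id", "account_number", "license_number",
    "device_serial", "url", "ip_address",
}

def deidentify(record):
    """Return a copy of `record` with direct identifiers removed,
    dates truncated to year, and ZIP coarsened to its 3-digit prefix."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop the field entirely
        if key.endswith("_date"):
            out[key] = value[:4]  # keep only the year (assumes YYYY-MM-DD)
        elif key == "zip":
            out[key] = value[:3] + "00"  # coarsen to 3-digit ZIP area
        else:
            out[key] = value
    return out

record = {"name": "Jane Doe", "ssn": "123-45-6789",
          "admission_date": "2016-03-14", "zip": "16802", "score": 42}
print(deidentify(record))
```

Note that a real Safe Harbor workflow has additional rules (e.g., small ZIP areas collapse to 000, ages over 89 aggregate); this sketch only shows the shape of the transformation.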

Other potentially identifiable information

  • Structural MRI
    • Emerging standard: deface images (remove facial features)
  • Genetic profiles

Examples of possibly sensitive data

  • Health-related information
    • Medical history
    • Medical risk factors including genetic data

  • Information about other potentially stigmatizing characteristics (situation-dependent)
    • Religious/philosophical convictions
    • Sexual identity and preferences
    • Political affiliation, trade union membership
    • Ethnicity, nationality, citizenship status

Weighing benefits (of sharing) vs. risks

  • How useful are data?
  • How sensitive are data?
  • How likely is it that reidentification could be achieved, and by whom?

Risk scenarios

  • Reidentification by participants themselves
    • Can be harmful, e.g., if the dataset contains uncommunicated health-risk information
  • Reidentification by insider
  • Reidentification by targeted search (nemesis scenario)
  • Reidentification by mass matching (dystopian AI scenario)

Ways to mitigate risk

  • Aggregate or censor sensitive variables
  • Aggregate or censor secondary identifying variables
  • Perturb or add noise to variables
  • Review data for disclosure risk

  • Stepped or restricted access
    • Data enclaves (e.g., Census data)
    • Virtual data enclaves
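Two of the mitigations above, aggregation and perturbation, can be sketched in a few lines. The bin width and noise scale below are arbitrary assumptions for illustration.

```python
import random

# Illustrative sketch of two mitigation steps: aggregating a sensitive
# variable into coarse bins, and perturbing a numeric variable with noise.

def bin_age(age, width=10):
    """Aggregate an exact age into a coarse band, e.g. 37 -> '30-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def perturb(value, scale=1.0, rng=None):
    """Add zero-centered Gaussian noise so exact values are not disclosed."""
    rng = rng or random.Random()
    return value + rng.gauss(0, scale)

print(bin_age(37))  # '30-39'
print(perturb(100.0, scale=2.0, rng=random.Random(1)))  # near, not exactly, 100
```

Perturbation trades accuracy for privacy: analyses on noisy values stay approximately right in aggregate, while any single record no longer reveals an exact value.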

Example language for consent forms

Case study: OpenNeuro.org

Sharing identifiable data

Canadian Policy

  • Researchers must obtain consent for secondary use of identifiable data unless all of the following hold:
    • identifiable information is essential to the research
    • use of identifiable information without consent is unlikely to adversely affect participants
    • researchers take appropriate measures to protect privacy of individuals and safeguard identifiable information
    • researchers comply with any known preferences previously expressed by individuals about any use of their information
    • it is impossible or impracticable to seek consent
    • researchers have obtained any other necessary permission for secondary use of information for research purposes

Case studies in sharing identifiable data

Databrary.org


  • Specializes in storing, sharing video
  • Video captures behavior more fully than other methods, but is identifiable
  • Policy framework for sharing identifiable data
    • Permission to share builds on informed consent
    • Restricted access for (institutionally) authorized researchers

Seeking permission to share

Lessons learned

  • Research consent ≠ permission to share
    • Seek permission to share after data collection.
  • "Cloud" storage vs. institutionally housed
  • Comfort with data sharing varies among IRBs
  • Laws differ among countries

Open Humans

Public sharing of identifiable data

Risks of public sharing

Specific risks for sharing these data types

Benefits of public sharing

Recommendations

Prepare for sharing

  • Get IRB/ethics board approval
  • Get participant approval (even if planning to anonymize)

Alert participants

  • Where data will be stored
    • e.g. in "cloud" servers (e.g., SurveyMonkey, Qualtrics, Databrary, OpenNeuro, OSF, etc.)
    • Be explicit about the type of storage, but avoid tying consent language to one specific service

  • Who will have access
    • Public/anyone
    • Researchers
  • And for how long
    • indefinitely
    • stopping future sharing is possible; recalling already-shared data is not

  • Why:
    • Give motivation for recording sensitive variables (beneficence)
  • Consult data repository experts (e.g., ICPSR, Dataverse, Databrary)

  • Avoid making promises you cannot keep:
    • "no one except the researchers in the project will ever see the data"
  • Avoid data destruction clauses:
    • "Your data will be stored for X years then destroyed."
    • NOT REQUIRED by U.S. or Canadian law

  • Avoid describing overly specific use cases for data:
    • "Your data will be used to study the relationship between X and Y."

Share as openly as practicable

  • Consider approved, authorized, trusted data repositories for sensitive data
  • Share as much individual-level, item-specific data as practicable
    • Finest-grained data have the highest value for reuse and new discovery

Discussion

How anonymous is 'anonymous' data?

Sex + date of birth + ZIP code uniquely identifies most Americans (Sweeney, 2000)
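A back-of-envelope calculation shows why a birth date plus a ZIP code (with one more field such as sex) narrows identity so sharply: there are far more combinations than people. The population and ZIP counts below are rough assumptions.

```python
# Back-of-envelope estimate of quasi-identifier uniqueness.
# All numbers are rough assumptions for illustration.

us_population = 325_000_000   # approximate 2017 U.S. population
zip_codes = 42_000            # approximate count of 5-digit ZIP codes
birth_dates = 365 * 90        # ~90 plausible birth years
sexes = 2

cells = zip_codes * birth_dates * sexes   # distinct (ZIP, DOB, sex) combos
people_per_cell = us_population / cells

print(f"{cells:,} combinations for {us_population:,} people")
print(f"average of {people_per_cell:.2f} people per combination")
```

With far fewer than one person per combination on average, most occupied cells hold exactly one person, which is why these seemingly innocuous fields re-identify so effectively.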

Other issues

  • Self-reported vs. medical records
  • Sponsor requirements (or constraints) vs. open sharing
  • Do IRBs overstep regulatory boundaries when considering risks and benefits outside an approved study (Burnam, 2014)?
  • Policies for restricting access but promoting openness
  • Who owns data?

DataTags initiative

  • From Dataverse @ Harvard
  • Checklist/workflow for 'tagging' data based on risk
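A risk-based tagging workflow like this can be sketched as a simple decision function. The tag names follow the published DataTags levels, but the decision logic below is a made-up illustration, not the actual DataTags instrument.

```python
# Toy sketch of a risk-based tagging workflow in the spirit of DataTags.
# This decision logic is a hypothetical illustration, not the real system.

def tag_dataset(has_identifiers, is_sensitive, has_consent_to_share):
    """Assign a coarse handling tag based on three yes/no risk questions."""
    if not has_identifiers and not is_sensitive:
        return "Blue"    # public: no restrictions needed
    if not has_identifiers:
        return "Green"   # de-identified but sensitive topic: terms of use
    if has_consent_to_share:
        return "Yellow"  # identifiable, shared with authorized users only
    return "Red"         # identifiable without sharing permission: restricted

print(tag_dataset(False, False, False))  # Blue
print(tag_dataset(True, True, True))     # Yellow
```

The value of such a checklist is that handling requirements (encryption, access controls, agreements) can then be attached to the tag rather than renegotiated per dataset.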

Resources

Stack

## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.5
## 
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.4.0  backports_1.0.5 magrittr_1.5    rprojroot_1.2  
##  [5] htmltools_0.3.6 tools_3.4.0     yaml_2.1.14     Rcpp_0.12.10   
##  [9] stringi_1.1.5   rmarkdown_1.5   knitr_1.16.4    caTools_1.17.1 
## [13] stringr_1.2.0   digest_0.6.12   bitops_1.0-6    evaluate_0.10