2017-07-30 09:26:36

IRBs & Data Sharing



  • Make data from psychological research as widely available as possible
    • Increase reuse potential
    • Reduce bias
    • Make published analyses as transparent as possible
  • Avoid harming research participants


  • Ethical challenges in sharing data
  • Sharing de-identified data
  • Sharing identifiable data

Ethical challenges in sharing data

Belmont principles

  • Beneficence
    • Data sharing increases value (good)
    • Data sharing may pose risk of loss of privacy & confidentiality (bad)
  • Autonomy
    • Data sharing may pose risk of unintended use of data
    • Participants should participate in decision-making

  • Justice
    • Benefits (and costs) of research participation should be equitable

Meeting the challenges

  • Tension between protecting participants and advancing discovery
  • Tension between requirements, expectations, and desires to share and the practical, regulatory/legal, and ethical constraints on sharing

What data are you collecting?

  • Personally identifying or sensitive data?
  • What risks does data sharing pose?
  • How should data be protected?

Who will (& should) have access?

  • Public
  • Community of authorized individuals (researchers)
  • Individuals selected by data owner or repository

What have participants been told, approved, understood?

  • What data collected, what will be shared
  • Who will have access
  • Where stored, how accessed
  • Purposes of use, types of questions

Are your data subject to statutory, regulatory, or contractual restrictions?

Sharing de-identified data

What is personally identifying information (PII)?

  • PII definitions vary by use case, context
  • Likelihood of identification depends on uniqueness in target and reference populations
  • Health Insurance Portability and Accountability Act (HIPAA) identifiers
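
The point about uniqueness can be made concrete with a small sketch. It counts how many records share each combination of indirect identifiers; a combination held by only one record is the easiest to re-identify. The field choices (age band, state, sex) and the sample rows are hypothetical, and the check covers only the dataset itself. Judging uniqueness in the wider reference population would need external data that is not shown here.

```python
from collections import Counter

# Hypothetical records: (age_band, state, sex) stand in for indirect identifiers.
records = [
    ("20-29", "PA", "F"),
    ("20-29", "PA", "F"),
    ("30-39", "PA", "M"),
    ("20-29", "NY", "F"),
    ("30-39", "PA", "M"),
    ("70-79", "PA", "M"),
]

def group_sizes(rows):
    """Count how many rows share each combination of values."""
    return Counter(rows)

def unique_rows(rows):
    """Return the combinations held by exactly one row (highest-risk records)."""
    sizes = group_sizes(rows)
    return sorted(combo for combo, n in sizes.items() if n == 1)

def smallest_group(rows):
    """Size of the smallest group of rows sharing one combination."""
    return min(group_sizes(rows).values())

singles = unique_rows(records)
k = smallest_group(records)
print(singles)  # combinations that appear only once in this sample
print(k)        # smallest group size in this sample
```

A smallest group size of 1 means at least one participant is unique on these fields within the sample; coarser categories (wider age bands, for example) would raise it.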

HIPAA identifiers

  • Name
  • Address (all geographic subdivisions smaller than state, including street address, city, county, and zip code)
  • All elements (except years) of dates related to an individual (including birthdate, admission date, discharge date, date of death, and exact age if over 89)

  • Telephone numbers
  • Fax numbers
  • Email address
  • Social Security Number

  • Medical record number
  • Health plan beneficiary number
  • Account number
  • Certificate or license number
  • Any vehicle or other device serial number
  • Web URL
  • Internet Protocol (IP) Address

  • Finger or voice print
  • Photographic image - not limited to images of the face
  • Any other characteristic that could uniquely identify the individual
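
As a rough illustration of how a few of the identifiers listed above might be handled in a tabular record, the sketch below drops direct identifier fields, keeps only the year of dates, and pools ages over 89. All field names are hypothetical, the list of fields is deliberately incomplete, and running code like this does not by itself establish that a dataset is de-identified.

```python
from datetime import date

# Hypothetical field names; adapt to the columns in your own dataset.
DIRECT_IDENTIFIERS = {
    "name", "street_address", "city", "county", "zip_code",
    "phone", "fax", "email", "ssn", "ip_address",
}

def deidentify(record):
    """Return a copy with direct identifiers dropped, dates cut to year, ages over 89 pooled."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop the field entirely
        if isinstance(value, date):
            # Keep only the year. Caution: a birth year can itself reveal an
            # age over 89; this sketch does not handle that case.
            out[key + "_year"] = value.year
        elif key == "age" and value > 89:
            out[key] = "90+"
        else:
            out[key] = value
    return out

record = {
    "name": "Pat Example",
    "zip_code": "16802",
    "state": "PA",
    "admission_date": date(2016, 3, 14),
    "age": 93,
    "score": 17,
}
clean = deidentify(record)
print(clean)
```

Free-text fields, images, and the catch-all "any other characteristic" category cannot be handled by a field list like this and still need human review.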

Other potentially identifiable information

  • Structural MRI
    • Emerging standard: deface images (remove facial features) before sharing
  • Genetic profiles

Examples of possibly sensitive data

  • Health-related information
    • Medical history
    • Medical risk factors including genetic data

  • Information about other potentially stigmatizing characteristics (situation-dependent)
    • Religious/philosophical convictions
    • Sexual identity and preferences
    • Political affiliation, trade union membership
    • Ethnicity, nationality, citizenship status

Weighing benefits (of sharing) vs. risks

  • How useful are data?
  • How sensitive are data?
  • How likely is it that reidentification could be achieved, and by whom?

Risk scenarios

  • Reidentification by participants themselves
    • Can be harmful, e.g., if the dataset contains health risk information that was never communicated to the participant
  • Reidentification by insider
  • Reidentification by targeted search (nemesis scenario)
  • Reidentification by mass matching (dystopian AI scenario)

Ways to mitigate risk

  • Aggregate or censor sensitive variables
  • Aggregate or censor secondary identifying variables
  • Perturb or add noise to variables
  • Review data for disclosure risk

  • Stepped or restricted access
    • Data enclaves (e.g., Census data)
    • Virtual data enclaves
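
The first three mitigation strategies above (aggregate, censor, perturb) can be sketched in a few lines. The income values, bin width, cap, and noise scale are all made-up illustrations, not recommended settings; appropriate choices depend on the data and on a disclosure-risk review.

```python
import random

def bin_value(value, width):
    """Aggregate: replace a number with the lower edge of its bin."""
    return (value // width) * width

def top_code(value, cap):
    """Censor: report any value above the cap as the cap itself."""
    return min(value, cap)

def add_noise(value, scale, rng):
    """Perturb: add zero-mean Gaussian noise with standard deviation `scale`."""
    return value + rng.gauss(0.0, scale)

rng = random.Random(0)  # seeded only so the sketch is repeatable

incomes = [23_500, 41_200, 58_900, 1_250_000]  # hypothetical values

binned = [bin_value(x, 10_000) for x in incomes]
capped = [top_code(x, 150_000) for x in incomes]
noisy = [add_noise(x, 500.0, rng) for x in incomes]

print(binned)  # each income rounded down to a 10,000-wide bin
print(capped)  # the extreme income is replaced by the cap
print(noisy)   # each income shifted by a small random amount
```

Note that binning alone leaves the extreme value recognizable, which is why aggregation and censoring are often combined.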

Example language for consent forms

Case study: OpenNeuro.org

Sharing identifiable data

Canadian Policy

  • Researchers must obtain consent for secondary use of identifiable data unless all of the following conditions are met:
    • identifiable information is essential to the research
    • use of identifiable information without consent is unlikely to adversely affect participants
    • researchers take appropriate measures to protect privacy of individuals and safeguard identifiable information
    • researchers comply with any known preferences previously expressed by individuals about any use of their information
    • it is impossible or impracticable to seek consent
    • researchers have obtained any other necessary permission for secondary use of information for research purposes

Case studies in sharing identifiable data



  • Specializes in storing, sharing video
  • Video captures behavior more fully than other methods, but is identifiable
  • Policy framework for sharing identifiable data
    • Permission to share -> builds on informed consent
    • Restricted access for (institutionally) authorized researchers