Sharing non(personal) data: the case of neuro-imaging
Pernet, Cyril R.
Sharing data is beneficial in many ways: it accelerates progress in our fundamental understanding of the topic addressed by the data, it improves publication and data quality, it reduces the cost of research and increases the return on current research investments, it fosters research and advances in practices, and it is a requirement for reproducible science (Poline et al., 2012). However, in many cases data collected for research purposes fall under the Data Protection Act 1998 (DPA), and appropriate measures must be taken before sharing the data. Personal data are things such as an individual's personnel file, medical records or home phone number. A further tier of personal data is defined in the DPA: sensitive personal data. These are personal data relating to more typically sensitive areas of an individual's life, such as religion, race and political beliefs, and any data relating to "physical or mental health or condition".

Issues with personal data

In studies involving human subjects, participants are asked to sign information processing consent forms (Laurie et al. 2013). These forms used to cover the processing of the data for the particular study only, but nowadays participants are often asked to agree to the further use and sharing of the data (usually in a properly de-identified form) beyond the original purposes of the study. This has raised ethical discussions about the validity of these broad-consent formulae (both for data and samples), in particular with the emergence of BioBanks (Hansson et al. 2006, Sheehan et al. 2011, Steinsbekk et al. 2013). Generally, subjects also have the right to withdraw their data from a study at any time. However, once de-identified data are on a public database it becomes almost impossible to remove them (UNESCO 2006); the consent form must therefore state that it may not be possible to remove data once they have been shared. Eriksson et al.
(2005) suggest that the approach to anonymisation and data withdrawal should be modified for BioBanks. Finally, intellectual property rights also play a part here. For instance, image data will attract copyright, and a database of medical data may attract the database right.

Research data on living human participants that are shared must be anonymised. This implies that data descriptors should not contain personal information and that the data themselves should be made unidentifiable. Taking the example of brain MRI data, we must ensure that personal data are removed from the image header (which usually contains information such as name, date of birth, etc.), but we must also be aware that the images themselves can be used to identify individuals: the face can be reconstructed, ears can be used like fingerprints, dental areas are often visible, and a particular malformation can be distinctive enough to point to an individual.

Another issue relates to the availability of data from open databases. Cross-linkage to other forms of clinical, environmental, and genealogical information has made it possible to "re-identify" specific individuals as participants in genomic research (Gymrek et al., 2013; Williams, 2013). The same can happen with many forms of data. An alternative is sharing data in a more restrictive way and/or in secure environments (i.e. research safe havens). Furthermore, research into privacy-preserving data mining (Agrawal et al. 2000, Aggarwal et al. eds. 2008) could allow some research questions to be approached while protecting individuals' privacy.

Linked anonymised data vs. unlinked anonymised data: For research which has the potential to generate important clinical information about individual participants, arrangements should be made to allow some party (e.g. the database manager) to keep a key for re-identifying data sources should the need arise, under stringent privacy protections (McCarty et al., 2008).
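The difference between linked and unlinked anonymised data can be illustrated with a minimal sketch. It assumes the image header has already been read into a plain Python dictionary; the field names, the `deidentify` function and the `SubjectCode` field are hypothetical and chosen for illustration only, since real imaging formats define their own header fields. The sketch only covers the header: it does nothing about identifiable image content such as the face, which needs separate processing.

```python
import secrets

# Hypothetical names for header fields holding personal data.
# Real formats use their own field names; adapt this set accordingly.
PERSONAL_FIELDS = {"PatientName", "PatientBirthDate", "PatientAddress", "PatientID"}


def deidentify(header, linked=True):
    """Strip personal fields from a header dictionary.

    Returns (clean_header, key). For linked anonymised data, `key` maps the
    random subject code to the removed personal fields and must be stored
    separately under strict access control. For unlinked data, `key` is None
    and the removed information is discarded.
    """
    # Keep only fields that are not listed as personal data.
    clean = {k: v for k, v in header.items() if k not in PERSONAL_FIELDS}

    # Random code, unrelated to any personal information.
    code = "sub-" + secrets.token_hex(4)
    clean["SubjectCode"] = code

    key = None
    if linked:
        removed = {k: v for k, v in header.items() if k in PERSONAL_FIELDS}
        key = {code: removed}
    return clean, key


header = {"PatientName": "Jane Doe", "PatientBirthDate": "1980-01-01", "EchoTime": 0.03}
shared, key = deidentify(header, linked=True)      # key kept by the database manager
shared_unlinked, no_key = deidentify(header, linked=False)  # no route back to the person
```

In the linked case only `shared` would go to the repository, while `key` stays with the designated party; in the unlinked case there is nothing to keep, so clinically relevant findings cannot be fed back to the participant.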
Conclusions

Much of the data collected for research falls under the Data Protection Act, and several key steps can be identified that allow public release in a data repository: 1 – the consent form must include information about data protection and data sharing; 2 – data descriptors must be anonymous; 3 – appropriate data de-identification must take place if required; 4 – the potential clinical relevance of the data must be evaluated to choose the type of shared data (linked vs. unlinked).