DASSL: A technical infrastructure to support access, sharing, storage and linkage of health data

Covid-19 has highlighted the importance of, and accelerated the demand for, high-quality health data for policymaking, practice, and research. Ireland has a poor track record in this regard and, in a recent OECD report (1), ranked last for secondary use and availability of health datasets. Ireland is also one of only two countries not regularly linking datasets for research, statistics, and monitoring.

Across the Irish health services, barriers to sharing and linking data have included siloed datasets, inconsistent application of existing legislation, the need for new enabling legislation, and concerns about, and differing interpretations of, data protection. In addition, minimal use of unique identifiers and the lack of a formal and secure infrastructure to integrate, link and support remote access to data for secondary purposes, including research, have led to valuable projects being inordinately delayed or, in some cases, abandoned.

Internationally, similar barriers have been overcome. To protect individuals’ privacy while deriving benefits from routinely collected, statistical and survey data, national Health Data Platforms have been developed, most notably in the UK, Australia, Canada, and Finland. A similar model has been proposed for Ireland by the Health Research Board (HRB) (2): DASSL, which stands for data access, storage, sharing and linkage. The DASSL model aims to provide a single point of access for researchers and data controllers to facilitate linking of health data in a safe and trusted manner, with patient anonymity secured at all times.

The Irish Centre for High-End Computing (ICHEC), along with collaborators from the RCSI, HSE, and TCD, was awarded funding from the HRB to develop the proof-of-concept (PoC) technical infrastructure for DASSL. Hosted by NUI Galway and supported by DFHERIS, ICHEC is Ireland’s national centre for high-performance computing (HPC), providing e-infrastructure, services and expertise to higher education institutions, industry, and the public sector.

Objectives

A key objective of ICHEC’s work on the PoC is to develop a prototype technical infrastructure for DASSL and test it using synthetic health data. The final report will provide recommendations, gathered during the PoC and from key stakeholders, which will inform the development, technical infrastructure requirements, operations, and governance of Ireland’s future Health Information Systems. The overall objective is to improve healthcare, public health and wellbeing.

The proposed model

Overall, the DASSL model includes several components to facilitate safe and secure access, sharing, storage and linkage of health and related datasets as outlined in Figure 1.

Governance

Access, sharing, storage and linkage of national health data require a lawful basis, clear security and data protection policies and procedures, and governance boards. While this PoC will only use synthetic data, the national roll-out of a solution that processes real health and related datasets will necessitate legislation, significant investment, public consultation, appropriate governance structures and various project approval boards (e.g., Research Ethics Committee approval, declarations from the Health Research Consent Declaration Committee, access requests via a Research Data Governance Board). These processes are under review by the Department of Health as part of a reform of Ireland’s Health Information System.

SAIL Databank (Wales) Use Case (3)
By linking GP care data, emergency hospital admissions, prescriptions and asthma deaths together with geographical and socioeconomic deprivation areas from 2013 to 2017, an asthma study found that people from deprived areas in Wales have worse outcomes and increased risk of death. This was then used to inform new policies to combat inequity.

Stakeholder involvement and engagement

In addition to close engagement with the HRB (the commissioners of this project), other key stakeholders have contributed to the planning and development of the DASSL PoC, including the formulation of use cases. These include representatives from the Department of Health, the HSE and HIQA, public and patient representatives, researchers, and data controllers. It is clear that ongoing public consultation, including a Public Advisory Board, will be critical to the success of any model taken forward. Open sharing of the results of research projects that use national data will also be crucial to building public trust and promoting the use of these findings for public benefit.

Research Support Unit

The Research Support Unit (RSU) plays a pivotal role in supporting researchers, from the conception of a project idea, through conducting the research, to managing any research output. As the point of contact for researchers, RSU staff require in-depth knowledge of the datasets to assess whether a research project is feasible, prepare linked pseudonymised datasets for researchers (with the GDPR principle of data minimisation in mind) and assess any research outputs to ensure privacy is preserved prior to export. The RSU role also includes managing a catalogue of datasets.

Technical operation

A key principle that underpins the operation of the DASSL model is that only the data custodians hold both (a) personally identifiable information, such as names and addresses, and (b) the corresponding medical/clinical/health data. These two types of data are split at source into Dataset A and Dataset B and sent to the Trusted Third Party (TTP) and the Health Research Data Hub, respectively. Datasets can then be linked, prepared and analysed, and any research output vetted, by the following components of the system.
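The split at source can be illustrated with a minimal sketch. It is not the DASSL implementation: the field names, the `split_at_source` function and the use of a shared random pseudonym on both parts are assumptions made purely for illustration.

```python
import secrets

# Hypothetical field names; a real custodian's schema will differ.
IDENTIFYING_FIELDS = {"name", "address", "date_of_birth"}


def split_at_source(records):
    """Split each record into an identifying part (A) and a health part (B).

    Both parts carry the same random pseudonym so they can be re-associated
    later, but neither part on its own holds identity and health data together.
    """
    dataset_a, dataset_b = [], []
    for record in records:
        pseudonym = secrets.token_hex(8)  # random value that carries no meaning
        part_a = {"pseudonym": pseudonym}
        part_b = {"pseudonym": pseudonym}
        for field, value in record.items():
            target = part_a if field in IDENTIFYING_FIELDS else part_b
            target[field] = value
        dataset_a.append(part_a)
        dataset_b.append(part_b)
    return dataset_a, dataset_b


# Synthetic example record, in keeping with the PoC's use of synthetic data.
records = [
    {"name": "Ann Byrne", "address": "1 Main St", "date_of_birth": "1980-02-01",
     "diagnosis": "asthma", "admissions": 2},
]
dataset_a, dataset_b = split_at_source(records)
```

In this sketch, `dataset_a` would go to the TTP and `dataset_b` to the Health Research Data Hub.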

Trusted Third Party: where records are linked

The TTP is a trusted team of people or an organisational unit that conducts record linkage using personal data (Dataset A) received from data custodians. Linking individual records between datasets is critical for re-associating a person across their healthcare pathway to produce useful insights, and the establishment of a TTP for this purpose is common practice internationally. Again, the explicit separation of personally identifiable information from the corresponding health data ensures that only the data controllers have both sets of information, which helps to protect privacy. The TTP then shares encrypted linkage keys with the Data Hub.
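As a rough illustration of record linkage at the TTP, the sketch below uses simple exact matching on normalised name and date of birth. This is an assumption for the example only; the article does not specify a linkage method, and production systems often use probabilistic matching. The function names and record layout are hypothetical, and the encryption of linkage keys mentioned above is omitted.

```python
import secrets


def normalise(record):
    """Build a comparison key from identifying fields (exact-match linkage)."""
    return (record["name"].strip().lower(), record["date_of_birth"])


def link_records(*datasets_a):
    """Return {(source_index, pseudonym): linkage_key} across all inputs.

    Records that agree on the normalised identifiers receive the same key.
    """
    key_for_person = {}
    linkage_map = {}
    for source_index, dataset in enumerate(datasets_a):
        for record in dataset:
            person = normalise(record)
            if person not in key_for_person:
                key_for_person[person] = secrets.token_hex(8)
            linkage_map[(source_index, record["pseudonym"])] = key_for_person[person]
    return linkage_map


# Synthetic Dataset A extracts from two hypothetical custodians.
hospital_a = [{"pseudonym": "h1", "name": "Ann Byrne", "date_of_birth": "1980-02-01"}]
gp_a = [
    {"pseudonym": "g7", "name": " ann byrne", "date_of_birth": "1980-02-01"},
    {"pseudonym": "g8", "name": "Tom Daly", "date_of_birth": "1975-06-30"},
]
linkage_map = link_records(hospital_a, gp_a)
```

Only `linkage_map`, which contains pseudonyms and keys but no names or dates of birth, would leave the TTP in this sketch.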

Health Research Data Hub: where data is prepared

This is a tightly controlled data storage and processing platform to prepare datasets for researchers. It receives the variables of interest to the researcher (Dataset B) that are already pseudonymised (i.e., personally identifiable information is stripped and replaced with a random identifier). Using linkage keys from the TTP, the same individual can be linked across the different pseudonymised datasets. These datasets never store any personally identifiable information and are stored for only as long as required in line with GDPR. Access is highly restricted to operations staff (e.g., the RSU) who need to prepare datasets for researchers.
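The linking step in the Data Hub might look something like the following sketch. It assumes, for illustration only, that the TTP supplies a mapping from each source's pseudonym to a linkage key; the `link_health_data` function, the mapping format and the field names are hypothetical.

```python
def link_health_data(linkage_map, *datasets_b):
    """Group pseudonymised health records by linkage key.

    linkage_map is assumed to map (source_index, pseudonym) to a linkage key.
    """
    linked = {}
    for source_index, dataset in enumerate(datasets_b):
        for record in dataset:
            key = linkage_map.get((source_index, record["pseudonym"]))
            if key is None:
                continue  # no linkage key was supplied for this record
            person = linked.setdefault(key, {})
            for field, value in record.items():
                if field != "pseudonym":
                    # Prefix with the source so that fields cannot collide.
                    person[f"source{source_index}_{field}"] = value
    return linked


# Synthetic inputs: linkage keys from the TTP and two Dataset B extracts.
linkage_map = {(0, "h1"): "K1", (1, "g7"): "K1", (1, "g8"): "K2"}
hospital_b = [{"pseudonym": "h1", "admissions": 2}]
gp_b = [{"pseudonym": "g7", "prescriptions": 5},
        {"pseudonym": "g8", "prescriptions": 1}]
linked = link_health_data(linkage_map, hospital_b, gp_b)
```

Here the hospital and GP records for linkage key `K1` are combined into one row without the Data Hub ever seeing a name or address.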

Safe Haven: where data is analysed

A locked-down, secure research environment supports virtual access to the pseudonymised project data by approved researchers. Once a researcher is securely connected to this environment (following a stringent access request and approval process), data cannot be imported or exported and outgoing internet access is disabled. The researcher is provided with the analytical software required to process the requested datasets. Once the researchers have completed their analyses, any output that needs to be exported (e.g., for publication) is placed in a folder for output checking before being released.

Output checking

The research findings that the researchers want to export from the Safe Haven are assessed for statistical disclosure control by the RSU. This ensures that the data released does not contain any information that could re-identify individuals.
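One common statistical disclosure control rule is to suppress small counts in tables before release. The sketch below shows that single rule under stated assumptions: the threshold of 5, the function names and the table layout are illustrative and are not taken from the DASSL model, and real output checking involves more than this one check.

```python
MIN_CELL_COUNT = 5  # assumed threshold, for illustration only


def suppress_small_cells(table, threshold=MIN_CELL_COUNT):
    """Replace counts below the threshold with None so they are not released."""
    return {group: (count if count >= threshold else None)
            for group, count in table.items()}


def is_safe_to_release(table, threshold=MIN_CELL_COUNT):
    """Return True only if every count meets the threshold."""
    return all(count >= threshold for count in table.values())


# Synthetic frequency table that a researcher might ask to export.
counts = {"area_1": 42, "area_2": 3, "area_3": 17}
checked = suppress_small_cells(counts)
```

In this example the count for `area_2` is below the assumed threshold, so it would be withheld rather than released.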

Outlook

There is huge demand, both in Ireland and internationally, for a national technical infrastructure to support safe and secure analysis of linked datasets. The growing momentum of initiatives such as the European Health Data Space, and associated EU legislation to support the coordination of international data sharing, will also require Ireland to be able to facilitate secondary use of data for public benefit. The DASSL PoC, commissioned by the HRB and delivered by ICHEC, will report its findings at a critical time. These findings will inform actions to shape a fit-for-purpose Irish health information ecosystem, with a clear policy intent to optimise the use of health and social care data for secondary purposes, as well as the associated governance, legislation and investments required. The ultimate aim is to enable a better, evidence-informed health system and stimulate research and innovation to improve healthcare outcomes and the wellbeing of the population.

T: 01 529 1042
E: info@ichec.ie
W: www.ichec.ie
