The collection, analysis and use of health data from sources such as clinical trials, laboratory results and medical records is the bedrock of medical research. As with all AI systems, privacy concerns loom large when health data is used to train AI algorithms.

These concerns are well justified: once an individual's medical history is exposed, it cannot be replaced the way a credit card can be reissued after a breach. Balancing the two imperatives, embracing the possibilities of AI and machine learning while taking special care of patients' privacy, poses a unique set of challenges in this space.

What constitutes health data?

The development of a successful AI system for use in health care relies on high-quality data for both training the algorithm and validating the algorithmic model. The data that can be used for this purpose, collectively known as “biomedical big data”, has expanded dramatically over the past two decades. It now includes massive quantities of personal data about individuals from many sources, including genomic data, radiological images, medical records and non-health data converted into health data. 

The origins of such data vary greatly too, ranging from standard sources (health services, public health, research) to further sources (environmental, lifestyle, socioeconomic, behavioural and social data). WHO's definition of the health data ecosystem captures this full range.

Thus, there are now many more sources of health data, many more entities that wish to make use of such data, and a growing range of commercial and non-commercial applications.

Challenges for researchers

While AI deployment in developing countries such as India is hindered by low digitisation, low ecosystem maturity and a shortage of skilled talent, implementation is further marred by data security challenges, according to an EY-NASSCOM report.

According to Prof Vinod PK of IIIT Hyderabad, access to good-quality data remains the biggest roadblock for AI researchers. His team did phenomenal work during the pandemic, building AI algorithms to predict the mortality of Covid-19 patients. "Getting an electronic health record of the patient is still not easy in the Indian setting compared to Western countries. Therefore, getting to a point where we systematically collect those data and then make it accessible after anonymisation is still the bottleneck that we feel we have to break if we want to create impact with AI or healthcare informatics," he told INDIAai.

Data anonymisation is the answer

Data anonymisation is a form of information sanitisation that removes personally identifiable information (PII) from data sets to keep the identity of data subjects secret. Health data typically undergoes anonymisation or pseudonymisation before being shared, but such safeguards have often proven inadequate at protecting patients' health data.
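As a minimal illustration of the two approaches, the sketch below (in Python with pandas; the column names, records and salt are hypothetical) shows anonymisation dropping direct identifiers outright, while pseudonymisation replaces them with salted hashes so records stay linkable without being directly readable:

```python
import hashlib
import pandas as pd

# Hypothetical patient records; all names and values are illustrative only.
records = pd.DataFrame({
    "name":       ["A. Sharma", "B. Rao"],
    "patient_id": ["P-1001", "P-1002"],
    "age":        [54, 61],
    "diagnosis":  ["pneumonia", "covid-19"],
})

SALT = "replace-with-a-secret-salt"  # stored separately from the data set

def pseudonymise(value: str) -> str:
    """Replace an identifier with a salted hash: records remain linkable
    across data sets, but the original identity is not directly readable."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:12]

# Anonymisation: remove direct identifiers entirely.
anonymised = records.drop(columns=["name", "patient_id"])

# Pseudonymisation: keep a linkable but non-identifying key.
pseudonymised = records.drop(columns=["name"]).assign(
    patient_id=records["patient_id"].map(pseudonymise)
)

print(anonymised)
print(pseudonymised)
```

Note that quasi-identifiers such as age remain in both outputs; cross-referenced with other data sets, they can still enable re-identification, which is one reason these safeguards so often prove inadequate on their own.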

In May this year, an interdisciplinary team at the Technical University of Munich (TUM), Imperial College London and the non-profit OpenMined developed a unique combination of AI-based diagnostic processes for radiological image data that safeguards data privacy. The technology has now been used for the first time in an algorithm that identifies pneumonia in X-ray images of children.
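The report does not spell out the mechanism, but privacy-preserving training of this kind typically rests on ingredients such as differential privacy, where each model update is clipped and noised so that no single patient's record can be reconstructed from it. Below is a rough, illustrative sketch of that ingredient (plain NumPy; the clipping norm and noise scale are arbitrary assumptions, not the TUM team's parameters):

```python
import numpy as np

def privatise_update(grad: np.ndarray, clip_norm: float = 1.0,
                     noise_std: float = 0.5) -> np.ndarray:
    """Differentially-private treatment of one model update:
    1) clip the update so any single patient's influence is bounded,
    2) add calibrated Gaussian noise to mask that influence."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + np.random.normal(0.0, noise_std * clip_norm,
                                      size=grad.shape)

# Hypothetical gradient from one training step on one hospital's data.
gradient = np.array([0.8, -2.3, 1.1])
print(privatise_update(gradient))
```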

Further, in October 2021, Secure AI Labs (SAIL), founded by MIT alumna Anne Kim and MIT Professor Manolis Kellis, found a way to anonymise data for AI researchers. SAIL's platform can also combine data from multiple sources, creating rich insights that fuel more effective algorithms. SAIL sends encrypted algorithms to the hospital servers where the data sets reside, in a process called federated learning. The algorithms crunch the data locally, ensuring that no one (not the researchers, the data owners, or even SAIL) has access to the models or the data sets.
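In outline, federated learning works as the paragraph describes: the model travels to the data rather than the other way around, and only aggregated updates leave each site. The sketch below shows one round of federated averaging (NumPy; the hospital names and linear model are hypothetical, and real deployments such as SAIL's add encryption on top):

```python
import numpy as np

# Hypothetical local data sets: one per hospital, never pooled centrally.
hospitals = {
    "hospital_a": (np.random.randn(100, 3), np.random.randn(100)),
    "hospital_b": (np.random.randn(80, 3),  np.random.randn(80)),
}

def local_update(weights, X, y, lr=0.01):
    """One gradient step on a linear model, computed on-site:
    raw patient records never leave the hospital's server."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

# One round of federated averaging.
global_weights = np.zeros(3)
updates = [local_update(global_weights, X, y) for X, y in hospitals.values()]
sizes = [len(y) for _, y in hospitals.values()]

# Only the (ideally encrypted) model updates are aggregated centrally,
# weighted by each site's sample count.
global_weights = np.average(updates, axis=0, weights=sizes)
print(global_weights)
```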
