A group of researchers used a deep learning algorithm to predict the risk of pancreatic cancer from disease trajectories. They also proposed monitoring programmes for early pancreatic cancer diagnosis, based on real-world longitudinal clinical data.

“The promise of AI in medicine is to provide composite, panoramic views of individuals’ medical data; to improve decision making; to avoid errors such as misdiagnosis and unnecessary procedures; to help in the ordering and interpretation of appropriate tests; and to recommend treatment.” - Eric Topol

Pancreatic cancer

Pancreatic cancer is becoming more common and is a leading cause of cancer-related deaths worldwide. It is difficult to diagnose early because its risk factors are poorly understood. Late detection, at advanced or distant metastatic stages, complicates treatment and makes long-term survival rare: only two to nine per cent of such patients are alive after five years. Age is a risk factor for pancreatic cancer, but population-wide screening is impractical because of the high cost and false-positive rate of clinical testing.

Furthermore, data on family history or genetic risk factors for the general population are frequently unavailable. Thus, low-cost pancreatic cancer surveillance systems for the broader community are urgently needed.

Data sources

The current study used real-world longitudinal health records from millions of patients to identify people at high risk of pancreatic cancer. The researchers applied newly developed machine learning (ML) algorithms first to patient records from the Danish National Patient Registry (DNPR) and later to records from the United States Veterans Affairs (US-VA) Corporate Data Warehouse (CDW). The former contained clinical data from 8.6 million patients between 1977 and 2018, including 24,000 pancreatic cancer cases; the latter contained clinical data from three million patients, including 3,900 pancreatic cancer cases.

The researchers trained and evaluated a wide range of ML models on the sequences of disease codes in the DNPR and US-VA clinical data. Their model, CancerRiskNet, predicts whether cancer will occur within incremental time intervals.
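Predicting cancer occurrence "within incremental time intervals" can be read as assigning one yes/no label per time horizon. The sketch below illustrates that labelling idea only; it is not the study's code. The horizon values, the function names and the 30.44-day month are assumptions made for illustration.

```python
from datetime import date
from typing import List, Optional, Sequence

# Hypothetical prediction horizons in months; the article does not list them.
HORIZONS_MONTHS = (3, 6, 12, 36, 60)

def months_between(start: date, end: date) -> float:
    """Approximate number of months from start to end (30.44 days per month)."""
    return (end - start).days / 30.44

def interval_labels(assessment: date,
                    cancer_dx: Optional[date],
                    horizons: Sequence[int] = HORIZONS_MONTHS) -> List[int]:
    """Return one 0/1 label per horizon: 1 if cancer is diagnosed within it."""
    if cancer_dx is None:
        # No cancer diagnosis on record: negative at every horizon.
        return [0] * len(horizons)
    gap = months_between(assessment, cancer_dx)
    return [1 if 0 < gap <= h else 0 for h in horizons]

# A patient assessed on 1 Jan 2010 and diagnosed about ten months later
# is negative for the 3- and 6-month horizons and positive afterwards.
labels = interval_labels(date(2010, 1, 1), date(2010, 11, 1))
print(labels)  # [0, 0, 1, 1, 1]
```

Because the horizons are nested, a positive label at one horizon implies positive labels at every longer horizon.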

Deep learning approach

The team built the predictive models on three-character International Classification of Diseases (ICD) diagnostic codes. They defined "pancreatic cancer patients" as people with at least one code under C25, malignant neoplasm of the pancreas. Almost 98% of the cancer disease codes were found to be accurate. Finally, the researchers identified which codes in a patient's diagnosis history contributed most to the predicted cancer risk, which helped them propose a design for a monitoring programme.
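The case definition above can be sketched in a few lines. This example assumes ICD-10-style code strings such as "C25.0"; the real registries may store codes in other formats. The patient identifiers and histories are invented for illustration.

```python
from typing import Dict, Iterable, List

def three_char(code: str) -> str:
    """Truncate an ICD-10-style code such as 'C25.1' or 'c251' to 'C25'."""
    return code.replace(".", "").strip().upper()[:3]

def is_pancreatic_cancer_patient(codes: Iterable[str]) -> bool:
    """True if at least one code in the history falls under C25."""
    return any(three_char(c) == "C25" for c in codes)

# Invented example histories (not real patient data).
histories: Dict[str, List[str]] = {
    "patient_a": ["K86.1", "E11.9", "C25.0"],
    "patient_b": ["I10", "E11.9"],
}

cases = {pid: is_pancreatic_cancer_patient(codes)
         for pid, codes in histories.items()}
print(cases)  # {'patient_a': True, 'patient_b': False}
```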

The researchers used the area under the receiver operating characteristic curve (AUROC) and relative risk (RR) curves to measure the predictive performance of the different models trained on the DNPR. They also reported the RR scores that the ML models assigned to cancer patients in the high-risk group.
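To make the two metrics concrete, here is a minimal sketch on invented scores. AUROC is computed as the probability that a randomly chosen case is scored higher than a randomly chosen non-case. Relative risk is taken here as the incidence among the top-scored patients divided by the incidence in the whole group, which is one common definition and an assumption about how the study computed it.

```python
from typing import List, Sequence

def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def relative_risk(scores: Sequence[float], labels: Sequence[int], top_n: int) -> float:
    """Incidence among the top_n highest-scored patients over overall incidence."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top = ranked[:top_n]
    top_rate = sum(y for _, y in top) / len(top)
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Invented model scores and outcomes for eight patients (1 = developed cancer).
scores: List[float] = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
labels: List[int] = [1, 0, 1, 0, 0, 0, 0, 0]

print(round(auroc(scores, labels), 3))          # 0.917
print(relative_risk(scores, labels, top_n=2))   # 2.0
```

The pairwise AUROC is quadratic in the number of patients, which is fine for illustration but too slow for registry-scale data, where a rank-based implementation would be used instead.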

Evaluation

Previous research that used real-world clinical records to predict pancreatic cancer risk yielded promising results but did not extract time-sequential longitudinal features from disease histories. For comparison, the researchers therefore also tested non-time-sequential models on the DNPR dataset.

Excluding input disease diagnoses from the three, six, and twelve months preceding the pancreatic cancer diagnosis reduced the performance of the best models from an AUROC of 0.879 to AUROCs of 0.843, 0.829, and 0.827, respectively. According to the researchers' estimate, around 320 people in the group flagged as high risk would go on to develop pancreatic cancer. While clinicians might have identified some of these cases from known pancreatic cancer risk factors, such as chronic pancreatitis, a significant portion, approximately 70 by a conservative estimate, would be newly identified.
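The data-exclusion step described above can be illustrated as a filter over a patient's diagnosis history. This is a sketch of the idea only: the function name, the invented trajectory and the 30.44-day month are assumptions, and the study's own preprocessing may differ.

```python
from datetime import date, timedelta
from typing import List, Tuple

Event = Tuple[date, str]  # (date of diagnosis, three-character disease code)

def exclude_recent(events: List[Event], cancer_dx: date, months: int) -> List[Event]:
    """Drop diagnoses recorded in the `months` before the cancer diagnosis."""
    cutoff = cancer_dx - timedelta(days=round(months * 30.44))
    return [(d, code) for d, code in events if d < cutoff]

# Invented trajectory for one patient, ending with a cancer diagnosis.
trajectory: List[Event] = [
    (date(2012, 5, 1), "E11"),
    (date(2015, 9, 1), "K86"),
    (date(2016, 1, 15), "R10"),
]
cancer_dx = date(2016, 3, 1)

kept_counts = {m: len(exclude_recent(trajectory, cancer_dx, m)) for m in (3, 6, 12)}
print(kept_counts)  # {3: 2, 6: 1, 12: 1}
```

The longer the exclusion window, the fewer recent diagnoses the model sees. That matches the direction of the reported AUROC drop, since diagnoses made shortly before the cancer is found are likely to be the most informative.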

Ideally, a multi-institutional partnership would use federated learning across different healthcare systems to arrive at a globally relevant set of prediction rules.

Conclusion

The prediction accuracy of the ML-based models described in this study could be improved by data beyond disease codes, such as observations documented in clinical notes, laboratory results, genetic profiles of more individuals, and health-related data from wearable devices. Identifying high-risk patients will be necessary for the clinical implementation of early pancreatic cancer diagnosis, because only some patients will be eligible for costly clinical screening and intervention programmes.
