EHR-based machine learning model yields new marker for CAD risk, prognostication

January 6, 2023

3 min read


The NIH funded this study. Forrest reports no relevant financial disclosures. Please see the study for all other authors' relevant financial disclosures.


An AI-derived marker using electronic health records noninvasively quantified plaque burden and mortality risk for adults from two large biobank cohorts, offering an option for more targeted diagnosis of CAD, data show.

The study, the first known research to map the characteristics of CAD on a spectrum, revealed distinct gradations of disease risk, atherosclerosis and survival that would otherwise be missed with binary case-versus-control schemes, the researchers wrote in The Lancet.


“CAD and other diseases exist on a spectrum, and each individual will have a mix of risk factors, pathogenic processes and biological changes that determine where they will be on the ‘disease’ spectrum,” Dr. Iain S. Forrest, postdoctoral fellow and student in the Medical Scientist Training Program at the Icahn School of Medicine at Mount Sinai, told Healio. “However, the paradigm for most clinicians today is to break this spectrum into inflexible categories of sick or unstable. This results in missed diagnoses, inappropriate treatment and potentially poor clinical outcomes. We want to see if there is a better way to capture this spectrum of disease and that’s what turned our attention to machine learning.”

In a retrospective, observational study, Forrest and colleagues developed and validated a CAD-predictive machine learning model using 95,935 EHRs and expressed its predicted probabilities as in-silico scores for CAD (ISCAD), ranging from 0 (lowest probability) to 1 (highest probability), in participants from two large longitudinal biobank cohorts. Of these participants, 35,749 were from the BioMe Biobank, used for model training and validation (median age, 61 years; 41% men; 14% diagnosed with CAD), and 60,186 were from the UK Biobank, used for external testing (median age, 62 years; 42% men; 14% diagnosed with CAD).

“In both (biobank) cases, we were able to look at their de-identified electronic health records, including the medications they were taking, what diagnoses and codes they had, as well as lab measurements and vital signs,” Forrest said in an interview.

The researchers measured the association of ISCAD with clinical outcomes, including coronary artery stenosis, obstructive CAD, multivessel CAD, all-cause death, and CAD sequelae.

In the validation data set, the model predicted CAD with an area under the receiver operating characteristic curve (AUROC) of 0.95 (95% CI, 0.94-0.95), a sensitivity of 0.94 (95% CI, 0.94-0.95) and a specificity of 0.82 (95% CI, 0.81-0.83). The prevalence of CAD was 13% in the validation data set, with a negative predictive value (NPV) of 0.93 (95% CI, 0.93-0.93) and a positive predictive value (PPV) of 0.84 (95% CI, 0.83-0.95).

In the holdout data set, the model predicted CAD with an AUROC of 0.93 (95% CI, 0.92-0.93), a sensitivity of 0.9 (95% CI, 0.89-0.9) and a specificity of 0.88 (95% CI, 0.87-0.88). The prevalence of CAD was 16% in the holdout data set, with an NPV of 0.89 (95% CI, 0.89-0.89) and a PPV of 0.88 (95% CI, 0.88-0.88).

In the external test data set, drawn from the UK Biobank, the model predicted CAD with an AUROC of 0.91 (95% CI, 0.91-0.91), a sensitivity of 0.84 (95% CI, 0.83-0.84) and a specificity of 0.83 (95% CI, 0.82-0.83). The prevalence of CAD was 14% in the external test data set, with an NPV of 0.84 (95% CI, 0.83-0.84) and a PPV of 0.83 (95% CI, 0.82-0.83).
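For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, PPV and NPV are each derived from the four cells of a confusion matrix. It is a minimal illustration, not the researchers' code: the function name, the class name and the counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticMetrics:
    sensitivity: float  # share of true cases the model flags
    specificity: float  # share of true non-cases the model clears
    ppv: float          # share of positive calls that are true cases
    npv: float          # share of negative calls that are true non-cases


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> DiagnosticMetrics:
    """Compute the four standard metrics from confusion-matrix counts."""
    return DiagnosticMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


# Hypothetical counts chosen for illustration only (not study data).
example = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
print(example)
```

Sensitivity and specificity describe the model relative to the true disease status, whereas PPV and NPV describe it relative to the model's own calls, which is why the latter two also depend on how common CAD is in the data set being scored.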

ISCAD captured CAD risk from known risk factors, pooled cohort equations, and polygenic risk scores. Coronary artery stenosis increased quantitatively with increasing ISCAD quartiles, including risk for obstructive CAD, multivessel CAD, and main coronary artery stenosis, according to the researchers.

The HR for and prevalence of all-cause death increased gradually across ISCAD deciles. Compared with biobank participants in decile 1, the HR for all-cause death for those in decile 6 was 11 (95% CI, 3.9-31), and the prevalence was 3.1%, while the HR for all-cause death for those in decile 10 was 56 (95% CI, 20-158), and the prevalence was 11%. The researchers observed a similar trend for recurrent MI.
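The decile comparison can be pictured as ranking participants by score, cutting the ranking into 10 equal groups and tallying outcomes in each group. The sketch below does only that tallying step on synthetic data. The function name and inputs are hypothetical, and it does not estimate HRs, which require a time-to-event model that the article does not detail.

```python
from typing import List, Sequence


def prevalence_by_decile(scores: Sequence[float], died: Sequence[int]) -> List[float]:
    """Return the outcome prevalence in each score decile, lowest decile first."""
    if len(scores) != len(died):
        raise ValueError("scores and died must have the same length")
    n = len(scores)
    if n < 10:
        raise ValueError("need at least 10 participants to form deciles")

    # Rank participants from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])

    prevalences = []
    for d in range(10):
        # Integer arithmetic keeps the 10 groups as equal in size as possible.
        members = order[d * n // 10:(d + 1) * n // 10]
        prevalences.append(sum(died[i] for i in members) / len(members))
    return prevalences


# Synthetic data for illustration only: 100 participants with scores
# 0.00 to 0.99, where only the five highest scorers had the outcome.
scores = [i / 100 for i in range(100)]
died = [1 if i >= 95 else 0 for i in range(100)]
print(prevalence_by_decile(scores, died))
```

In this synthetic example, all outcomes fall in the top decile, so its prevalence is 0.5 and every other decile's is 0.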

Additionally, 46% of undiagnosed individuals with high ISCAD (0.99) had clinical evidence of CAD according to the 2014 American College of Cardiology/American Heart Association Task Force guidelines.

“What surprised us was how well this digital biomarker picked up and captured many aspects of the disease, from plaque build-up in a patient’s arteries to mortality and everything in between, including complications like MI and atrial fibrillation,” Forrest told Healio. “It was encouraging that the model was able to capture all these different aspects of the disease.”

Forrest said prospective studies are needed to evaluate the in-silico marker's association with incident CAD events and death, and to test its performance in other populations.

“In this study, we focused on CAD as a proof of concept, but we are working to apply the same approach to other common diseases,” Forrest said. “In the future, we also want to better represent diverse populations, including women and underrepresented ethnic groups.”

For more information:

Dr. Iain S. Forrest can be reached at [email protected]; Twitter: @iainsforrest.
