Juan (Wendy) Zhao, PhD

Research Assistant Professor

Machine learning using longitudinal EHR and genetic data to improve cardiovascular disease prediction

This study was among the early works that developed machine learning and deep learning models to predict 10-year CVD risk using longitudinal EHR and genetic data.

Cardiovascular disease (CVD) is the leading cause of morbidity and mortality worldwide. Existing models, such as the Framingham risk score and the ACC/AHA pooled cohort equations, are based on a small number of risk factors such as hypertension, cholesterol, age, smoking, and diabetes, and so can miss at-risk people who have few of these risk factors.

Using a large cohort from Vanderbilt's de-identified EHR (109,490 adults), we developed several approaches to model the temporal EHR data. The best-performing models were gradient boosting trees (GBT) and long short-term memory (LSTM) networks trained on temporal/longitudinal values. We also developed a fusion model that integrated the EHR data with 204 genetic variants and further improved prediction accuracy.
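
As an illustration of what "temporal/longitudinal values" can mean in practice, the sketch below turns one patient's repeated measurements into a per-time-slice feature vector instead of a single aggregate. It is a minimal sketch and not the study's implementation: the function name temporal_features, the equal-width slices, the use of the mean, and the sample blood pressure readings are all assumptions made for this example.

```python
from datetime import date
from statistics import mean


def temporal_features(measurements, window_start, window_end, n_slices):
    """Split [window_start, window_end) into n_slices equal-width slices and
    return the mean measurement per slice (None where a slice has no data).

    measurements: iterable of (date, value) pairs for one patient and one
    variable. All names here are hypothetical, chosen for illustration.
    """
    total_days = (window_end - window_start).days
    buckets = [[] for _ in range(n_slices)]
    for when, value in measurements:
        offset = (when - window_start).days
        if 0 <= offset < total_days:  # ignore readings outside the window
            buckets[offset * n_slices // total_days].append(value)
    return [mean(b) if b else None for b in buckets]


# Hypothetical systolic blood pressure readings for a single patient.
sbp = [
    (date(2001, 3, 1), 120.0),
    (date(2001, 9, 1), 130.0),
    (date(2003, 6, 15), 150.0),
]

# Longitudinal view: one value per yearly slice of a three-year window.
features = temporal_features(sbp, date(2001, 1, 1), date(2004, 1, 1), 3)

# Aggregate view: a single summary value that hides the upward trend.
aggregate = mean(value for _, value in sbp)
```

A vector like features could be fed to a tree-based model as separate columns or to a sequence model as time steps, whereas the single aggregate value discards the trajectory. How missing slices are handled (here, None) is a further modeling choice left open in this sketch.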

Read Full Paper
GitHub Source Code

The project was sponsored by: 
  • 18AMTG34280063 (PI: Juan Zhao) American Heart Association, 2018-07-01 to 2020-03-1
  • R01 HL133786-02 (PI: Wei-Qi Wei) “Exploring Statin Pleiotropic Effects within a Very Large EHR Cohort”, NIH/NHLBI, 04/01/2017 – 02/28/2021


Learning from longitudinal data in electronic health record and genetic data to improve cardiovascular event prediction

Juan Zhao, QiPing Feng, Patrick Wu, Roxana Lupu, Russel A Wilke, Quinn S Wells, Joshua Denny, Wei-Qi Wei

Scientific Reports, vol. 9, 2019, p. 717