Cardiovascular disease (CVD) is the leading cause of morbidity and mortality worldwide. Existing models, such as the Framingham risk score and the ACC/AHA pooled cohort equations, rely on a small number of risk factors (hypertension, cholesterol, age, smoking, and diabetes) and can therefore miss at-risk people who have few of these factors.
Using a large cohort from Vanderbilt's de-identified electronic health record (EHR) (109,490 adults), we developed several approaches to modeling temporal EHR data. The best-performing models were gradient boosting trees (GBT) and long short-term memory (LSTM) networks trained on temporal/longitudinal values. We also developed a fusion model that integrated the EHR data with 204 genetic variants, which further improved prediction accuracy.
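To make the idea of temporal/longitudinal values and fusion concrete, here is a minimal sketch of one way such inputs could be built. It is not the project's code: the function names, the window length, the number of slices, and the use of `None` for missing slices are all assumptions made for illustration.

```python
from statistics import mean


def temporal_features(measurements, window_years=7, n_slices=7):
    """Summarize one patient's repeated measurements as per-slice means.

    measurements: list of (years_since_window_start, value) pairs.
    window_years and n_slices are hypothetical settings, not values
    taken from the project.
    """
    slices = [[] for _ in range(n_slices)]
    width = window_years / n_slices
    for t, value in measurements:
        if 0 <= t < window_years:
            # Clamp the index to guard against floating-point rounding
            # at the upper edge of the window.
            idx = min(int(t // width), n_slices - 1)
            slices[idx].append(value)
    # A slice with no measurement stays missing (None) instead of being
    # filled with a guessed value.
    return [mean(s) if s else None for s in slices]


def fuse(ehr_features, genotype_dosages):
    """Illustrative fusion by concatenation: EHR features followed by one
    dosage (0, 1, or 2) per genetic variant."""
    return list(ehr_features) + list(genotype_dosages)


# Toy example: three systolic blood pressure readings over a 4-year window.
sbp = [(0.5, 120.0), (0.7, 130.0), (3.2, 140.0)]
features = temporal_features(sbp, window_years=4, n_slices=4)
fused = fuse(features, [0, 1, 2])
```

The ordered per-slice vector could feed a sequence model such as an LSTM, and the flattened, fused vector could feed a tree-based model. How missing slices are handled depends on the library chosen.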
Read Full Paper
GitHub Source Code
The project was sponsored by:
- 18AMTG34280063 (PI: Juan Zhao) American Heart Association, 2018-07-01 to 2020-03-1
- R01 HL133786-02 (PI: Wei-Qi Wei) “Exploring Statin Pleiotropic Effects within a Very Large EHR Cohort”, NIH/NHLBI, 04/01/2017 – 02/28/2021
Juan Zhao, QiPing Feng, Patrick Wu, Roxana Lupu, Russell A Wilke, Quinn S Wells, Joshua Denny, Wei-Qi Wei
Scientific Reports, vol. 9, 2019, p. 717