Preeclampsia early risk stratification based on a multiparametric machine learning model and routinely collected clinical data
https://doi.org/10.17749/2313-7347/ob.gyn.rep.2025.706
Abstract
Introduction. Preeclampsia (PE) remains one of the leading causes of maternal and perinatal morbidity and mortality, while most cases are still diagnosed at the stage of clinically overt disease. Complex prediction algorithms incorporating biochemical biomarkers and Doppler velocimetry demonstrate high accuracy but are poorly suited for large-scale screening in resource-limited settings.
Aim: to develop machine learning models for predicting PE risk at a gestational age of ≤ 16 weeks from routine electronic medical record (EMR) data, and to validate them both internally and externally.
Materials and Methods. A retrospective cohort study was conducted using de-identified EMRs of pregnant women from eight regions of the Russian Federation spanning 2010–2025. The analytical dataset included 19,955 visits at gestational age ≤ 16 weeks. The composite outcome comprised PE, eclampsia and HELLP syndrome identified by ICD-10 codes. A broad spectrum of clinical, medical history and anthropometric variables was evaluated as potential predictors. Models (logistic regression, gradient boosting, Random Forest, Extra Trees) were trained with adjustment for class imbalance; feature selection was based on SHAP values (SHapley Additive exPlanations indices). Internal performance was assessed on a held-out test set, and independent external validation was performed on a subsample from healthcare facilities of the Republic of Karelia (n = 918).
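The abstract does not state which class imbalance adjustment was applied. As a minimal sketch of one common option, inverse-frequency ("balanced") class weighting can be written as below. This is the same heuristic scikit-learn uses for `class_weight="balanced"`. The function name and the toy labels are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels: 3 positive outcomes among 100 visits.
toy_labels = [1] * 3 + [0] * 97
weights = balanced_class_weights(toy_labels)
print(weights)  # the rare positive class receives the larger weight
```

With weights like these, each misclassified positive case contributes more to the training loss than a misclassified negative case, which counteracts the rarity of the outcome.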
Results. The final Extra Trees model, which included 35 clinically interpretable predictors, achieved a ROC-AUC (area under the receiver operating characteristic curve) of 0.871 (95 % confidence interval (CI) = 0.811–0.923) in the internal validation set and 0.862 (95 % CI = 0.833–0.892) in the external validation set. At a probability threshold of 0.04, sensitivity in the external cohort was 0.886, specificity was 0.631, and negative predictive value exceeded 0.99. Probability calibration was moderate (mean absolute calibration error 0.245, i.e. 24.5 percentage points). The strongest contributors to PE risk were chronic hypertension, history of PE, blood pressure parameters, antiphospholipid syndrome and diabetes mellitus.
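To illustrate how the threshold-based figures relate to each other, the sketch below computes sensitivity, specificity and negative predictive value (NPV) from predicted probabilities, and derives NPV from sensitivity, specificity and outcome prevalence. The function names, the toy data and the 3 % prevalence are assumptions made for illustration; the abstract does not report the prevalence in the external cohort.

```python
def screening_metrics(y_true, y_prob, threshold):
    """Sensitivity, specificity and NPV when scores >= threshold are flagged.

    Assumes both outcome classes and at least one negative call are present.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        flagged = p >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


def npv_from_rates(sensitivity, specificity, prevalence):
    """NPV via Bayes' rule from sensitivity, specificity and prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical toy data, evaluated at the 0.04 threshold from the abstract.
toy_true = [1, 1, 0, 0, 0, 0]
toy_prob = [0.30, 0.02, 0.10, 0.03, 0.01, 0.02]
toy_metrics = screening_metrics(toy_true, toy_prob, threshold=0.04)
print(toy_metrics)

# Reported sensitivity and specificity with an ASSUMED prevalence of 3 %.
assumed_npv = npv_from_rates(0.886, 0.631, prevalence=0.03)
print(round(assumed_npv, 3))
```

Under that assumed prevalence the derived NPV is above 0.99, which is consistent with the reported figure. It also shows why NPV is high for any rare outcome and should be read together with sensitivity.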
Conclusion. The Extra Trees model developed on routinely collected EMR data demonstrates acceptable discriminative ability, high sensitivity and very high negative predictive value. It may be considered as a screening tool for early PE risk stratification, provided that calibration is assessed locally and further clinical evaluation is carried out.
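A local calibration check of the kind mentioned above could be sketched as follows. The abstract does not define how the mean absolute calibration error was computed, so the equal-width binning and the unweighted mean over non-empty bins used here are assumptions, as are the function name and the toy data.

```python
def mean_absolute_calibration_error(y_true, y_prob, n_bins=10):
    """Unweighted mean, over non-empty equal-width bins, of
    |observed event rate - mean predicted probability|."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((y, p))
    gaps = []
    for members in bins:
        if not members:
            continue  # skip bins that received no predictions
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        gaps.append(abs(observed - predicted))
    return sum(gaps) / len(gaps)


# Hypothetical toy data: two low-risk and two high-risk predictions.
toy_true = [0, 0, 1, 1]
toy_prob = [0.12, 0.12, 0.82, 0.82]
toy_mace = mean_absolute_calibration_error(toy_true, toy_prob)
print(round(toy_mace, 3))
```

In this toy case the low-risk bin overestimates risk by 0.12 and the high-risk bin underestimates it by 0.18, so the mean gap is 0.15. Running the same check on local data indicates whether predicted probabilities need recalibration before they are used as absolute risks.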
About the Authors
A. A. Ivshin
Russian Federation
Alexander A. Ivshin, MD, PhD.
Scopus Author ID: 610777.
WoS ResearcherID: AAG-1507-2020.
33 Lenin Avenue, Petrozavodsk 185910
N. A. Malyshev
Russian Federation
Nikita A. Malyshev, MD.
WoS ResearcherID: OVY-0768-2025.
33 Lenin Avenue, Petrozavodsk 185910
For citations:
Ivshin A.A., Malyshev N.A. Preeclampsia early risk stratification based on a multiparametric machine learning model and routinely collected clinical data. Obstetrics, Gynecology and Reproduction. (In Russ.) https://doi.org/10.17749/2313-7347/ob.gyn.rep.2025.706

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.