Abstract:
This study aimed to develop and evaluate machine-learning models for predicting metabolic syndrome (MetS) without requiring blood tests, using routinely collected data from 29,499 adult clients (192,121 visits) aged 18 to 65 at the Mae Fah Luang University Medical Center Hospital (MFU MCH). Predictor variables included demographic factors (age, sex, occupation, marital status) and basic physiological measures (body mass index, waist circumference, and blood pressure).
A retrospective cross-sectional analysis of electronic health records was conducted, covering 1 December 2018 to 31 December 2022. Data were split into training (70%) and testing (30%) sets by simple random sampling. Five algorithms were implemented in Python: Artificial Neural Network (ANN), Support Vector Machine, Random Forest (RF), Logistic Regression, and Extreme Gradient Boosting (XGBoost). Hyperparameters were tuned by grid search with 10-fold cross-validation. Performance was assessed on the test set using accuracy, precision, recall, F1-score, and ROC-AUC.
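The split, fold, and scoring steps described above can be sketched in plain Python. This is a minimal illustration only: the abstract does not name the libraries used, so the function names (`split_indices`, `k_fold`, `binary_metrics`, `roc_auc`), the seed, and the toy inputs are all hypothetical, and model fitting and the grid search itself are left out.

```python
import random

def split_indices(n, test_fraction=0.30, seed=42):
    """Simple random sampling: shuffle row indices, hold out test_fraction."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is an arbitrary assumption
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train, test)

def k_fold(indices, k=10):
    """Yield (fit, validation) index lists; each fold validates exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, val

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = MetS)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy run on 1,000 hypothetical rows: 70/30 split, then 10 folds of the training part.
train_idx, test_idx = split_indices(1000)
folds = list(k_fold(train_idx, k=10))
toy_metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In a grid search, each candidate hyperparameter setting would be fitted on the `fit` indices and scored on the `val` indices of every fold, and the setting with the best mean validation score would then be refitted on the full training set before the single test-set evaluation.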
Participants with MetS differed significantly from those without MetS in several key variables: body weight, age, systolic blood pressure, diastolic blood pressure, sex, and marital status. XGBoost achieved the highest recall (0.88), indicating the best ability to detect individuals at risk for MetS. RF yielded the highest accuracy (0.89), precision (0.41), and F1-score (0.51), with a ROC-AUC of 0.91 (tied with ANN), but showed the lowest recall (0.68). Because of its high sensitivity, the XGBoost model was implemented as a prototype web application for pilot deployment to support real-world MetS risk screening at MFU MCH.
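As a quick consistency check on the reported RF figures, the F1-score can be recomputed from the stated precision and recall, since F1 is their harmonic mean. The helper name `f1_from` is hypothetical; the inputs are the values given in this abstract.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Random Forest values as reported in the abstract.
rf_f1 = f1_from(0.41, 0.68)
print(round(rf_f1, 2))  # 0.51, matching the reported F1-score
```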