Ewha Medical Journal

"Random forest"

Original articles


Feature-based ensemble modeling for addressing diabetes data imbalance using the SMOTE, RUS, and random forest methods: a prediction study
Younseo Jang
Ewha Med J 2025;48(2):e32. Published online April 15, 2025. DOI: https://doi.org/10.12771/emj.2025.00353

Purpose
This study developed and evaluated a feature-based ensemble model integrating the synthetic minority oversampling technique (SMOTE) and random undersampling (RUS) methods with a random forest approach to address class imbalance in machine learning for early diabetes detection, aiming to improve predictive performance.
Methods
Using the scikit-learn diabetes dataset (442 samples, 10 features), we binarized the target variable (diabetes progression) at the 75th percentile and split the data 80:20 into training and test sets using stratified sampling. The training set was balanced to a 1:2 minority-to-majority ratio via SMOTE (0.6) and RUS (0.66). A feature-based ensemble model was constructed by training random forest classifiers on 10 two-feature subsets, selected based on feature importance, and combining their outputs using soft voting. Performance was compared against 13 baseline models, using accuracy and area under the curve (AUC) as metrics on the imbalanced test set.
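The soft-voting step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each sub-model returns a positive-class probability per sample, and the function names, the 0.5 threshold, and the probability values are hypothetical.

```python
from statistics import fmean


def soft_vote(probabilities):
    """Average positive-class probabilities across models, sample by sample.

    probabilities: one inner list per model, each holding P(class = 1)
    for every sample, in the same sample order.
    """
    n_samples = len(probabilities[0])
    if any(len(p) != n_samples for p in probabilities):
        raise ValueError("every model must score the same samples")
    return [fmean(model[i] for model in probabilities) for i in range(n_samples)]


def predict(avg_probabilities, threshold=0.5):
    """Turn averaged probabilities into class labels (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in avg_probabilities]


# Hypothetical outputs from three two-feature sub-models on four samples.
model_probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.4, 0.7],
    [0.7, 0.3, 0.8, 0.1],
]
averaged = soft_vote(model_probs)
labels = predict(averaged)
```

Averaging probabilities rather than counting hard votes lets a confident sub-model outweigh several uncertain ones, which is the usual reason for preferring soft voting when the sub-models see different feature pairs.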
Results
The feature-based ensemble model and the balanced random forest achieved the highest accuracy (0.8764), followed by the fully connected neural network (0.8700); k-nearest neighbors had the lowest accuracy (0.8427). The ensemble model also achieved a high AUC (0.9227). Visualizations confirmed the ensemble model's superior discriminative ability, especially for the minority (high-risk) class, which is a critical factor in medical contexts.
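Accuracy and AUC measure different things on an imbalanced test set, which is why both are reported. The sketch below computes each from scratch; the labels and scores are hypothetical toy values, not data from the study, and the pairwise AUC formula is one standard way to compute it rather than necessarily the routine the authors used.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced test set: 2 positives, 4 negatives.
y_true = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes how well the scores rank positives above negatives across all thresholds, so a model can do well on one and less well on the other.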
Conclusion
Integrating SMOTE, RUS, and feature-based ensemble learning improved classification performance in imbalanced diabetes datasets by delivering robust accuracy and high recall for the minority class. In this comparison, the approach outperformed traditional resampling techniques and deep learning models, offering a scalable and interpretable solution for early diabetes prediction and potentially other medical applications.




Improving appendix cancer prediction with SHAP-based feature engineering for machine learning models: a prediction study
Ji Yoon Kim
Ewha Med J 2025;48(2):e31. Published online April 15, 2025. DOI: https://doi.org/10.12771/emj.2025.00297

Purpose
This study aimed to leverage Shapley additive explanation (SHAP)-based feature engineering to predict appendix cancer. Traditional models often lack transparency, hindering clinical adoption. We propose a framework that integrates SHAP for feature selection, construction, and weighting to enhance accuracy and clinical relevance.
Methods
Data from the Kaggle Appendix Cancer Prediction dataset (260,000 samples, 21 features) were used in this prediction study conducted from January through March 2025, in accordance with TRIPOD-AI guidelines. Preprocessing involved label encoding, the synthetic minority oversampling technique (SMOTE) to address class imbalance, and an 80:20 train-test split. Baseline models (random forest, XGBoost, LightGBM) were compared; LightGBM was selected for its superior performance (accuracy=0.8794). SHAP analysis identified key features and guided 3 engineering steps: selection of the top 15 features, construction of interaction-based features (e.g., chronic severity), and feature weighting based on SHAP values. Performance was evaluated using accuracy, precision, recall, and F1-score.
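Two of the SHAP-guided steps, feature selection and feature weighting, can be illustrated with a small sketch. The abstract does not specify how SHAP values were aggregated or turned into weights, so the choices here are assumptions: features are ranked by mean absolute SHAP value, and weights are those importances normalized to sum to 1. The feature names and SHAP values are hypothetical, and the toy example keeps the top 2 features rather than 15.

```python
def mean_abs_shap(shap_rows):
    """Mean absolute SHAP value per feature (rows = samples, columns = features)."""
    n_features = len(shap_rows[0])
    n_samples = len(shap_rows)
    return [sum(abs(row[j]) for row in shap_rows) / n_samples for j in range(n_features)]


def top_k_features(names, importances, k):
    """Return the k (name, importance) pairs with the largest importance."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def normalized_weights(selected):
    """Scale importances so the selected features' weights sum to 1 (assumed scheme)."""
    total = sum(importance for _, importance in selected)
    if total == 0:
        raise ValueError("all selected importances are zero")
    return {name: importance / total for name, importance in selected}


# Hypothetical SHAP values for two samples and four placeholder features.
names = ["feature_a", "feature_b", "feature_c", "feature_d"]
shap_rows = [
    [0.4, -0.1, 0.0, 0.1],
    [-0.2, 0.1, 0.0, -0.3],
]

importances = mean_abs_shap(shap_rows)
selected = top_k_features(names, importances, k=2)
weights = normalized_weights(selected)
```

Taking absolute values before averaging matters because SHAP contributions are signed; a feature that pushes predictions strongly in both directions would otherwise appear unimportant.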
Results
Four LightGBM model configurations were evaluated: baseline (accuracy=0.8794, F1-score=0.8691), feature selection (accuracy=0.8968, F1-score=0.8860), feature construction (accuracy=0.8980, F1-score=0.8872), and feature weighting (accuracy=0.8986, F1-score=0.8877). SHAP-based engineering yielded performance improvements, with feature weighting achieving the highest precision (0.9940). Key features (e.g., red blood cell count and chronic severity) contributed to predictions while maintaining interpretability.
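The relationship between the reported precision and F1-score can be made explicit. F1 is the harmonic mean of precision and recall, so given two of the three, the third follows. The sketch below rearranges that definition to estimate the recall implied by the feature-weighting configuration; this recall is derived here for illustration and is not a figure reported in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall for this precision/F1 pair")
    return f1 * precision / denominator


# Values reported for the feature-weighting configuration.
precision = 0.9940
f1 = 0.8877

recall = implied_recall(precision, f1)        # derived, not reported
round_trip = f1_score(precision, recall)      # should reproduce the reported F1
```

Under these figures the implied recall is roughly 0.80, well below the precision of 0.9940, which suggests the model rarely raises false alarms but still misses a share of positive cases.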
Conclusion
The SHAP-based framework substantially improved the accuracy and transparency of appendix cancer predictions using LightGBM (F1-score=0.8877). This approach bridges the gap between predictive power and clinical interpretability, offering a scalable model for rare disease prediction. Future validation with real-world data is recommended to ensure generalizability.
