Early-Stage Diabetes Mellitus Risk Prediction and Symptom Association: A Comparative Analysis Using Feature Importance
Abstract
Early-stage diabetes risk prediction is a critical component of preventive healthcare: its goal is to identify patients at risk of developing diabetes before symptoms appear. This research evaluates several machine learning (ML) methods for predicting diabetes risk: logistic regression, Naive Bayes, random forest, K-Nearest Neighbours (KNN), and decision trees. To train and evaluate these models, we used an updated version of the Sylhet Diabetes Hospital Dataset, which contains 521 instances and 18 attributes. Our analysis covers several aspects: the predictive accuracy of each algorithm, feature importance rankings across models, association rule mining to identify connections between key diabetes markers, the mathematical foundations of each method, and pseudocode. The results show that the random forest algorithm outperforms all other approaches, with an accuracy of 97.1153%. Our findings identify polyuria, polydipsia, and gender as significant predictors across multiple algorithms. Association rule mining reveals strong associations among these symptoms, particularly in female patients. This multidimensional approach not only provides a robust foundation for early diabetes detection but also sheds light on the interplay of risk factors. The findings could enhance preventive care practices and support more targeted screening strategies.
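Association rules such as the symptom associations mentioned above are commonly scored with support, confidence, and lift. The minimal Python sketch below shows how a rule like {Polyuria, Polydipsia, Female} → {Positive} could be evaluated. The eight records are hypothetical toy data, not drawn from the Sylhet dataset, and the helper names (`support`, `confidence`, `lift`) are illustrative assumptions rather than the authors' implementation.

```python
from typing import FrozenSet, List, Set

# Hypothetical toy records (NOT the Sylhet dataset): each patient is the set of
# attributes that are "Yes" for them, plus a gender item and a class label.
records: List[Set[str]] = [
    {"Polyuria", "Polydipsia", "Female", "Positive"},
    {"Polyuria", "Polydipsia", "Male", "Positive"},
    {"Polyuria", "Female", "Positive"},
    {"Polydipsia", "Female", "Positive"},
    {"Male", "Negative"},
    {"Female", "Negative"},
    {"Polyuria", "Polydipsia", "Female", "Positive"},
    {"Male", "Negative"},
]


def support(itemset: FrozenSet[str], data: List[Set[str]]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for r in data if itemset <= r) / len(data)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               data: List[Set[str]]) -> float:
    """Estimate of P(consequent | antecedent) = supp(A ∪ C) / supp(A)."""
    base = support(antecedent, data)
    return support(antecedent | consequent, data) / base if base else 0.0


def lift(antecedent: FrozenSet[str], consequent: FrozenSet[str],
         data: List[Set[str]]) -> float:
    """Confidence divided by the consequent's baseline support; > 1 means positive association."""
    return confidence(antecedent, consequent, data) / support(consequent, data)


if __name__ == "__main__":
    a = frozenset({"Polyuria", "Polydipsia", "Female"})
    c = frozenset({"Positive"})
    s = support(a | c, records)
    print(f"support={s:.3f} confidence={confidence(a, c, records):.3f} "
          f"lift={lift(a, c, records):.3f}")
    # prints: support=0.250 confidence=1.000 lift=1.600
```

On this toy data the rule holds in every matching record (confidence 1.0), and a lift above 1 indicates that the antecedent makes a positive diagnosis more likely than the 62.5% base rate. In practice, candidate rules are usually enumerated with an algorithm such as Apriori and then filtered by minimum support and confidence thresholds.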