Implication of analysis of machine learning models for predicting the risk of cardiovascular disease by considering lifestyle factors and features selection
Abstract
Objective: This study aims to conduct a comprehensive survey and analysis of the application of machine learning techniques for predicting the risk of cardiovascular diseases (CVD) by considering certain lifestyle factors. The objective is to explore the use of machine learning algorithms, diverse datasets, feature selection methodologies, modalities (uni- or multi-modal), and performance evaluation metrics across CVD research. The study seeks to provide insights into state-of-the-art approaches, identify key challenges, and stimulate further research interest in leveraging machine learning for early detection and efficient management of CVD, thereby contributing to improved healthcare outcomes.
Method: The methodology comprises several key steps to comprehensively analyse the application of machine learning (ML) algorithms in detecting, categorizing, and predicting cardiovascular diseases (CVD). First, diverse datasets related to CVD are collected, covering demographic information, medical history, and lifestyle factors. These datasets undergo pre-processing steps, including handling of missing values, treatment of outliers, and data normalization, to ensure data quality and consistency. Feature selection techniques, including recursive feature elimination and feature importance ranking, are then applied to identify the most relevant features for predicting CVD outcomes. The results are then analysed and interpreted to gain insights into the strengths and weaknesses of each model, feature importance, and generalization capabilities across different datasets.
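The pre-processing steps named above (missing values, outliers, normalization) can be sketched for a single numeric feature as follows. This is a minimal illustration only, not the study's pipeline: the choice of median imputation, IQR-based clipping, and min-max scaling, the function names, and the sample systolic readings are all assumptions made for the example.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers_iqr(values, k=1.5):
    """Clip values lying more than k * IQR outside the first/third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def preprocess(values):
    """Impute, clip, then normalize one numeric feature column."""
    return min_max_scale(clip_outliers_iqr(impute_median(values)))


# Hypothetical systolic blood pressure readings with one gap and one outlier.
systolic = [120, 135, None, 118, 300, 142, 128]
cleaned = preprocess(systolic)
```

Clipping before scaling keeps a single extreme reading from compressing the remaining values into a narrow band near zero.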
Findings: The analysis identified several features as relevant to the Indian population because they capture modifiable lifestyle attributes. These features include BMI, Systolic Blood Pressure, Diastolic Blood Pressure, Smoking, Glucose, and Cholesterol. In the LightGBM model training summary, the dataset consisted of 56,000 instances, evenly split between positive and negative cases, and used 8 features. The model reported a starting score of -0.002357, with 720 total bins used. The accuracy of the classifiers ranged from 51.31% (Perceptron) to 73.18% (Gradient Boosting Classifier), with the Gradient Boosting Classifier performing best on this dataset. Additionally, LightGBM automatically selected row-wise multi-threading, which added a small overhead.
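The reported starting score can be related to the class balance with a short calculation. The sketch below assumes the starting score is the log-odds of the positive-class fraction, which is how LightGBM's binary objective initializes its prediction; the function names are illustrative, and the implied class counts are derived here, not reported by the study.

```python
import math


def init_score(n_pos, n_neg):
    """Log-odds of the positive-class fraction for a binary problem."""
    p = n_pos / (n_pos + n_neg)
    return math.log(p / (1.0 - p))


def positive_fraction(score):
    """Invert the log-odds: the class fraction implied by a starting score."""
    return 1.0 / (1.0 + math.exp(-score))


reported_score = -0.002357  # starting score from the training summary
n_total = 56_000            # number of training instances reported

p = positive_fraction(reported_score)
implied_pos = round(p * n_total)     # implied positive count (an estimate)
implied_neg = n_total - implied_pos  # implied negative count (an estimate)

print(f"positive fraction ~ {p:.5f}")
print(f"implied split ~ {implied_pos} positive / {implied_neg} negative")
```

Under this assumption, a score this close to zero corresponds to a positive fraction just under one half, consistent with a near-even split of the two classes.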
Novelty: The study emphasizes feature selection and data pre-processing, optimizing predictive analytics for healthcare in India. By focusing on BMI, blood pressure, smoking, glucose, and cholesterol, it advances precision medicine and personalized healthcare. The findings improve disease prediction and management, inspiring further research and innovation. This work highlights the importance of tailored healthcare solutions for diverse populations, enhancing health interventions and patient care.