Software Defect Data Pre-Processing Using Enhanced Unified Data Processing Algorithm
Abstract
Machine-learning-based software testing applies machine learning algorithms and techniques to improve many aspects of the software testing process. This study presents an advanced preprocessing framework for improving data quality in the PSED Dataset. The Enhanced Unified Data Processing framework has three stages: removal of duplicate records using the Firefly Algorithm, handling of missing values with an improved KNN algorithm, and enhanced outlier detection using the Z-score method. The Firefly Algorithm iteratively compares feature vectors to eliminate duplicate records. The improved KNN algorithm imputes missing values using weighted averaging and mode selection, with the weights adjusted to reduce the influence of outliers. Outlier detection uses z-scores, and the detection threshold can be chosen to suit the data. Together, these techniques give robust data preprocessing for reliable software engineering research.
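To make the three stages concrete, here is a minimal Python sketch of a dedupe, impute, and flag-outliers pipeline. It is not the authors' implementation, and several parts are assumptions. Records are assumed to be lists of feature values with `None` marking a missing value. Duplicate removal uses plain exact-match hashing of feature vectors rather than the Firefly Algorithm. Each KNN neighbour's inverse-distance weight is further divided by `(1 + |z|)`, a hypothetical way to "adjust weighting to reduce outlier influence." Categorical columns are identified by index and filled by a weighted mode. The function names `remove_duplicates`, `knn_impute`, and `zscore_outliers` are illustrative only.

```python
import math
import statistics
from collections import defaultdict

# Assumed record layout: each record is a list of feature values; None = missing.


def remove_duplicates(rows):
    """Keep the first occurrence of each feature vector (exact match).

    Simple hash-based comparison, substituted for the paper's Firefly Algorithm.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


def _column_abs_z(complete, j):
    """|z| of column j for each complete record (0.0 if the column is constant)."""
    values = [r[j] for r in complete]
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [abs(v - mu) / sd if sd > 0 else 0.0 for v in values]


def knn_impute(rows, k=3, categorical=frozenset(), eps=1e-9):
    """Fill None entries from the k nearest complete records.

    Numeric columns: inverse-distance weighted average, with each neighbour's
    weight divided by (1 + |z|) of its value (assumed outlier damping).
    Categorical columns: inverse-distance weighted mode.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete record")
    n_cols = len(complete[0])
    abs_z = {j: _column_abs_z(complete, j) for j in range(n_cols) if j not in categorical}

    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        # Distance uses only the numeric features this record actually has.
        obs = [j for j, v in enumerate(row) if v is not None and j not in categorical]
        dist = [math.dist([row[j] for j in obs], [c[j] for j in obs]) for c in complete]
        nearest = sorted(range(len(complete)), key=dist.__getitem__)[:k]

        filled = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            if j in categorical:
                votes = defaultdict(float)
                for i in nearest:
                    votes[complete[i][j]] += 1.0 / (dist[i] + eps)
                filled[j] = max(votes, key=votes.get)  # weighted mode
            else:
                w = [1.0 / ((dist[i] + eps) * (1.0 + abs_z[j][i])) for i in nearest]
                filled[j] = sum(wi * complete[i][j] for wi, i in zip(w, nearest)) / sum(w)
        result.append(filled)
    return result


def zscore_outliers(values, threshold=3.0):
    """Return indices whose |z| exceeds the chosen threshold."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sd > threshold]
```

Running the stages in the order the abstract lists them (dedupe, then impute, then flag outliers) means duplicates do not bias the neighbour search or the z-score statistics. Because the threshold is an explicit parameter, the same column can be screened at, say, 2.0 or 3.0 depending on how aggressive the cleaning should be.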