ISSN: 2182-2069 (printed) / ISSN: 2182-2077 (online)
Predictive Model for Healthcare Software Defect Severity using Vote Ensemble Learning and Natural Language Processing
Software defects are common and can cause a wide range of problems. Their consequences are especially severe in healthcare software, where a defect is more likely to cost lives directly than a failure in a conventional manual healthcare procedure. Defects in healthcare software may also prolong patient treatment, lengthen recovery periods, and waste money and healthcare resources. As software systems continue to grow in size and complexity, the likelihood of defects increases. Even with careful planning, thorough documentation, and rigorous process control during development, defects can arise. Moreover, because many software development tasks are carried out by different individuals, differences in their approaches and actions can introduce defects throughout the development process and ultimately disappoint users. Unfortunately, existing methods for software defect prediction often struggle with accuracy problems such as underfitting, overfitting, and imbalanced data. In addition, traditional software defect prediction methods often rely on software metrics, such as Lines of Code and Cyclomatic Complexity, which may fail to capture program syntax and semantics accurately. This study contributes to software defect severity prediction by introducing a hybrid approach that combines vote ensemble learning with natural language processing. Measured with machine learning performance metrics, including accuracy, precision, and other relevant measures, a previously proposed model for software defect severity recorded a weighted average accuracy and precision of 0.84 and 0.85, respectively. Our proposed model performed better, with accuracy and precision scores of 0.9890 and 0.9891, respectively.
These results highlight the effectiveness of the feature engineering techniques employed, such as synthetic minority oversampling, which help capture the severity patterns of software defects. In addition, the robust independent variables derived from the word embedding approach contributed to the improved accuracy and precision of the predictive model. These findings point to the potential of this approach for improving software quality assurance processes by accurately predicting the severity levels of defects.
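As an illustration of the voting step only, the following minimal Python sketch combines the severity labels predicted by several classifiers using hard (majority) voting. It assumes hard voting, three base classifiers, and the severity labels shown; none of these details is stated above, and the function name `majority_vote` is hypothetical. It is not the model evaluated in this study.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by hard (majority) voting.

    predictions[i][j] is the severity label that classifier i assigns to
    defect report j. On a tie, the label seen first among the classifiers
    wins (an arbitrary tie-breaking rule chosen for this sketch).
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_reports = len(predictions[0])
    if any(len(p) != n_reports for p in predictions):
        raise ValueError("every classifier must label the same reports")

    combined = []
    for votes in zip(*predictions):  # all votes cast for one defect report
        counts = Counter(votes)
        # max() returns the first label with the highest count; Counter
        # keeps labels in the order they were first encountered.
        combined.append(max(counts, key=counts.get))
    return combined


# Hypothetical outputs of three base classifiers for three defect reports.
clf_a = ["critical", "minor", "major"]
clf_b = ["critical", "major", "major"]
clf_c = ["minor", "minor", "major"]

print(majority_vote([clf_a, clf_b, clf_c]))
```

Hard voting needs only the predicted labels. A soft-voting variant would instead average each classifier's class probabilities before choosing a label.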