An Automated Defect Prediction Framework using Genetic Algorithms: A Validation of Empirical Studies
- Murillo-Morera, Juan
- Castro-Herrera, Carlos
- Arroyo, Javier
- Fuentes-Fernandez, Ruben
ISSN: 1137-3601, 1988-3064
Year of publication: 2015
Volume: 18
Issue: 55
Pages: 114-137
Type: Article
Other publications in: Inteligencia artificial: Revista Iberoamericana de Inteligencia Artificial
Abstract
Today, it is common for software projects to collect measurement data throughout their development processes. With these data, defect prediction software can estimate the defect proneness of a software module, with the objective of assisting and guiding software practitioners. With timely and accurate defect predictions, practitioners can focus their limited testing resources on higher-risk areas. This paper reports the results of three empirical studies that use an automated genetic defect prediction framework. The framework generates and compares different learning schemes (preprocessing + attribute selection + learning algorithm) and selects the best one using a genetic algorithm, with the objective of estimating the defect proneness of a software module. The first empirical study is a performance comparison of our framework with the most important framework in the literature. The second empirical study is a performance and runtime comparison between our framework and an exhaustive framework. The third empirical study is a sensitivity analysis, and it is our main contribution in this paper. The performance of the software defect prediction models, measured by AUC (Area Under the Curve), was validated on NASA-MDP and PROMISE data sets. Seventeen data sets from the NASA-MDP (13) and PROMISE (4) projects were analyzed using NxM-fold cross-validation. A genetic algorithm was used to select the components of the learning schemes automatically, and to assess and report the results. The results show similar performance between frameworks, and our framework had a better runtime than the exhaustive framework. Finally, we report the best configuration according to the sensitivity analysis.
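The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the component names in `POOLS`, the `toy_auc` fitness function, and the genetic operators (truncation selection, uniform crossover, per-gene mutation, elitism) are all assumptions made for illustration. In a real framework the fitness would be the AUC of the scheme estimated by NxM-fold cross-validation on a defect data set.

```python
import random

# Hypothetical component pools; the abstract does not list the actual options.
PREPROCESSORS = ["none", "log", "standardize"]
SELECTORS = ["none", "forward", "backward"]
LEARNERS = ["naive_bayes", "logistic", "tree", "random_forest"]
POOLS = [PREPROCESSORS, SELECTORS, LEARNERS]

# Made-up score contributions, used only so the sketch runs without data.
_BONUS = {
    "none": 0.00, "log": 0.05, "standardize": 0.03,
    "forward": 0.05, "backward": 0.02,
    "naive_bayes": 0.06, "logistic": 0.08, "tree": 0.04, "random_forest": 0.10,
}


def toy_auc(scheme):
    """Stand-in fitness: a real one would train the scheme and return its
    cross-validated AUC."""
    return 0.5 + sum(_BONUS[gene] for gene in scheme)


def random_scheme(rng):
    """Draw one learning scheme: (preprocessor, attribute selector, learner)."""
    return tuple(rng.choice(pool) for pool in POOLS)


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(scheme, rng, rate=0.2):
    """Resample each gene from its own pool with probability `rate`."""
    return tuple(
        rng.choice(pool) if rng.random() < rate else gene
        for gene, pool in zip(scheme, POOLS)
    )


def select_scheme(fitness, pop_size=10, generations=15, seed=0):
    """Evolve a population of learning schemes and return the fittest one."""
    rng = random.Random(seed)
    population = [random_scheme(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [ranked[0]] + children  # elitism keeps the current best
    return max(population, key=fitness)


if __name__ == "__main__":
    best = select_scheme(toy_auc)
    print(best, round(toy_auc(best), 3))
```

Because the best scheme of each generation is carried over unchanged, the best fitness never decreases between generations. An exhaustive framework would instead evaluate every combination in `POOLS`, which is the runtime trade-off the second empirical study examines.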