Assessing Ensemble Learning Techniques in Bug Prediction

Abstract

The application of ensemble learning techniques is steadily growing, since they have proven superior to traditional machine learning techniques in various domains. These algorithms can be employed for bug prediction as well. Existing studies have investigated the performance of ensemble learning techniques only on the PROMISE and NASA MDP public datasets; however, it is important to evaluate these techniques on additional public datasets in order to test their generalizability. We investigated the performance of the two most widely used ensemble learning techniques, AdaBoost and Bagging, on the Unified Bug Dataset, which encapsulates three class-level public bug datasets in a uniform format, with a common set of software product metrics used as predictors. Additionally, we investigated the effect of applying three different resampling techniques to the dataset. Finally, we studied the performance of Decision Tree and Naïve Bayes as the weak learners in the ensembles. We also fine-tuned the parameters of the weak learners to achieve the best possible results.
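To illustrate how boosting combines weak learners, here is a minimal from-scratch sketch of discrete AdaBoost with decision stumps (depth-1 decision trees) as the weak learner. It is not the authors' implementation. The stump representation, the function names (fit_stump, adaboost_fit, adaboost_predict), labels in {-1, +1} (e.g. -1 for clean and +1 for buggy classes), and applying the learning rate as a shrinkage factor on each stump's vote are all assumptions made for illustration. The defaults simply mirror the 300 estimators and 0.05 learning rate mentioned in the abstract.

```python
import math
from typing import List, Tuple

# Hypothetical decision stump (a depth-1 decision tree):
# (feature index, threshold, polarity). Labels are assumed to be -1 or +1.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: List[float]) -> int:
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity


def fit_stump(X: List[List[float]], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted error."""
    best: Stump = (0, X[0][0], 1)
    best_err = float("inf")
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                stump = (f, t, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_fit(X: List[List[float]], y: List[int],
                 n_estimators: int = 300, learning_rate: float = 0.05):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model: List[Tuple[float, Stump]] = []
    for _ in range(n_estimators):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        # Vote of this stump, shrunk by the learning rate
        alpha = learning_rate * 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model: List[Tuple[float, Stump]], x: List[float]) -> int:
    # Weighted vote of all stumps; the sign decides the class
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1
```

For example, `adaboost_fit([[1.0], [2.0], [3.0], [4.0]], [-1, -1, 1, 1], n_estimators=10)` learns a single split between 2 and 3. The exhaustive stump search is quadratic in the number of samples per round, so a library implementation (such as scikit-learn's `AdaBoostClassifier`) would be the practical choice on a real dataset.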

We found that AdaBoost with a Decision Tree weak learner outperformed the other configurations. With 300 estimators and a learning rate of 0.05, it achieved an F-measure of 54.61% (81.96% accuracy, 50.92% precision, 58.90% recall). Depending on one's needs, RUS resampling can be applied to raise recall to as much as 75.14%, at the cost of lower precision.
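RUS is commonly read as random undersampling. Assuming that meaning, the sketch below balances a training set by randomly discarding majority-class (for example, non-buggy) samples until every class has as many samples as the smallest one. It also includes a helper for the F-measure, the harmonic mean of precision and recall, F = 2PR / (P + R). Plugging in the reported precision (50.92%) and recall (58.90%) gives about 54.62, close to the reported 54.61. The function names `random_undersample` and `f_measure` are hypothetical, and this is not the authors' code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def random_undersample(X: Sequence, y: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Randomly drop samples so every class keeps only as many as the smallest class."""
    rng = random.Random(seed)
    by_class: Dict[object, List] = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = min(len(rows) for rows in by_class.values())
    X_out: List = []
    y_out: List = []
    for label, rows in by_class.items():
        # Minority class: all rows are kept; majority classes: a random subset
        for xi in rng.sample(rows, target):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

Because undersampling shows the classifier a balanced training set, it tends to predict the minority (buggy) class more often, which raises recall while lowering precision, the trade-off noted above. Resampling is normally applied only to the training data, never to the data used for evaluation.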

Publication
Proceedings of the 21st International Conference on Computational Science and Its Applications (ICCSA 2021), Cagliari, Italy, Pages 368–381

BibTeX:

@InProceedings{SVT21,
    author    = {Szamosvölgyi, Zsolt János and Váradi, Endre Tamás and Tóth, Zoltán and Jász, Judit and Ferenc, Rudolf},
    booktitle = {Proceedings of the 21st International Conference on Computational Science and Its Applications (ICCSA 2021)},
    title     = {Assessing Ensemble Learning Techniques in Bug Prediction},
    year      = {2021},
    address   = {Cagliari, Italy},
    month     = sep,
    pages     = {368--381},
    publisher = {Springer International Publishing},
    doi       = {10.1007/978-3-030-87007-2_26},
    keywords  = {AdaBoost, Bug prediction, Resampling, Unified bug dataset},
    url       = {https://link.springer.com/chapter/10.1007/978-3-030-87007-2_26},
}