On the Stability and Applicability of Deep Learning in Fault Localization

Viktor Csuvik, Roland Aszmann, Árpád Beszédes, Ferenc Horváth and Tibor Gyimóthy 
Numerous Deep Learning (DL)-based fault localization (FL) methods have been developed with the aim of leveraging the code coverage matrix and the failure vector to identify the connection between program elements and defects. The imbalanced data on which these approaches train their models poses a substantial challenge to the effectiveness of fault localization techniques. This study explores the stability of DL-based fault localization models, specifically their behavior when trained repeatedly on the same input but with varying random initializations. Using the Defects4J benchmark, we trained deep learning models (MLP, CNN, and RNN) independently and found that 86 cases resulted in (at least partly) consistent rankings across all five model versions, while 621 exhibited varying outcomes, meaning that 90% of the produced ranks differed between subsequent trainings. The models showed significant variability in ranking results, with the maximum rank sometimes five times the minimum. We also adapted the churn metric from DL research to evaluate the models, confirming their instability. To improve stability, we applied meta-parameter optimization, model simplification, and resampling. Although some of these techniques proved effective, the models remained insufficiently stable to produce reliable results even with the improvements.
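
As an illustration of the churn metric mentioned above, the following is a minimal Python sketch of how prediction churn might be adapted to FL rank lists. The definition used here (the fraction of program elements whose rank differs between two independently trained models), as well as all function and variable names, are illustrative assumptions rather than the paper's exact formulation.

    import numpy as np

    def prediction_churn(ranks_a, ranks_b):
        """Churn adapted to FL: the fraction of program elements whose
        suspiciousness rank differs between two training runs of the
        same model on the same data (illustrative definition)."""
        ranks_a, ranks_b = np.asarray(ranks_a), np.asarray(ranks_b)
        return float(np.mean(ranks_a != ranks_b))

    # Two runs ranking the same five program elements:
    run_1 = [1, 2, 3, 4, 5]   # ranks from the first trained model
    run_2 = [2, 1, 3, 5, 4]   # ranks after retraining with a new seed
    print(prediction_churn(run_1, run_2))  # 0.8 -> 80% of ranks changed

A churn of 0 would mean two retrainings produce identical rank lists; values close to 1 indicate the kind of instability the study reports.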

Keywords: Spectrum-Based Fault Localization, Deep Learning, Prediction Churn, SBFL, DL.