On the Stability and Applicability of Deep Learning in Fault Localization
Viktor Csuvik, Roland Aszmann, Árpád Beszédes, Ferenc Horváth and Tibor Gyimóthy
Numerous Deep Learning (DL)-based fault localization (FL) methods have been developed to leverage the code coverage matrix and the failure vector to identify connections between program elements and defects.
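For context on these spectrum inputs, the following minimal sketch (illustrative only; it uses the classic Ochiai SBFL formula rather than the DL models studied here, and all data is fabricated) shows how a coverage matrix and failure vector yield a suspiciousness ranking:

```python
import numpy as np

# Toy program spectrum: rows = tests, columns = statements (all values fabricated).
coverage = np.array([[1, 1, 0, 1],
                     [1, 0, 1, 1],
                     [0, 1, 1, 1],
                     [1, 1, 1, 0]])
failures = np.array([1, 0, 1, 0])   # 1 = failing test, 0 = passing test

def ochiai(coverage, failures):
    """Suspiciousness of each statement; DL-based FL learns such a mapping instead."""
    ef = coverage[failures == 1].sum(axis=0)   # failing tests that cover the statement
    ep = coverage[failures == 0].sum(axis=0)   # passing tests that cover the statement
    denom = np.sqrt(failures.sum() * (ef + ep))
    return np.divide(ef, denom, out=np.zeros_like(denom), where=denom > 0)

susp = ochiai(coverage, failures)
print(np.argsort(-susp))   # statements ordered from most to least suspicious
```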
The imbalanced data on which these approaches train their models poses a substantial challenge to the effectiveness of fault localization techniques. This study explores the stability of deep learning fault localization models, specifically their performance when trained repeatedly on the same input but with varying random initializations.
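A minimal sketch of such a repeated-training experiment follows, assuming scikit-learn's MLPClassifier as a stand-in for the studied models and a "virtual test" probe (one common way in DL-based FL to derive per-statement suspiciousness); the data, the probing scheme, and the statement of interest are all illustrative, not the paper's setup:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 30)).astype(float)  # toy coverage rows (tests x statements)
y = (X[:, 3] * X[:, 17] > 0).astype(int)              # fabricated failure labels

ranks_of_stmt3 = []
for seed in range(5):                                  # five independent trainings
    model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                          random_state=seed).fit(X, y)
    probes = np.eye(30)            # "virtual test" covering exactly one statement each
    susp = model.predict_proba(probes)[:, 1]           # failure probability as suspiciousness
    ranks = (-susp).argsort().argsort() + 1            # rank 1 = most suspicious
    ranks_of_stmt3.append(int(ranks[3]))

print(ranks_of_stmt3)  # the spread across seeds is the (in)stability of interest
```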
Using the Defects4J benchmark, we trained deep learning models (MLP, CNN, and RNN) five times each, independently, and found that 86 cases yielded (at least partly) consistent rankings across all five trained versions, while 621 cases exhibited varying outcomes; in other words, roughly 90% of the produced rankings differed between subsequent trainings.
The models showed significant variability in their ranking results, with the maximum rank of an element sometimes five times its minimum. We also adapted the churn metric from DL research to evaluate the models, which confirmed their instability.
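Prediction churn, as commonly defined in the DL literature, is the fraction of inputs on which two independently trained models disagree; the ranking adaptation below (top-k overlap between two runs) is a simplified illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def churn(preds_a, preds_b):
    """Fraction of examples on which two trained models disagree (standard churn)."""
    preds_a, preds_b = np.asarray(preds_a), np.asarray(preds_b)
    return float(np.mean(preds_a != preds_b))

def topk_rank_churn(susp_a, susp_b, k=10):
    """Share of the top-k suspicious statements that two training runs disagree on."""
    top_a = set(np.argsort(-np.asarray(susp_a))[:k].tolist())
    top_b = set(np.argsort(-np.asarray(susp_b))[:k].tolist())
    return 1.0 - len(top_a & top_b) / k

print(churn([1, 0, 1, 1], [1, 1, 1, 0]))                       # -> 0.5
rng = np.random.default_rng(42)
print(topk_rank_churn(rng.random(100), rng.random(100), k=10)) # disagreement of two runs
```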
To improve stability, we applied meta-parameter optimization, model simplification, and resampling. Although some of these techniques proved effective, even with these improvements the models remained insufficiently stable to produce reliable results.
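Of these techniques, resampling targets the class imbalance noted above; the sketch below shows generic random oversampling of the minority (failing-test) class, which may differ from the paper's exact resampling scheme:

```python
import numpy as np

def oversample_minority(X, y, seed=0):
    """Duplicate minority-class rows (with replacement) until classes are balanced."""
    rng = np.random.default_rng(seed)
    minority = 1 if (y == 1).sum() < (y == 0).sum() else 0
    X_min, X_maj = X[y == minority], X[y != minority]
    idx = rng.integers(0, len(X_min), size=len(X_maj))   # sample with replacement
    X_bal = np.vstack([X_maj, X_min[idx]])
    y_bal = np.concatenate([np.full(len(X_maj), 1 - minority),
                            np.full(len(X_maj), minority)])
    return X_bal, y_bal

X = np.random.default_rng(1).integers(0, 2, size=(100, 8)).astype(float)
y = np.concatenate([np.ones(5), np.zeros(95)]).astype(int)   # 5% failing tests
X_bal, y_bal = oversample_minority(X, y)
print(np.bincount(y_bal))   # both classes now equally represented
```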
Keywords: Spectrum-Based Fault Localization, Deep Learning, Prediction Churn, SBFL, DL.