A Vulnerability Introducing Commit Dataset for Java: An Improved SZZ based Approach

Abstract

In the domain of vulnerability detection from source code by static analysis, the number and quality of available datasets for creating and testing security analysis methods are quite low. To be precise, there are already several public datasets containing vulnerability fixing commits; however, vulnerability introducing commit datasets, which would be essential for creating and validating just-in-time vulnerability detection approaches, are scarce. In this paper, we propose a method based on SZZ (an algorithm originally developed to find bug introducing commits) with a specific filtering mechanism to create vulnerability introducing commit datasets from vulnerability fixes. The filtering phase involves computing a relevance score for each vulnerability introducing commit candidate based on commit similarities. We generated a novel Java vulnerability introducing dataset from the existing project-KB repository to demonstrate our algorithm’s capabilities. We also showcase the generated database and the effectiveness of our filtering method through several hand-picked examples from the dataset.

Publication
Proceedings of the 17th International Conference on Software and Data Technologies (ICSOFT), Pages 68–79

BibTeX:

@InProceedings{AHF22,
    author       = {Aladics, Tamás and Hegedűs, Péter and Ferenc, Rudolf},
    booktitle    = {Proceedings of the 17th International Conference on Software and Data Technologies (ICSOFT)},
    title        = {A Vulnerability Introducing Commit Dataset for Java: An Improved SZZ based Approach},
    year         = {2022},
    month        = jul,
    organization = {INSTICC},
    pages        = {68--79},
    publisher    = {SciTePress},
    doi          = {10.5220/0011275200003266},
    keywords     = {Just-in-Time Vulnerability Detection, Dataset, SZZ, Vulnerability Introducing Commits},
    url          = {https://www.scitepress.org/Link.aspx?doi=10.5220/0011275200003266},
}