ymirsky/VulChecker

Some questions about the Juliet dataset

Opened this issue · 0 comments

Hello, the paper does not seem to explain in detail how the Juliet dataset was constructed. Take CWE121, which consists of 4944 samples: is it composed of executable code fragments that each contain the CWE (where several fragments may originate from the same project)? Or, given how difficult it is to collect code snippets containing a CWE, was it constructed through some form of dataset generation?