Labelled data
From what I can see, the data provided here does not contain the manual labels indicating which functions are vulnerable and which are not. From your paper:
"Step 1: We first split the vulnerable files as a set of functions and removed header files as well as externally defined global variables. Step 2: If there were one or more additions or deletions in a function’s diff file, the label of this function was “1”, and otherwise “0”."
While I understand the code can be messy and hence difficult to share, the labels should not be difficult to share, right?
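To make the request concrete, the rule quoted from the paper could be sketched roughly as below. This is only my reading of Step 2, not the authors' actual pipeline: it assumes the per-function line spans and the changed line numbers from the diff have already been extracted, and `label_functions`, `function_spans`, `changed_lines`, and the example function names are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def label_functions(
    function_spans: Dict[str, Tuple[int, int]],
    changed_lines: Iterable[int],
) -> Dict[str, int]:
    """Label a function 1 if the diff touches any line inside it, else 0.

    function_spans maps a (hypothetical) function name to its inclusive
    (start_line, end_line) range in the vulnerable file.
    changed_lines holds the line numbers added or deleted in the diff.
    """
    changed = set(changed_lines)
    labels = {}
    for name, (start, end) in function_spans.items():
        # One or more additions or deletions inside the span gives label 1.
        touched = any(start <= line <= end for line in changed)
        labels[name] = 1 if touched else 0
    return labels


# Hypothetical example: the diff only touches lines 12 and 13.
spans = {"parse_header": (10, 42), "free_buffer": (50, 63)}
labels = label_functions(spans, changed_lines=[12, 13])
print(labels)  # {'parse_header': 1, 'free_buffer': 0}
```

Even just the resulting name-to-label mapping per file, without the code itself, would be enough to use the dataset at the function level.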
All of the data we released is vulnerable. Due to some minor issues, some samples are at the file level rather than the function level described in our paper. We are aware of this problem and will update the dataset in the near future.
Any updates on this? I think the manually labelled data is a great resource you've collected! But since it is all at the file-level (from what I've checked), it's not particularly usable at the moment.