Discrepancy between sizes of different versions of NSL-KDD

Question

ghisaac opened this issue 4 years ago · 4 comments

Hello!

I would like to inquire about the discrepancy in the number of records between the version of the dataset found in this repository and the one hosted by UNB. It seems that the UNB version has significantly more records, and I would like to figure out why that is.

Thanks in advance!

Answer 1 · 2020-06-22T14:35:11.000Z

Hi,

Please provide more information. There are different versions of the dataset and many subsets.
Some of the stats, for example, would help me assist you.

Answer 2 · 2020-06-22T14:42:20.000Z

Certainly. The dataset I am referring to is the one found at https://www.unb.ca/cic/datasets/nsl.html. The number of records in the full training set retrieved from that link is significantly higher than in the full training set (found under the "full -d" folder) in this repository. I suspect the reason is that you are using a 20% subset of the actual full NSL-KDD training set. If that is the case, could you provide any reasoning as to why you chose to do so?

Regards

Answer 3 · 2020-06-22T14:47:43.000Z

Ah yes, let me provide some context. Full -d doesn't represent the full dataset; it represents the full attack class for the 20% subset. The 20% subset was used for the research, and each attack class was split into its own individual subset.

The training for the research was done on the 20% subsets. The research papers, and the papers that follow them, have more information about this, as well as other references explaining why. I'd prefer not to discuss this in a git issue, as it's debatable and there has been a lot of research around the topic.
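
For anyone who wants to check the size difference locally, a minimal sketch along these lines should do. It assumes each file holds one record per line with no header row, and the file names `KDDTrain+.txt` and `KDDTrain+_20Percent.txt` are assumptions about what the UNB download contains; adjust the paths to match your copy. The helper names `count_records` and `subset_ratio` are made up for illustration and are not part of this repository.

```python
from pathlib import Path


def count_records(path):
    """Count non-empty lines; assumes one record per line and no header row."""
    with Path(path).open() as handle:
        return sum(1 for line in handle if line.strip())


def subset_ratio(subset_path, full_path):
    """Return the subset size as a fraction of the full set size."""
    full = count_records(full_path)
    if full == 0:
        raise ValueError(f"{full_path} contains no records")
    return count_records(subset_path) / full


if __name__ == "__main__":
    # Hypothetical local paths; point these at wherever the files were saved.
    full_path = Path("KDDTrain+.txt")
    subset_path = Path("KDDTrain+_20Percent.txt")
    if full_path.exists() and subset_path.exists():
        print(f"full: {count_records(full_path)} records")
        print(f"subset: {count_records(subset_path)} records")
        print(f"ratio: {subset_ratio(subset_path, full_path):.3f}")
    else:
        print("Dataset files not found; adjust the paths above.")
```

A ratio close to 0.2 would be consistent with the files in this repository being derived from the 20% subset rather than the full training set.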

Answer 4 · 2020-06-22T15:16:25.000Z

Thank you for your help! I have the clarity to move further with my work.

All the best