InitRoot/NSLKDD-Dataset

Discrepancy between sizes of different versions of NSL-KDD

ghisaac opened this issue · 4 comments

Hello!

I would like to inquire about the discrepancy in the number of records between the version of the dataset found here and the one found at . It seems that the UNB version has significantly more records, and I would like to figure out the reason why that is.

Thanks in advance!

Hi,

Please provide more information. There are different versions of the dataset, with many subsets.
Perhaps share some of the stats, so I can assist you.

Certainly. The dataset I am referring to is the one found at https://www.unb.ca/cic/datasets/nsl.html. The number of records in the full training set retrieved from that link seems significantly higher than in the full training set (found under the "full -d" folder) in this repository. I suspect this is because you are using a 20% subset of the actual full NSL-KDD training set. If that is the case, could you explain why you chose to do so?

Regards
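For anyone comparing the two downloads, one quick check is to count the records in each file and compute their ratio. This is a minimal sketch, assuming headerless comma-separated files with one record per line (an ARFF file would also count its header lines, so strip those first); `count_records`, `subset_fraction` and `compare_files` are hypothetical helper names, not part of this repository.

```python
from typing import Iterable, Tuple


def count_records(lines: Iterable[str]) -> int:
    """Count non-blank lines; assumes one comma-separated record per line, no header."""
    return sum(1 for line in lines if line.strip())


def subset_fraction(subset_count: int, full_count: int) -> float:
    """Return the subset's size as a fraction of the full set's size."""
    if full_count <= 0:
        raise ValueError("full_count must be positive")
    return subset_count / full_count


def compare_files(full_path: str, subset_path: str) -> Tuple[int, int, float]:
    """Hypothetical helper: count records in two files and report their ratio."""
    with open(full_path) as full, open(subset_path) as subset:
        n_full = count_records(full)
        n_subset = count_records(subset)
    return n_full, n_subset, subset_fraction(n_subset, n_full)
```

For example, `compare_files("KDDTrain+.txt", "KDDTrain+_20Percent.txt")` (substitute the file names from your own download) should give a fraction of roughly 0.2 if the smaller file really is a 20% subset of the larger one.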

Ah yes, let me provide some context. "full -d" doesn't represent the full dataset; it represents the full attack class for the 20% subset. The 20% subset was used for the research, and each attack class was split into its own subset.

The training for the research was done on the 20% subsets. The research papers, and the papers that followed, contain more information on this, as well as other references explaining why. I'd prefer not to discuss this in a git issue, as it's debatable and there has been a lot of research on the topic.
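A per-attack split like the one described above can be approximated by grouping records on their label column. This is a minimal sketch, not the method used in this repository: `read_records`, `split_by_label` and `LABEL_INDEX` are hypothetical names, and the column layout (41 features, then the label, then a difficulty score, as in the UNB `.txt` files) is an assumption to verify against your files. Grouping raw labels such as `neptune` or `smurf` into broader classes (DoS, Probe, R2L, U2R) would need an extra label-to-class mapping that is not shown here.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: zero-based column of the attack label. Each record is taken to have
# 41 feature columns, then the label, then a difficulty score.
LABEL_INDEX = 41


def read_records(path: str) -> List[List[str]]:
    """Read a headerless comma-separated NSL-KDD file into rows of strings."""
    with open(path, newline="") as f:
        return [row for row in csv.reader(f) if row]


def split_by_label(
    rows: Iterable[List[str]], label_index: int = LABEL_INDEX
) -> Dict[str, List[List[str]]]:
    """Group rows by the value in the label column, skipping empty rows."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for row in rows:
        if not row:
            continue
        groups[row[label_index]].append(row)
    return dict(groups)
```

Each key of the returned dictionary is a label, and its value holds every record with that label, so the groups can then be written out as separate files if needed.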

Thank you for your help! I now have the clarity I need to move forward with my work.

All the best