Discrepancy between sizes of different versions of NSL-KDD
ghisaac opened this issue · 4 comments
Hi,
Please provide more information. There are different versions of the dataset, with many subsets.
Some details, such as part of the stats, would help me assist you.
Certainly. The data set I am referring to is the one found at https://www.unb.ca/cic/datasets/nsl.html. The full training set retrieved from that link seems to contain significantly more records than the full training set (found under the "full -d" folder) in this repository. I suspect the reason is that you are using a 20% subset of the actual full NSL-KDD training set. If that is the case, could you explain why you chose to do so?
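For reference, here is a minimal sketch of how the two record counts could be compared. It assumes both training sets are plain-text files with one record per line and no header row. The function names and the file paths in the usage comment are hypothetical placeholders, not names taken from the repository or the download page.

```python
from pathlib import Path


def count_records(path):
    """Count non-empty lines, assuming one record per line and no header row."""
    with Path(path).open() as handle:
        return sum(1 for line in handle if line.strip())


def subset_fraction(full_path, subset_path):
    """Return the subset's record count as a fraction of the full set's count."""
    return count_records(subset_path) / count_records(full_path)


# Example usage (hypothetical paths; adjust to wherever the files were saved):
# fraction = subset_fraction("path/to/unb_training_set.txt",
#                            "path/to/repo_training_set.txt")
# print(f"repository set / UNB set = {fraction:.2%}")
```

A result close to 20% would be consistent with the suspicion above, though it would not by itself show which records were selected.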
Regards
Ah yes, let me provide some context. Full -d doesn't represent the full dataset; it represents the full attack class for the 20% subset. The 20% subset was used for the research, and each attack class was split into its own individual subset.
The training for the research was done on the 20% subsets. The research papers, and the papers that follow, have more information on this, as well as other references explaining why. I'd prefer not to discuss this in a git issue, as it's debatable and there has been a lot of research on the topic.
Thank you for your help! I have the clarity to move further with my work.
All the best