/machine-learning-netflow

IPython notebook that illustrates the effectiveness of machine learning algorithms for anomaly detection in netflow data (inbound/outbound DDoS, etc.)

Primary language: Jupyter Notebook. License: GNU General Public License v2.0 (GPL-2.0).

Use of machine learning for anomaly detection in netflow data

This notebook can be viewed on GitHub.

A readable version of this IPython notebook can also be found here.

Notes

I'm not a data scientist, and I'm sure this process contains errors and inaccuracies. One that I'm aware of is that I computed Euclidean distance on heterogeneous features. This is formally incorrect, even though the classification results are consistent.
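To make the Euclidean distance caveat concrete, here is a minimal sketch, not taken from the notebook. It assumes "heterogeneous" means numeric features on very different scales. The flow records and the feature names (bytes, packets, duration) are hypothetical. On the raw values the distance is almost entirely the byte difference. One common remedy is z-score standardization, which puts every feature on a comparable scale before the distance is computed.

```python
import math
from statistics import mean, pstdev

# Hypothetical flow records: (bytes, packets, duration_seconds).
# These values are invented for illustration only.
flows = [
    (1_200_000.0, 900.0, 12.0),
    (1_250_000.0, 950.0, 300.0),
    (400_000.0, 920.0, 11.0),
]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its population std."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, mus, sigmas))
        for row in rows
    ]


scaled = standardize(flows)

# Raw distances: the byte column swamps packets and duration.
raw_01 = euclidean(flows[0], flows[1])
raw_02 = euclidean(flows[0], flows[2])

# Standardized distances: every feature contributes on a comparable scale.
scaled_01 = euclidean(scaled[0], scaled[1])
scaled_02 = euclidean(scaled[0], scaled[2])

print(f"raw:    d(0,1)={raw_01:.1f}  d(0,2)={raw_02:.1f}")
print(f"scaled: d(0,1)={scaled_01:.2f}  d(0,2)={scaled_02:.2f}")
```

With these made-up numbers the nearest neighbour of the first flow changes once the features are standardized, because the large duration gap is no longer hidden behind the byte counts. Categorical fields such as protocol or port would need a different treatment (for example a mixed-type measure like Gower distance), which this sketch does not cover.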

If you find other errors, feel free to report them through issues or pull requests.

I no longer have access to any netflow data collector. I'd like to develop a service (and open source it ;-)) that applies ML algorithms to this data to spot anomalies automatically. If you are interested and have a collector with nfdump installed that I can access over SSH, please contact me!