1st Place solution for the Armis DataHack2019 Challenge.
Haitzikim
Solution presentation
- Data Cleaning
- Feature Engineering
  - Hand crafted features by our domain (& meme) expert
    - Is this a device you can bring to work?
    - Is this PC a router?
    - etc.
  - Multi Level Aggregated Features
    - Device Level
    - Network, Device Type, Day of week, Hour Level (creating an hour normalized session dataset)
  - Network Graph Features
    - Degree (in / out / total)
    - PageRank
- Preprocessing
- Ensembling
  - Multiple algorithms
    - Isolation forests
    - GMM
    - Elliptic envelope
  - Multiple datasets
    - Device level
    - Network & type & dow & hour normalized dataset
- Self Supervised Learning
  - Anomaly detection confidence as label
  - Random forest regression on label
  - Permutation importance feature selection (per network!)
  - SHAP values explainability
- Memes. Lots of memes!
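
The ensembling and self-supervised steps in the outline above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the team's actual code. The feature table, the rank averaging used to merge detector outputs, and all hyperparameters are assumptions. Only the choice of detectors (isolation forest, GMM, elliptic envelope), the random forest regression on the confidence label, and the permutation importance step come from the outline. The per-network split and the SHAP step are left out.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest, RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical stand-in for a device-level feature table:
# 500 "normal" devices plus 10 scattered, odd-looking ones at the end.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(500, 4)),
    rng.uniform(-8.0, 8.0, size=(10, 4)),
])

# Fit each unsupervised detector; flip signs so that higher = more anomalous.
iso = IsolationForest(random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
env = EllipticEnvelope(random_state=0).fit(X)
raw_scores = [
    -iso.score_samples(X),  # isolation forest: lower score = more abnormal
    -gmm.score_samples(X),  # GMM: negative log-likelihood
    -env.score_samples(X),  # elliptic envelope: Mahalanobis distance
]

# Assumed merging rule: average the normalized ranks of the three detectors.
confidence = np.mean([rankdata(s) / len(s) for s in raw_scores], axis=0)

# Self-supervised step: regress the ensemble confidence on the features.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, confidence)

# Permutation importance of each feature with respect to the surrogate model.
result = permutation_importance(forest, X, confidence, n_repeats=5, random_state=0)
importances = result.importances_mean

# Keep features whose shuffling hurts the surrogate (threshold is an assumption).
selected = np.flatnonzero(importances > 0.0)
```

Here `confidence` plays the role of the "anomaly detection confidence" label, and `selected` holds the indices of features that survive selection. In the outline this selection is done per network, which would mean repeating the last two steps on each network's subset of rows.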
Ever wondered what would happen if you just plug in that seemingly innocent USB you found lying around? You’re about to find out! In this devices-gone-rogue challenge - should you choose to accept it - you will gain access to traffic data of ~100K devices, and will be tasked with finding the devices that, well, misbehave. This challenge is fully unsupervised - so put your anomaly belt on and get to it!
Model results will be matched against a prelabeled dataset, and AUC (Area Under the Curve) will be calculated on the test set. Also Meme mastery :)
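
To make the metric concrete, here is a small sketch of how an AUC score can be computed from per-device anomaly scores. It assumes the metric is ROC AUC and uses the standard pairwise definition (the fraction of anomalous/normal pairs in which the anomalous device gets the higher score). The organizers' actual scoring code is not given, and the labels and scores below are made up.

```python
def roc_auc(labels, scores):
    """ROC AUC via pair counting: correctly ordered pairs count 1, ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # anomalous devices
    neg = [s for y, s in zip(labels, scores) if y == 0]  # normal devices
    if not pos or not neg:
        raise ValueError("need at least one device of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = misbehaving device) and model scores.
labels = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.9]
auc = roc_auc(labels, scores)
```

Because only the ordering of the scores matters, any monotone rescaling of a model's output leaves the AUC unchanged, which is why rank-based ensembling is a natural fit for this metric.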
The leaderboard can be accessed here: https://leaderboard.datahack.org.il/armis