The tutorial will use cloud resources that provide a common environment for all students.
Requirements:
- Laptop with WiFi
- We will be using the conference WiFi; please ensure that you can connect prior to the tutorial
- Web browser: the latest version of any browser will work; Firefox or Chrome is preferred.
- Who we are
- Connect to Qwiklabs
- Introduction notebook to validate
- Big Data Ecosystem
- Challenges in Big Data today
- Apache Arrow
- GPUs for compute
- The GPU Open Analytics Initiative
- The GPU Data Frame (GDF)
- Python library for GDF (PyGDF)
- Lab 1: Data Loading and Manipulation
- Traditional interface through Pandas
- Pandas to/from PyGDF
- Column Function and Basic Transforms
- Filtering
- Student Assignment
- Lab 3: Classification using XGBoost
- Familiarization with IoT cyber network data
- Data ingest and feature extraction
- Time binning and preparation for classification
- Building XGBoost model
- Evaluating the model via ROC curves and AUC
- Student Assignment:
- Investigation into other time binnings, aggregations, and XGBoost parameters
- Using additional features (quantitative and categorical) in the data to build better models
- Moving beyond connection logs to other log types (e.g., DNS) and building models
- Roadmap
- Scaling out to multi-GPU and multi-node
- Partner Activities
- Conclusion
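
As a rough illustration of the "Evaluating the model via ROC curves and AUC" step listed under Lab 3, the following is a minimal pure-Python sketch and not the lab's notebook code. The function names `roc_curve_points` and `auc` and the toy labels and scores are hypothetical; in the lab, the scores would presumably be the predicted probabilities from the XGBoost model.

```python
def roc_curve_points(labels, scores):
    """Sweep a decision threshold from the highest score to the lowest
    and return the false-positive and true-positive rates at each step."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Rank samples by descending score so each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume every sample tied at this score before emitting a point,
        # so ties produce a single diagonal segment.
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(fpr)):
        area += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
    return area


# Toy data (hypothetical): 1 = malicious connection, 0 = benign.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_fpr, toy_tpr = roc_curve_points(toy_labels, toy_scores)
toy_auc = auc(toy_fpr, toy_tpr)
```

An AUC near 1.0 means the model ranks positive samples above negative ones almost always, while 0.5 corresponds to random ranking. Comparing this number across different time binnings or XGBoost parameters is one way to approach the student assignment.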