This notebook shows the acceleration one can gain by using GPUs with XGBoost in RAPIDS.
cuML Notebooks
The cuML notebooks showcase how to use the machine learning algorithms implemented in cuML, along with the advantages of using cuML over scikit-learn. These notebooks compare the time required and the performance of the algorithms. The notebooks are listed below:
This notebook includes code examples of lasso and elastic net models. The two models are presented together so that they can be compared with each other as well as with their scikit-learn equivalents.
This notebook showcases the principal component analysis (PCA) algorithm, where the model can be used to reduce the dimensionality of a dataset (using fit_transform) and to map the transformed data back to the original feature space (using inverse_transform).
This notebook showcases the truncated singular value decomposition (TSVD) algorithm, which, like PCA, reduces the dimensionality of a dataset and maps the transformed data back to the original feature space, using the fit_transform and inverse_transform functions respectively.
The uniform manifold approximation and projection (UMAP) algorithm is compared with the original author's equivalent non-GPU Python implementation using the fit and transform functions.
Demonstration of the supervised approach of cuML's UMAP algorithm on the mortgage dataset, with a comparison of results against the original author's equivalent non-GPU Python implementation.
Demonstration of UMAP supervised training. Uses a set of labels to perform supervised dimensionality reduction. UMAP can also be trained on datasets with incomplete labels by using a label of "-1" for unlabeled samples.
This notebook showcases two special methods where cuDF goes beyond the Pandas library: the apply_rows and apply_chunk functions. They use the Numba library to accelerate data transformations by running them in parallel on the GPU.
This notebook showcases how to use Numba CUDA to accelerate cuDF data transformations, and how to speed them up step by step using CUDA programming techniques.
Demonstration of using the renumbering feature to assign new vertex IDs to the test graph. This is useful when the vertex IDs in a dataset are non-contiguous or are not integer values.
Demonstration of using cuGraph to identify clusters in a test graph with Spectral Clustering, using both (A) the Balanced Cut and (B) the Modularity Maximization quality metrics.
Demonstration of how to use DBSCAN - a popular clustering algorithm - and how to use the GPU accelerated implementation of this algorithm in RAPIDS.
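The renumbering idea mentioned in the list above can be sketched in plain Python. This is only an illustration of the concept, not the cuGraph API: the function name `renumber_vertices` and the sample edge list are hypothetical, and the sketch runs on the CPU with no RAPIDS dependency. It maps arbitrary (here, string) vertex IDs to contiguous integers starting at 0, in order of first appearance.

```python
def renumber_vertices(edges):
    """Map arbitrary hashable vertex IDs to contiguous integers 0..n-1.

    Hypothetical helper for illustration only; it is not part of cuGraph.
    Returns the renumbered edge list and the old-ID -> new-ID mapping.
    """
    id_map = {}
    renumbered = []
    for src, dst in edges:
        # setdefault assigns the next free integer the first time an ID is seen
        new_src = id_map.setdefault(src, len(id_map))
        new_dst = id_map.setdefault(dst, len(id_map))
        renumbered.append((new_src, new_dst))
    return renumbered, id_map


# Made-up edge list whose vertex IDs are strings rather than integers
edges = [
    ("10.0.0.7", "10.0.0.42"),
    ("10.0.0.42", "192.168.1.5"),
    ("192.168.1.5", "10.0.0.7"),
]

renumbered, id_map = renumber_vertices(edges)

# Invert the mapping to translate results back to the original IDs
original_ids = {new: old for old, new in id_map.items()}
```

Keeping the inverse mapping around matters in practice, because any result computed on the renumbered graph has to be translated back to the original vertex IDs before it is reported.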
Utils Scripts
| Folder | Script Title | Description |
| --- | --- | --- |
| Utils | start-jupyter.sh | starts a JupyterLab environment for interacting with, and running, notebooks |
| Utils | stop-jupyter.sh | identifies all process IDs associated with Jupyter and kills them |
| Utils | dask-cluster.py | launches a configured Dask cluster (a set of nodes) for use within a notebook |
| Utils | dask-setup.sh | a low-level script for constructing a set of Dask workers on a single node |
| Utils | split-data-mortgage.sh | splits mortgage data files into smaller parts, and saves them for use with the mortgage notebook |
Documentation (WIP)
| Folder | Document Title | Description |
| --- | --- | --- |
| Docs | ngc-readme | |
| Docs | dockerhub-readme | |
Additional Information
The cuml folder also includes a small subset of the Mortgage Dataset used in the notebooks and the full image set from the Fashion MNIST dataset.
utils: contains a set of useful scripts for interacting with RAPIDS
For additional community-driven notebooks, which will include our blogs, tutorials, workflows, and more intricate examples, please see the Notebooks Extended Repo.