/pydaal-getting-started

Introduction and tutorials for using PyDAAL, the Python API of the Intel Data Analytics Acceleration Library (Intel DAAL)


This repository consists of various materials introducing PyDAAL (the Python API of the Intel Data Analytics Acceleration Library) and is intended to help Python and machine learning practitioners get started with PyDAAL concepts.

Additionally, helper functions and classes have been provided to aid frequently performed PyDAAL operations.

Volumes 1, 2, and 3 of the PyDAAL Gentle Introduction Series are available as Jupyter Notebooks. These volumes are designed to give a quick introduction to the essential features of PyDAAL. The notebooks offer a collection of code examples that can be executed in the interactive command shell, along with helper functions that automate common PyDAAL functionality.

How to use?

Install the Intel Distribution for Python (IDP) through conda. IDP includes a large set of commonly used mathematical and statistical Python packages optimized for Intel architectures.

  1. Install the latest version of Anaconda. Choose the Python 3.5 version.
  2. From the shell prompt (on Windows, use Anaconda Prompt), execute these commands:
    conda create --name idp intelpython3_full python=3 -c intel
    source activate idp (on Linux and OS X)
    activate idp (on Windows)

Once created and activated, the IDP environment contains all the packages needed to run these notebooks.

More detailed instructions can be found in this online article.
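To confirm that the environment is ready, a quick import check can be run from the activated environment; this is a minimal sketch that only verifies the daal package is importable:

    # Minimal check that PyDAAL is importable in the activated "idp" environment
    import daal.data_management as dm
    print("PyDAAL data management module loaded:", dm.__name__)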

The various stages of the machine learning model building process are bundled together into helper function classes. These classes are constructed using PyDAAL's data management and algorithm libraries to support complete model deployment.

Stages supported by each helper function class

  1. Training
  2. Prediction
  3. Model Evaluation and Quality Metrics
  4. Trained Model Storage and Portability

More details on all these stages are available in Volume 3.
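As a rough sketch of what these stages involve under the hood, the snippet below calls PyDAAL's linear regression batch API directly on toy NumPy data; the helper classes in these notebooks wrap steps like these (plus quality metrics and model serialization), and their exact class and method names are defined in Volume 3.

    # Sketch of the training and prediction stages using PyDAAL's linear
    # regression batch API directly on toy data; the Volume 3 helper classes
    # wrap steps like these.
    import numpy as np
    from daal.data_management import HomogenNumericTable
    import daal.algorithms.linear_regression.training as training
    import daal.algorithms.linear_regression.prediction as prediction

    # Toy data wrapped in DAAL numeric tables
    trainData = HomogenNumericTable(np.random.rand(100, 5))
    trainDependentVariables = HomogenNumericTable(np.random.rand(100, 1))
    testData = HomogenNumericTable(np.random.rand(20, 5))

    # 1. Training
    trainAlg = training.Batch()
    trainAlg.input.set(training.data, trainData)
    trainAlg.input.set(training.dependentVariables, trainDependentVariables)
    trainingResult = trainAlg.compute()

    # 2. Prediction with the trained model
    predAlg = prediction.Batch()
    predAlg.input.setTable(prediction.data, testData)
    predAlg.input.setModel(prediction.model, trainingResult.get(training.model))
    predictions = predAlg.compute().get(prediction.prediction)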

Currently, helper function classes are provided for

  1. Linear Regression
  2. Ridge Regression
  3. SVM - Binary and Multi-Class classifiers
  4. Decision Forest (classification and regression)
  5. K-means
  6. PCA

For practice, usage examples with sample datasets that exercise these helper function classes are also provided.

PyDAAL APIs have been used to tailor Python modules that support common operations on DAAL's Data Management library.

Import the customUtils module to explore the basic utilities provided for data retrieval and manipulation operations on DAAL's Data Management library:

  1. getArrayFromNT(): Extracts a NumPy array from a numeric table
  2. getBlockOfNumericTable(): Slices a block of a numeric table over a specified range of rows and columns
  3. getBlockOfCols(): Extracts a block of a numeric table over a specified range of columns
  4. getNumericTableFromCSV(): Reads a CSV file into a numeric table
  5. serialize(): Serializes any input data and saves it into a local variable or to disk
  6. deserialize(): Deserializes serialized data from a local variable or disk
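For example, assuming customUtils is importable from the repository root, a typical round trip might look like the sketch below; the argument conventions shown are assumptions, so consult the module's docstrings for the exact signatures.

    # Illustrative round trip with the customUtils helpers; argument conventions
    # are assumptions -- see the module's docstrings for exact signatures.
    import numpy as np
    from daal.data_management import HomogenNumericTable
    from customUtils import getArrayFromNT, serialize, deserialize

    # Wrap a NumPy array in a DAAL numeric table
    nT = HomogenNumericTable(np.random.rand(10, 4))

    # Extract the contents back into a NumPy array
    recovered = getArrayFromNT(nT)

    # Serialize the table to an in-memory buffer (per the description above)
    # and restore it
    buf = serialize(nT)
    restoredNT = deserialize(buf)
    print(getArrayFromNT(restoredNT))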

These tutorials are spread across a collection of Jupyter Notebooks, each comprising a theoretical explanation of an algorithm and an interactive command shell to execute it using the PyDAAL API.

Tutorial Notebooks

Data files used in the tutorials are located in the mldata folder. They were downloaded from the UCI Machine Learning Repository.