Data Exploration for PI System

Objectives

Projects built around PI System must be treated as data-driven projects.

The functional team is not fully qualified to test data delivery on its own. It needs help to challenge the data quality, it needs some Continuous Control Monitoring, and it needs to accept and integrate failure.

Process Representation

Prerequisites

Install Anaconda. TODO: change this to pure Python.
Create the environment: `conda create --name dataexploration matplotlib pandas`. TODO: change this to pure Python.
An accessible, up-and-running PI System with PI WEB-API.
PI WEB-API must be configured for basic authentication (username/password).
conf/credentials.yml is not committed, for security reasons. Copy the provided conf/credentials.yaml.template to conf/credentials.yml and fill in the username and password used for basic authentication.
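
As a minimal sketch of what basic authentication means for the requests sent to PI WEB-API, the header can be built from the username and password with the standard library alone. The function name `basic_auth_header` and the sample credentials are hypothetical. The layout of conf/credentials.yml is not described here, so reading that file is left out.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the HTTP Basic Authorization header (RFC 7617)."""
    # Basic auth sends base64("username:password") in the Authorization header.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical credentials, for illustration only.
headers = basic_auth_header("piuser", "secret")
```

The resulting dictionary can be passed as request headers by whichever HTTP client is used. Base64 is an encoding, not encryption, so basic authentication should only be used over HTTPS.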

Dependencies

PI-Web-API-Client-Python - PI client for Python
csv - Read CSV files
pandas - Work with data structures, including missing data
numpy - Perform calculations on the data

Data Exploration description

This model consists of several steps/scripts.

Get the Data from PI System `pigetdata.py`

This script gets the data from Asset Framework and PI Data Archive. To keep the tests independent of any particular environment, and to make the testing rules easy to prepare, some AFElements and PIPoints are inserted beforehand with their appropriate data, so that the models can work with them.

Data Preparation `clean.py`

This script cleans and preprocesses your sensor values. Run it on your file before any other task. Before running it, add the header `date time value`, separated by spaces, as the top row of your file; otherwise the script will not work.
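
To illustrate the expected input, here is a minimal sketch of reading a space-separated file whose top row is `date time value`. It is not the code of `clean.py`: the function name `read_date_time_value`, the timestamp format `%Y-%m-%d %H:%M:%S`, and the rule of dropping rows that cannot be parsed are all assumptions, and it uses only the standard library.

```python
import csv
import io
from datetime import datetime


def read_date_time_value(handle):
    """Read space-separated rows under a 'date time value' header.

    Returns a list of (timestamp, value) tuples. Rows that cannot be
    parsed are dropped (an assumed cleaning rule).
    """
    reader = csv.DictReader(handle, delimiter=" ")
    rows = []
    for row in reader:
        try:
            # Assumed timestamp layout; adjust to match the real export.
            stamp = datetime.strptime(
                f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M:%S"
            )
            value = float(row["value"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed or incomplete row
        rows.append((stamp, value))
    return rows


# Made-up sample data standing in for a sensor export.
sample = io.StringIO(
    "date time value\n"
    "2020-01-01 00:00:00 1.5\n"
    "2020-01-01 00:01:00 bad\n"
    "2020-01-01 00:02:00 2.0\n"
)
cleaned = read_date_time_value(sample)
```

Without the header row, `csv.DictReader` would treat the first data row as column names and every lookup of `date`, `time`, or `value` would fail, which is why the header is required.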

Decision Tree Model `classify.py`

This script uses an unsupervised decision tree model to classify your time-series values.

It generates a file named leak.csv containing the values identified as leaks.

Generate some missing data `autofill.py`

Pass a CSV file in "date time value" format. The script identifies the sampling frequency and writes the missing rows to a CSV file, autofill-output.csv, so that you can fill in the values and merge them back.
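
The frequency detection and gap generation can be sketched as follows. This is an illustration under stated assumptions, not the logic of `autofill.py`: it assumes the timestamps are already sorted and takes the most common gap between neighbouring timestamps as the sampling frequency. The function name `missing_timestamps` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def missing_timestamps(stamps):
    """Return the timestamps absent from a sorted, mostly regular series."""
    if len(stamps) < 2:
        return []
    # Assumption: the sampling step is the most frequent gap between neighbours.
    gaps = Counter(b - a for a, b in zip(stamps, stamps[1:]))
    step = gaps.most_common(1)[0][0]
    if step <= timedelta(0):
        return []  # duplicated or unsorted input, nothing sensible to generate
    present = set(stamps)
    missing = []
    current = stamps[0] + step
    # Walk the regular grid between the first and last observation.
    while current < stamps[-1]:
        if current not in present:
            missing.append(current)
        current += step
    return missing


# Made-up series sampled every minute, with minutes 3 and 4 missing.
start = datetime(2020, 1, 1)
observed = [start + timedelta(minutes=m) for m in (0, 1, 2, 5, 6)]
gaps_to_fill = missing_timestamps(observed)
```

Each returned timestamp corresponds to one row that would need a value before merging.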

Merge two files together `merger.py`

This script takes two CSV files in "date time value" format and inserts the filled values into the original file at the required positions.
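
A minimal sketch of such a merge, working on rows already parsed into (timestamp, value) tuples. The function name `merge_rows` is hypothetical, and the rule that an original value wins when both inputs contain the same timestamp is an assumption; `merger.py` may behave differently.

```python
from datetime import datetime


def merge_rows(original, filled):
    """Merge two lists of (timestamp, value) rows, sorted by timestamp.

    Assumption: if a timestamp appears in both inputs, the original value wins.
    """
    merged = dict(filled)
    merged.update(dict(original))  # original rows override filled ones
    # Sorting by timestamp puts every filled row at its required position.
    return sorted(merged.items())


# Made-up rows: the filled file supplies the missing 00:01 sample.
original = [
    (datetime(2020, 1, 1, 0, 0), 1.0),
    (datetime(2020, 1, 1, 0, 2), 3.0),
]
filled = [(datetime(2020, 1, 1, 0, 1), 2.0)]
combined = merge_rows(original, filled)
```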

Statistics on data received `stats.py`

This script generates some periodic statistics on the data received. The results are put into csv files. Statistics implemented:

Percentage of data received per hour. A low threshold and a high threshold are then calculated from the data received.
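
The statistic can be sketched as below. The percentage is the number of samples received in an hour divided by the number expected. How `stats.py` derives its thresholds is not stated here, so the rule used in this sketch (mean minus or plus `k` population standard deviations) is purely an assumption, as are the function names and the `expected_per_hour` parameter.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean, pstdev


def hourly_percentages(stamps, expected_per_hour):
    """Map each hour to the percentage of expected samples actually received."""
    counts = Counter(s.replace(minute=0, second=0, microsecond=0) for s in stamps)
    return {hour: 100.0 * n / expected_per_hour for hour, n in sorted(counts.items())}


def thresholds(percentages, k=2.0):
    """Assumed rule: low/high thresholds are mean -/+ k standard deviations."""
    values = list(percentages.values())
    centre, spread = mean(values), pstdev(values)
    return max(0.0, centre - k * spread), centre + k * spread


# Made-up data: one sample every 15 minutes is expected (4 per hour).
start = datetime(2020, 1, 1)
received = [start + timedelta(minutes=m) for m in (0, 15, 30, 45, 60, 75)]
per_hour = hourly_percentages(received, expected_per_hour=4)
low, high = thresholds(per_hour, k=1.0)
```

Hours in which nothing at all was received do not appear in the result of this sketch; a real report would need to add them with a percentage of 0.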
