5Genesis Analytics Module - Containerized

The 5Genesis Analytics module provides methods for analysing and offline learning on the data that is provided by the platform monitoring.

Requirements

Docker >= 18.06.0

Installation

Install and start Docker.
Activate Docker Swarm (or replace this step with alternative deployment options, such as Kubernetes):
1. Type
```
docker system info
```
1. Look for "Swarm: active" or "Swarm: inactive"
2. If "inactive", type the following to set up Docker Swarm:
```
docker swarm init
```

Create a Docker Swarm secret by entering the connection details and credentials for each database connection:

docker secret create analytics_connections - << END
platform_name:      # replace platform_name, e.g. uma
    host: ip        # replace ip, e.g. 192.168.0.1
    port: p         # replace p with the port, e.g. 8080
    user: u         # replace u with the user name, e.g. user1
    password: pw    # replace pw with the password
    databases:
    - db_name1      # replace db_name1 with your database name(s)
    - db_name2      # replace or delete if there is only one database
END

Create a Docker Swarm secret for user authentication. Users can only access experiment IDs that are encrypted with this secret. A master password access is also granted with this secret. Replace "secret123" with the actual secret string.
```
echo secret123 | docker secret create analytics_secret -
```
Clone this repository.
The Analytics Module can be configured to cache the requested data for faster future use. This is disabled by default for compatibility, but can be enabled by setting ENABLE_CACHE to true in the stack file analytics-stack.yaml:
```
services:
    data_handler:
        ...
        environment:
            ENABLE_CACHE: "true"
    ...
```
Build and deploy containers with
```
./Analytics/install.sh
```
Note that it will take some time to query and cache the experiment IDs at the first startup of the containers. Depending on the size of the data in the database, this could take from a few seconds to a few minutes before the containers are ready to use.

Overview

Usage

When the containers are up and running, a graphical user interface will be available at http://localhost:5005/dash. The individual containers are also accessible through REST API as described below.

Data Handler

The Data Handler container is the access point for the analytics algorithm containers to retrieve the data from different sources. Note: Access to this container will likely not be available to the end user through the GUI.

Access the Data Handler at http://localhost:5000.

An API description is available at http://localhost:5000/api and includes the following commands:

List all available datasources: http://localhost:5000/get_datasources
List all available experiments: http://localhost:5000/get_all_experimentIds/datasource
List available experiments for given measurement: http://localhost:5000/get_experimentIds_for_measurement/datasource/measurementId
List available measurements for a given experiment: http://localhost:5000/get_measurements_for_experimentId/datasource/experimentId
Retrieve data from a given experiment: http://localhost:5000/get_data/datasource/experimentId
- Parameters:
  - measurement: e.g. Throughput_Measures (default all available measurements)
  - remove_outliers: zscore or mad (default none)
  - match_series: to synchronize data from multiple measurements (default false)
  - max_lag: time lag for synchronisation (default 1s)
  - limit: any integer to indicate a limit on the returned rows (default none)
  - offset: any integer to indicate the offset for the row limit (default none)
  - additional_clause: any InfluxDB clause (default none)
  - chunked: whether the results are retrieved from the server in chunks (default false)
  - chunk_size: any integer to define the chunk size (default 10000 if chunked is enabled)
Retrieve data from two experiments (e.g. for correlation): http://localhost:5000/get_data/datasource/experimentId1/experimentId2
- Parameters same as above
Clear the data handler's cache: http://localhost:5000/purge_cache

Example output for /get_data/:

{
    "ADB_Resource_Agent": {
        "Available RAM (MB)": {
            "0": 845,
            "1": 803,
            "2": 808,
            ...
        },
        ...
    },
    "Throughput_Measures": {
        "DL - AverageThput (Kbps)": {
            "0": 3617.0005,
            "1": 4436.538269,
            "2": 3900.982222,
            ...
        },
        ...
    }
}

Correlation

The Correlation container enables the user to perform:

a cross-correlation on the fields within the same experiment (resulting in a correlation matrix), or
a correlation of the fields across different experiments (resulting in a list of correlation values).

Access the Correlation module at http://localhost:5001.

An API description is available at http://localhost:5001/api and includes the following commands:

Cross-correlation of fields within the same experiment: http://localhost:5001/correlate/fields/datasource/experimentId
- Parameters:
  - measurement: e.g. Throughput_Measures (default all available measurements)
  - method: any Pandas correlation method (default pearson)
  - remove_outliers: zscore or mad (default none)
  - field: filter for fields, e.g. Throughput
Correlation of fields between different experiments: http://localhost:5001/correlate/experiments/datasource/experimentId1/experimentId2
- Parameters same as above

Example output for /correlate/fields/:

{
    "correlation_matrix": {
        "DL - AverageThput (Kbps)": {
            "SNR": 0.21373483,
            "RSSI": 0.1842762049,
            ...
        },
        ...
    }
}

Example output for /correlate/experiments/:

{
    "correlation_list": {
        "Available RAM (MB)": 0.190006483,
        "DL - AverageThput (Kbps)": -0.0301038513, 
        ...
    }
}

Prediction

The Prediction container enables the user to perform KPI prediction on a specified target KPI.

Access the Prediction module at http://localhost:5002.

An API description is available at http://localhost:5002/api and includes the following commands:

KPI Prediction using different algorithms: http://localhost:5002/train/datasource/algorithm/target
- Algorithm:
  - Linear Regression: linreg
  - Random Forest: rf
  - Support Vector Regression (experimental): svr, linear_svr, nu_svr
- Target: Target variable for prediction, e.g. "DL - AverageThput (Kbps)"
- Parameters:
  - drop_feature: any feature to be ignored for training (default None)
  - experimentid: e.g. 499 (at least one experiment ID is mandatory)
  - measurement: e.g. Throughput_Measures (default all available measurements)
  - normalize: not relevant for some algorithms, e.g. Random Forest or SVR (default False)
  - remove_outliers: zscore or mad (default none)
Download the trained model: http://localhost:5002/model

Example output for /train/:

{
    "coefficients": {
        "DL - StatDTX Ratio": 0.6477855028,
        "DL - PDCCH Ratio": 0.1479913101,
        ...
    },
    "real_predicted_values": {
        "y_pred": {
            "0": 3814.6712453507,
            "1": 4633.0326365561,
            ...
        },
        "y_test": {
            "0": 3741.438577,
            "1": 4772.295,
            ...
        }
    },
    "results": {
        "R2 score": 0.7576147901,
        "Explained variance score": 0.7694582032,
        ...
    }
}

Statistical Analysis

The Statistical Analysis container enables the user to perform KPI statistical validation.

Access the Statistical Analysis module at http://localhost:5003.

An API description is available at http://localhost:5003/api and includes the following commands:

KPIs statistical validation across experiments and measurement tables stored in InfluxDB: http://localhost:5003/statistical_analysis/datasource
- Parameters:
  - experimentid: e.g. 499 (at least one experiment ID is mandatory)
  - measurement: e.g. Throughput_Measures (at least one measurement is mandatory)
  - it: iteration identifier (default= 'iteration')
  - kpi: e.g. 'Delay (ms)' (at least one kpi is mandatory)
  - unit: e.g. ms (default= None)
  - exptype' : [0,1] 0: if the experiments contain several kpi samples per iteration , 1: if the experiments contain one kpi sample per iteration (e.g. Service Creation Time) (default= 0)

Example output for /statistical_analysis/:

{
    "experimentid": {
        "520": {
            "Delay (ms)": {
                "Iteration Statistics": {
                    "0": {
                        "25% Percentile":54.800000000000004,	
                        "5% Percentile":24.259999999999998,
                        ...
                    },
                    "1": {
                        "25% Percentile":54.550000000000004,
                        "5% Percentile":25.61,
                        ...
                    },
                    ...
                },
                "Test Case Statistics": {
                    "25% Percentile": {
                        "Confidence Interval":1.0124404204175417,	
                        "Value":57.058
                    },
                    "5% Percentile": {
                    "Confidence Interval":1.1079808517791605,
                    "Value":24.4246
                    },
                    ...
                }
            }
        }
    }
}

If units inserted, they will be associated to the kpis in the same order of appearance. In the hereunder example only a unit is inserted, thus it will be associated to the first kpi, whilst the latter will be printed with no unit. Note that in this case the kpi "Delay (ms)" already comes with a unit within its name, that's why it's printed.

Example output:

{
    "experimentid": {
        "520": {
            "Delay (ms)": {
                "Iteration Statistics": {
                    "0": {
                        "25% Percentile":54.800000000000004,    
                        "5% Percentile":24.259999999999998,
                        ...
                    },
                    "1": {
                        "25% Percentile":54.550000000000004,
                        "5% Percentile":25.61,
                        ...
                    },
                    ...
                },
                "Test Case Statistics": {
                    "25% Percentile": {
                        "Confidence Interval":1.0124404204175417,   
                        "Value":57.058
                    },
                    "5% Percentile": {
                    "Confidence Interval":1.1079808517791605,
                    "Value":24.4246
                    },
                    ...
                }
            },
            "SNR - (dB)": {
                "Iteration Statistics": {
                    "0": {
                        "25% Percentile":30,    
                        "5% Percentile":30,
                        ...
                    },
                    "1": {
                        "25% Percentile":30,
                        "5% Percentile":30,
                        ...
                    },
                    ...
                },
                "Test Case Statistics": {
                    "25% Percentile": {
                        "Confidence Interval":0,   
                        "Value":30
                    },
                    "5% Percentile": {
                    "Confidence Interval":0,
                    "Value":30
                    },
                    ...
                }
            }

        }
    }
}

Feature Selection

The Feature Selection container enables the user to select the most relevant features according to a specified target KPI.

Access the Feature Selection module at http://localhost:5004.

An API description is available at http://localhost:5004/api and includes the following commands:

Feature Selection using different algorithms: http://localhost:5004/selection/datasource/algorithm/target
- Algorithm:
  - Backward Elimination: backward, backward_elimination
  - Recursive Feature Elimination: RFE, rfe
  - Lasso Regression: LASSO, Lasso, lasso
- Target: Target variable, e.g. "DL - AverageThput (Kbps)"
- Parameters:
  - drop_feature: any feature to be ignored for training (default None)
  - experimentid: e.g. 499 (at least one experiment ID is mandatory)
  - measurement: e.g. Throughput_Measures (default all available measurements)
  - normalize: relevant for RFE (default False)
  - alpha: relevant for Lasso (default 0.1)
  - remove_outliers: zscore or mad (default none)

Example output for /selection/:

{
    "Features - Original": {
        "0": "DL - ACK Count",
        "1": "DL - ACK Ratio",
        ...
    },
    "Features - Selected": {
        "0": "DL - ACK Ratio",
        "1": "DL - Byte Count",
        ...
    },
    "Score": {
        "DL - ACK Ratio": 1428937721.858489,
        "DL - Byte Count": 5003183391.821027,
        ...
    }
}

Visualization

The Visualization container enables the user to visualize the results of the other containers.

Access the Feature Selection module at http://localhost:5005.

The GUI can be accessed at http://localhost:5005/dash/. Note that a user authentication token is required to access experiment results. The token contains the valid experiment IDs encrypted by the 5Genesis Portal. A secret is shared between the Portal and the Analytics to facilitate the exchange of the encrypted authenticated experiment IDs. The setup of this shared secret is described in step 5 in the installation instructions above. This secret also serves as a master password to access all experiment IDs.

With the encryption set up, the Analytics requires a token in the URL to receive the valid experiment IDs (or alternatively the master password): http://localhost:5005/dash?token=ABCDE

5genesis/Analytics