For more information on the arguments, run the binary with --help;
it prints a description of the script and of all its arguments.
To run the tests just run:
$ docker-compose up
$ pytest -s
This starts a new self-contained InfluxDB instance, populates it with data, and tests the predictions.
$ ./bin/check_time --db-type="telegraf" --measurement="check_time_measurement_telegraf" --filter="host = 'host01' AND service = 'service01'" --kpi-column="kpi" --target-kpi="disk_util" --value-column="value" --value-type="usage" --max-column="max" --verbosity="info" --window="1000s" --debug-plot="debug/debug_telegraf.png" --warning-threshold="300s" --critical-threshold="400s"
INFO:db_adapter.py:__init__:11:Conneting to the DB on [localhost:8086] for the database [test_db]
INFO:check_time_main.py:check_time:54:Retrieving the data from the db
INFO:db_adapter.py:query:15:Executing query:
SELECT time, value as value
FROM "check_time_measurement_telegraf"
WHERE (
time > (now() - 1000s)
AND
kpi = 'cpu_util'
AND host = 'host01' AND service = 'service01'
)
INFO:db_adapter.py:query:21:Got 1 points
INFO:check_time_main.py:check_time:59:Got 600 points
INFO:predict_time_left.py:predict_time_left:33:The coefficents predicted are m:'0.001001669449019937' q:'1.8528845124876625e-11' with score:'1.0'
INFO:predict_time_left.py:predict_time_left:75:The predicted time left is 399.33333337649526 seconds with a score of 100.0%
INFO:check_time_cli.py:check_time_cli:202:Warning threshold 500.0
INFO:check_time_cli.py:check_time_cli:203:Critical threshold 600.0
INFO:check_time_cli.py:check_time_cli:217:Sucessfull exit
OK: 399.33333337649526 6m39.33s (100.00%)
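The predict_time_left log lines above report a slope m, an intercept q and a time left in seconds, which is consistent with a straight-line fit extrapolated to a limit. The following is a minimal sketch of that idea, not the tool's actual code: the function names, the limit of 1.0 and the synthetic ramp (600 samples, one per second, rising from 0 to 0.6) are assumptions made for illustration.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = m * times + q."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    m = sxy / sxx
    q = mean_v - m * mean_t
    return m, q


def time_left(times, values, limit=1.0):
    """Seconds from the last sample until the fitted line reaches `limit`.

    Returns None when the fitted line is flat or decreasing, because it
    never reaches the limit in that case.
    """
    m, q = fit_line(times, values)
    if m <= 0:
        return None
    return (limit - q) / m - times[-1]


# Hypothetical data: a usage ratio growing linearly from 0 to 0.6 in 600 s.
times = [float(i) for i in range(600)]
values = [0.6 * i / 599 for i in range(600)]

m, q = fit_line(times, values)
remaining = time_left(times, values)
print(f"m={m:.6f} time left={remaining:.2f}s")
```

With this ramp the extrapolation comes out at roughly 399.33 seconds, in line with the value in the log above.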
The result can also be checked by the exit code:
- 0 -> OK
- 1 -> Warn
- 2 -> Critical
Also, all logging goes to stderr, so the last line is the only thing written to stdout.
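Because the exit code and stdout carry the result while the logs go to stderr, a wrapper script can consume the check without parsing the logs. The sketch below is one possible way to do that; the helper names and the "Unknown" fallback for any other exit code are assumptions, not part of the tool.

```python
import subprocess

# Exit codes as listed above.
STATUS_BY_EXIT_CODE = {0: "OK", 1: "Warn", 2: "Critical"}


def interpret(returncode, stdout):
    """Map an exit code and captured stdout to (status, result_line)."""
    # Assumption: any exit code outside the documented ones is "Unknown".
    status = STATUS_BY_EXIT_CODE.get(returncode, "Unknown")
    lines = stdout.strip().splitlines()
    result_line = lines[-1] if lines else ""
    return status, result_line


def run_check(argv):
    """Run the check; stderr (the logs) is captured separately from stdout."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return interpret(proc.returncode, proc.stdout)


# Interpreting the result line of the run shown above (no process is started here).
print(interpret(0, "OK: 399.33333337649526 6m39.33s (100.00%)\n"))
```

To run the real check, run_check would be called with the same argument list as the command above, for example ["./bin/check_time", "--db-type=telegraf", ...].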
You can also specify a path for the default DB settings. db_settings.json and .tests/test_db_settings.json are examples of the required format.
$ ./bin/anomaly_detection --input-database="test_db" --output-database="test_db" --input-measurement="anomaly_measurement_telegraf" --output-measurement="anomaly_measurement_telegraf_ml" --selectors="host,service,kpi" --field="value" --training-timeframe="4w" --window="100s" --warning="0.9" --anomaly="0.95" --verbosity="info"
INFO:db_adapter.py:__init__:11:Conneting to the DB on [localhost:8086] for the database [test_db]
INFO:db_adapter.py:get_selectors_combinations:26:Finding all the combinations of the selectors fields
INFO:db_adapter.py:query:15:Executing query:
SELECT DISTINCT(host) as selector
FROM (
SELECT *
FROM anomaly_measurement_telegraf
WHERE (
time > (now() - 100s)
)
)
INFO:db_adapter.py:query:21:Got 1 points
INFO:db_adapter.py:query:15:Executing query:
SELECT DISTINCT(service) as selector
FROM (
SELECT *
FROM anomaly_measurement_telegraf
WHERE (
time > (now() - 100s)
)
)
INFO:db_adapter.py:query:21:Got 1 points
INFO:db_adapter.py:query:15:Executing query:
SELECT DISTINCT(kpi) as selector
FROM (
SELECT *
FROM anomaly_measurement_telegraf
WHERE (
time > (now() - 100s)
)
)
INFO:db_adapter.py:query:21:Got 1 points
INFO:anomaly_detection_main.py:anomaly_detection:68:There are 1 unique combinations of selectors.
INFO:anomaly_detection_main.py:anomaly_detection:71:Analyzing the selector group: {'host': 'host01', 'service': 'service01', 'kpi': 'disk_util'}
INFO:db_adapter.py:query:15:Executing query:
SELECT value as value
FROM anomaly_measurement_telegraf
WHERE (
time > (now() - 4w)
AND host = 'host01' AND service = 'service01' AND kpi = 'disk_util'
)
INFO:db_adapter.py:query:21:Got 1 points
INFO:anomaly_detection_main.py:anomaly_detection:73:Computed training data:
{('Sunday', 8): {'warning': 0.1089461827284105, 'anomaly': 0.14387984981226534}, ('Sunday', 9): {'warning': 0.16010012515644556, 'anomaly': 0.21819274092615767}, ('Sunday', 10): {'warning': 0.19846808510638297, 'anomaly': 0.26138673341677093}, ('Sunday', 11): {'warning': 0.28001802252816027, 'anomaly': 0.3178423028785982}, ('Sunday', 12): {'warning': 0.1853376720901127, 'anomaly': 0.2444545682102627}, ('Tuesday', 8): {'warning': 0.16190237797246557, 'anomaly': 0.22798498122653316}, ('Tuesday', 9): {'warning': 0.22798498122653316, 'anomaly': 0.28047058823529414}}
INFO:db_adapter.py:query:15:Executing query:
SELECT value as value
FROM anomaly_measurement_telegraf
WHERE (
time > (now() - 100s)
AND host = 'host01' AND service = 'service01' AND kpi = 'disk_util'
)
INFO:db_adapter.py:query:21:Got 1 points
INFO:anomaly_detection_main.py:anomaly_detection:76:Classifying the data
INFO:anomaly_detection_main.py:anomaly_detection:92:An example of the classified data is:
{'measurement': 'anomaly_measurement_telegraf_ml', 'time': '2021-09-07T08:13:42.372610Z', 'fields': {'warning': 0, 'anomaly': 0, 'value': -0.0, 'warn_threshould': 0.16190237797246557, 'anom_threshold': 0.22798498122653316}, 'tags': {'host': 'host01', 'service': 'service01', 'kpi': 'disk_util'}}
INFO:anomaly_detection_main.py:anomaly_detection:109:Writing the classified data to the db `test_db`
INFO:anomaly_detection_main.py:anomaly_detection:117:Success, wrote the data to the db `test_db`
This will write data in the format:
{
    "measurement": "anomaly_measurement_telegraf_ml",
    "time": "2021-09-07T08:13:42.372610Z",
    "tags": {
        "host": "host01",
        "service": "service01",
        "kpi": "disk_util"
    },
    "fields": {
        "warning": 0,
        "anomaly": 0,
        "value": -0.0,
        "warn_threshould": 0.16190237797246557,
        "anom_threshold": 0.22798498122653316
    }
}
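The training data in the log is keyed by (weekday, hour), and the thresholds in the example point match the ('Tuesday', 8) entry (2021-09-07 was a Tuesday). The sketch below shows how such a point could be assembled from that lookup. It is an illustration under assumptions, not the tool's code: the function name classify_point and the rule that a value is flagged when it exceeds a threshold are hypothetical.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def classify_point(measurement, time_str, tags, value, training):
    """Build an output point using the (weekday, hour) thresholds."""
    when = datetime.strptime(time_str, "%Y-%m-%dT%H:%M:%S.%fZ")
    thresholds = training[(WEEKDAYS[when.weekday()], when.hour)]
    return {
        "measurement": measurement,
        "time": time_str,
        "tags": dict(tags),
        "fields": {
            # Assumption: a point is flagged when it exceeds the threshold.
            "warning": int(value > thresholds["warning"]),
            "anomaly": int(value > thresholds["anomaly"]),
            "value": value,
            # Field name spelled exactly as in the output shown above.
            "warn_threshould": thresholds["warning"],
            "anom_threshold": thresholds["anomaly"],
        },
    }


# Thresholds copied from the ('Tuesday', 8) entry of the training data above.
training = {("Tuesday", 8): {"warning": 0.16190237797246557,
                             "anomaly": 0.22798498122653316}}
tags = {"host": "host01", "service": "service01", "kpi": "disk_util"}

point = classify_point("anomaly_measurement_telegraf_ml",
                       "2021-09-07T08:13:42.372610Z", tags, -0.0, training)
print(point["fields"])
```

A dictionary with the measurement, time, tags and fields keys is also the point layout accepted by the write_points method of the influxdb Python client, although the document does not say which client the tool uses.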