Utilities for Insights Results Aggregator
These utilities are stored in the `api_access` subdirectory.
BASH script to retrieve results for multiple clusters (specified in the URL) from the Insights Results Aggregator service.
You need to provide the correct value for the ADDRESS variable, which should point to a running Insights Results Aggregator service instance.
BASH script to retrieve results for multiple clusters (specified in the request payload) from the Insights Results Aggregator service.
You need to provide the correct value for the ADDRESS variable, which should point to a running Insights Results Aggregator service instance.
BASH script to retrieve results for multiple clusters (specified in the URL) from the Smart Proxy service.
You need to provide the correct value for the ADDRESS variable, which should point to a running Smart Proxy service instance.
BASH script to retrieve results for multiple clusters (specified in the request payload) from the Smart Proxy service.
You need to provide the correct value for the ADDRESS variable, which should point to a running Smart Proxy service instance.
These utilities are stored in the `input` subdirectory.
Anonymize input data produced by the OCP rules engine.
All input files that end with '.json' are read by this script, and if they contain the 'info' key, the value stored under this key is replaced by an empty list, because this information might contain sensitive data. Output file names are in the format 's_number.json', i.e. the original file name is not preserved because it might also contain sensitive data.
python3 anonymize.py
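The core of the script can be sketched roughly as follows (a minimal illustration based on the description above, not the actual implementation):

```python
#!/usr/bin/env python3
"""Minimal sketch of the anonymization step described above (illustrative only)."""

import json
import pathlib

# process all JSON files found in the current directory
for counter, path in enumerate(sorted(pathlib.Path(".").glob("*.json")), start=1):
    with open(path, "r") as fin:
        data = json.load(fin)

    # the 'info' key may contain sensitive data, so its value is dropped
    if "info" in data:
        data["info"] = []

    # the original file name is not preserved as it may be sensitive as well
    with open(f"s_{counter}.json", "w") as fout:
        json.dump(data, fout, indent=4)
```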
Converts outputs from the OCP rule engine into proper reports.
All input files with names matching 's_*.json' (usually anonymized outputs from the OCP rule engine) are converted into proper 'report' objects that can be:
- published into a Kafka topic
- stored directly into the aggregator database
This is done by inserting the organization ID, clusterName, and lastChecked attributes and by rearranging the output structure. Output files are named 'r_*.json'.
python3 2report.py
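A rough sketch of the conversion step is shown below; the exact envelope expected by the aggregator (attribute names such as OrgID, ClusterName, LastChecked, Report) is an assumption here and may differ from the real report format:

```python
#!/usr/bin/env python3
"""Sketch of the s_*.json -> r_*.json conversion (envelope attributes are assumptions)."""

import json
import pathlib
from datetime import datetime, timezone

ORG_ID = 11789772                                      # example organization ID
CLUSTER_NAME = "5d5892d3-1f74-4ccf-91af-548dfc9767aa"  # example cluster name

for path in sorted(pathlib.Path(".").glob("s_*.json")):
    with open(path, "r") as fin:
        rule_engine_output = json.load(fin)

    # wrap the rule engine output and add the attributes mentioned above;
    # the exact structure expected by the aggregator may differ
    report = {
        "OrgID": ORG_ID,
        "ClusterName": CLUSTER_NAME,
        "LastChecked": datetime.now(timezone.utc).isoformat(),
        "Report": rule_engine_output,
    }

    # s_1.json -> r_1.json
    with open("r_" + path.name[2:], "w") as fout:
        json.dump(report, fout, indent=4)
```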
This script can be used to fill in the aggregator database in the selected pipeline with data taken from test clusters.
The script performs several operations:
- Decompresses input data generated by the Insights Operator and stored in a Ceph/AWS bucket, and updates the directory structure accordingly
- Runs Insights OCP rules against all input data
- Anonymizes the OCP rule results
- Converts the OCP rule results into a form compatible with the aggregator; these results (JSONs) can be published into Kafka using produce.sh (several times if needed)
./fill_in_results.sh archive.tar.bz org_id cluster_name
./fill_in_results.sh external-rules-archives-2020-03-31.tar 11789772 5d5892d3-1f74-4ccf-91af-548dfc9767aa
This script reads an input message (which should be correct) and generates a bunch of new messages.
Each generated message is broken in some way, so it is possible to use such messages to test how broken messages are handled on the aggregator (i.e. consumer) side.
Types of input message mutation:
- any item (identified by its key) can be removed
- new items with random key and content can be added
- any item can be replaced by new random content
- https://redhatinsights.github.io/insights-results-aggregator-utils/packages/gen_broken_messages.html
python gen_broken_messages.py input_file.json
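The three mutation types listed above can be illustrated by a minimal sketch (function names are illustrative, not the actual implementation):

```python
#!/usr/bin/env python3
"""Illustration of the three mutation types listed above (not the actual script)."""

import copy
import json
import random
import string
import sys


def random_string(length=10):
    """Generate a random string to be used as a key or value."""
    return "".join(random.choice(string.ascii_letters) for _ in range(length))


def mutated_messages(message):
    """Yield copies of the message, each broken in exactly one way."""
    for key in message:
        # 1. any item (identified by its key) can be removed
        without_item = copy.deepcopy(message)
        del without_item[key]
        yield without_item

        # 3. any item can be replaced by new random content
        replaced = copy.deepcopy(message)
        replaced[key] = random_string()
        yield replaced

    # 2. new items with random key and content can be added
    with_new_item = copy.deepcopy(message)
    with_new_item[random_string()] = random_string()
    yield with_new_item


if __name__ == "__main__":
    with open(sys.argv[1], "r") as fin:
        original = json.load(fin)
    for i, broken in enumerate(mutated_messages(original)):
        with open(f"broken_{i}.json", "w") as fout:
            json.dump(broken, fout, indent=4)
```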
This script reads an input message (which should be correct) and generates a bunch of new messages.
Each generated message is broken (it does not contain a proper JSON object) so it can be used to test how broken messages are handled on the aggregator (i.e. consumer) side.
Types of input message mutation:
- any item (identified by its key) can be removed
- new items with random key and content can be added
- any item can be replaced by new random content
usage: gen_broken_jsons.py [-h] -i INPUT [-o OUTPUT] [-e EXPORTED] [-v] [-s]
[-a] [-d] [-m] [-ap ADD_LINE_PROBABILITY]
[-dp DELETE_LINE_PROBABILITY]
[-mp MUTATE_LINE_PROBABILITY]
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
name of input file
-o OUTPUT, --output OUTPUT
template for output file name (default out_{}.json)
-e EXPORTED, --exported EXPORTED
number of JSONs to be exported (10 by default)
-v, --verbose make it verbose
-s, --shuffle_lines shuffle lines to produce improper JSON
-a, --add_lines add random lines to produce improper JSON
-d, --delete_lines delete randomly selected lines to produce improper
JSON
-m, --mutate_lines mutate lines individually
-ap ADD_LINE_PROBABILITY, --add_line_probability ADD_LINE_PROBABILITY
probability of new line to be added (0-100)
-dp DELETE_LINE_PROBABILITY, --delete_line_probability DELETE_LINE_PROBABILITY
probability of line to be deleted (0-100)
-mp MUTATE_LINE_PROBABILITY, --mutate_line_probability MUTATE_LINE_PROBABILITY
probability of line to be mutated (0-100)
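A minimal sketch of the line-level mutations behind the -s, -a, and -d options (illustrative only; the probabilities and the inserted garbage line are arbitrary):

```python
#!/usr/bin/env python3
"""Sketch of line-level mutations similar to the options above (illustrative only)."""

import random
import sys

ADD_LINE_PROBABILITY = 10     # percent, corresponds to -ap
DELETE_LINE_PROBABILITY = 10  # percent, corresponds to -dp


def mutate_lines(lines):
    """Randomly add and delete lines so the result is (very likely) improper JSON."""
    output = []
    for line in lines:
        # -d / --delete_lines: delete randomly selected lines
        if random.randint(0, 99) < DELETE_LINE_PROBABILITY:
            continue
        output.append(line)
        # -a / --add_lines: add random lines
        if random.randint(0, 99) < ADD_LINE_PROBABILITY:
            output.append('"garbage": [\n')
    return output


if __name__ == "__main__":
    with open(sys.argv[1], "r") as fin:
        lines = fin.readlines()
    # -s / --shuffle_lines: shuffling alone already breaks the JSON structure
    random.shuffle(lines)
    sys.stdout.writelines(mutate_lines(lines))
```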
Generator of random payloads for testing REST API, message consumers, test frameworks etc.
This source file contains a class named RandomPayloadGenerator
that can be reused by other scripts and tools to generate random payloads, useful for testing, implementing fuzzers etc.
This is a helper class that can't be started directly from the command line. Internally it is used by the script gen_broken_messages.py.
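The general idea behind such a generator can be sketched as follows; the class and method names are purely illustrative and do not reflect the real RandomPayloadGenerator interface:

```python
#!/usr/bin/env python3
"""Conceptual sketch of a random payload generator (hypothetical interface)."""

import random
import string


class PayloadGeneratorSketch:
    """Hypothetical generator of random values usable as test payloads."""

    def __init__(self, max_string_length=20, max_list_items=5):
        self.max_string_length = max_string_length
        self.max_list_items = max_list_items

    def random_string(self):
        length = random.randint(1, self.max_string_length)
        return "".join(random.choice(string.printable) for _ in range(length))

    def random_value(self):
        # pick one of several payload shapes at random
        generators = (
            self.random_string,
            lambda: random.randint(-1000, 1000),
            lambda: random.random(),
            lambda: [self.random_string()
                     for _ in range(random.randint(0, self.max_list_items))],
        )
        return random.choice(generators)()


if __name__ == "__main__":
    generator = PayloadGeneratorSketch()
    print(generator.random_value())
```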
These utilities are stored in the `reports` subdirectory.
Display statistics about rules that really 'hit' problems on clusters.
This script can be used to display statistics about rules that really 'hit' problems on clusters. It can be used against test data or production data as needed.
To run this tool against all files in the current directory that contain test data or production data:
python3 stat.py
Analyze data exported from the db-writer database.
This script can be used to analyze data exported from the `report` table by the following command typed into the PSQL console:
\copy report to 'reports.csv' csv
The script displays two tables:
1. org ID + cluster name (list of affected clusters)
2. org ID + number of affected clusters (usually the only information required by management)
Usage:
affected_clusters.py rule_name input_file.csv
Example:
affected_clusters.py ccx_rules_ocp.external.bug_rules.bug_12345678.report report.csv
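The aggregation itself boils down to grouping affected clusters by organization for a selected rule. A rough sketch is shown below, assuming the exported CSV stores the organization ID, cluster name, and report JSON in its first three columns (the real column layout may differ):

```python
#!/usr/bin/env python3
"""Sketch of the org ID -> affected clusters aggregation (column layout is an assumption)."""

import collections
import csv
import sys

rule_name = sys.argv[1]   # e.g. ccx_rules_ocp.external.bug_rules.bug_12345678.report
input_file = sys.argv[2]  # e.g. report.csv

affected = collections.defaultdict(set)

with open(input_file, newline="") as fin:
    for row in csv.reader(fin):
        org_id, cluster_name, report = row[0], row[1], row[2]
        # a cluster counts as affected when the selected rule appears in its report
        if rule_name in report:
            affected[org_id].add(cluster_name)

# table 1: org ID + cluster name (list of affected clusters)
for org_id, clusters in affected.items():
    for cluster in sorted(clusters):
        print(org_id, cluster)

# table 2: org ID + number of affected clusters
for org_id, clusters in affected.items():
    print(org_id, len(clusters))
```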
Analyze data exported from the db-writer database.
Lists all rules and other interesting information found in reports.csv. Data is exported into CSV format so it can be included in spreadsheets.
This script can be used to analyze data exported from the `report` table by the following command typed into the PSQL console:
\copy report to 'reports.csv' csv
How to connect to the PSQL console:
psql -h host
The password can be retrieved from the OpenShift console, for example from:
ccx-data-pipeline-qa/browse/secrets/ccx-data-pipeline-db
ccx-data-pipeline-prod/browse/secrets/ccx-data-pipeline-db
Creates a plot (graph) displaying statistics about the age of rule results.
- https://redhatinsights.github.io/insights-results-aggregator-utils/packages/cluster_results_age.html
python3 cluster_results_age.py input.csv
These utilities are stored in the `s3` subdirectory.
Script to retrieve timestamps of all objects stored in an AWS S3 bucket and export them to CSV.
This script retrieves the timestamps of all objects stored in an AWS S3 bucket and exports these timestamps to a CSV file. It is possible to specify the region (in S3), access key, and secret key.
upload_timestamps.py [-h] -k ACCESS_KEY -s SECRET_KEY [-r REGION]
[-b BUCKET] -o OUTPUT [-m MAX_RECORDS]
optional arguments:
-h, --help show this help message and exit
-k ACCESS_KEY, --access_key ACCESS_KEY
AWS access key ID
-s SECRET_KEY, --secret_key SECRET_KEY
AWS secret access key
-r REGION, --region REGION
AWS region, us-east-1 by default
-b BUCKET, --bucket BUCKET
bucket name, insights-buck-it-openshift by default
-o OUTPUT, --output OUTPUT
output file name
-m MAX_RECORDS, --max_records MAX_RECORDS
max records to export (default=all)
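The retrieval itself can be sketched with boto3; the credentials below are placeholders and the output format is simplified:

```python
#!/usr/bin/env python3
"""Sketch of exporting S3 object timestamps into CSV using boto3 (placeholder credentials)."""

import csv

import boto3

# placeholder credentials; the real tool takes them via the -k/-s/-r/-b options
s3 = boto3.client(
    "s3",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    region_name="us-east-1",
)

with open("timestamps.csv", "w", newline="") as fout:
    writer = csv.writer(fout)
    writer.writerow(["key", "last_modified"])

    # the bucket may contain more than 1000 objects, so use a paginator
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="insights-buck-it-openshift"):
        for obj in page.get("Contents", []):
            writer.writerow([obj["Key"], obj["LastModified"].isoformat()])
```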
These utilities are stored in the `monitoring` subdirectory.
Script to retrieve memory and GC statistics from the standard Go metrics. Memory and GC statistics are exported into a CSV file to be processed further.
usage: go_metrics.py [-h] [-u URL] -o OUTPUT [-d DELAY] [-m MAX_RECORDS]
optional arguments:
-h, --help show this help message and exit
-u URL, --url URL URL to get metrics
-o OUTPUT, --output OUTPUT
output file name
-d DELAY, --delay DELAY
Delay in seconds between records
-m MAX_RECORDS, --max_records MAX_RECORDS
max records to export (default=all)
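The polling loop can be sketched as follows; the URL is a placeholder and only two of the standard Go runtime metrics are exported here:

```python
#!/usr/bin/env python3
"""Sketch of polling standard Go metrics and appending them to a CSV file."""

import csv
import time

import requests

URL = "http://localhost:8080/metrics"  # placeholder; use the -u option of the real tool
DELAY = 1                              # seconds between records


def read_metric(payload, name):
    """Read a single (unlabelled) metric value from the Prometheus text format."""
    for line in payload.splitlines():
        if line.startswith(name + " "):
            return float(line.split()[1])
    return None


with open("go_metrics.csv", "w", newline="") as fout:
    writer = csv.writer(fout)
    writer.writerow(["timestamp", "alloc_bytes", "sys_bytes"])

    for _ in range(10):  # the real tool limits records via -m or runs until interrupted
        payload = requests.get(URL).text
        writer.writerow([
            time.time(),
            read_metric(payload, "go_memstats_alloc_bytes"),
            read_metric(payload, "go_memstats_sys_bytes"),
        ])
        time.sleep(DELAY)
```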
Plots a graph of Kafka lags with a linear regression line added to the plot.
The source CSV file is to be retrieved from Grafana.
kafka_lags.py input_file.csv
kafka_lags.py overall.csv
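The regression line can be computed with NumPy; the sketch below assumes a two-column CSV (numeric timestamp, lag value) with one header row, which may differ from the actual Grafana export:

```python
#!/usr/bin/env python3
"""Sketch of plotting Kafka lags with a linear regression line (CSV layout is an assumption)."""

import matplotlib.pyplot as plt
import numpy as np

# assumed layout: first column numeric timestamp, second column lag, one header row
data = np.genfromtxt("overall.csv", delimiter=",", skip_header=1)
x, y = data[:, 0], data[:, 1]

# least-squares fit of a first degree polynomial, i.e. a straight line
slope, intercept = np.polyfit(x, y, 1)

plt.plot(x, y, label="Kafka lag")
plt.plot(x, slope * x + intercept, label="linear regression")
plt.legend()
plt.savefig("kafka_lags.png")
```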
These utilities are stored in the `checks` subdirectory.
Simple checker that verifies whether all JSONs have the correct syntax (not the schema).
Usage:
```text
usage: json_check.py [-h] [-v] [-n] [-d DIRECTORY]

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         make it verbose
  -n, --no-colors       disable color output
  -d DIRECTORY, --directory DIRECTORY
                        directory with JSON files to check
```
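The check itself is essentially json.load wrapped in error handling, roughly:

```python
#!/usr/bin/env python3
"""Sketch of the JSON syntax check (syntax only, no schema validation)."""

import json
import pathlib
import sys

errors = 0

# check all JSON files in the current directory and its subdirectories
for path in sorted(pathlib.Path(".").glob("**/*.json")):
    try:
        with open(path, "r") as fin:
            json.load(fin)
    except json.JSONDecodeError as e:
        errors += 1
        print(f"{path}: {e}")

sys.exit(1 if errors else 0)
```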
Simple checker for OpenAPI specification files.
usage: open_api_check.py [-h] [-v] [-n] [-d DIRECTORY]
optional arguments:
-h, --help show this help message and exit
-v, --verbose make it verbose
-n, --no-colors disable color output
-d DIRECTORY, --directory DIRECTORY
directory with OpenAPI JSON files to check
Anonymize aggregator log files by hashing organization ID and cluster ID. This tool works as a standard Unix filter.
anonymize_aggregator_log.py [-h] -s SALT
optional arguments:
-h, --help show this help message and exit
-s SALT, --salt SALT salt for hashing algorithm
anonymize_aggregator_log.py -s foobar < original.log > anonymized.log
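The filtering idea can be sketched like this; the regular expressions matching organization and cluster IDs are assumptions, not the patterns used by the real tool:

```python
#!/usr/bin/env python3
"""Sketch of a salted-hash anonymization filter for log files (patterns are assumptions)."""

import hashlib
import re
import sys

SALT = "foobar"  # taken from the -s option in the real tool

# assumed patterns: an organization ID is a number, a cluster ID is a UUID
CLUSTER_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
ORG_RE = re.compile(r"(organization )(\d+)")


def salted_hash(value):
    """Hash a value together with the salt so it cannot be trivially reversed."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:16]


# standard Unix filter: read stdin, write anonymized lines to stdout
for line in sys.stdin:
    line = CLUSTER_RE.sub(lambda m: salted_hash(m.group(0)), line)
    line = ORG_RE.sub(lambda m: m.group(1) + salted_hash(m.group(2)), line)
    sys.stdout.write(line)
```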
Anonymize CCX data pipeline log files by hashing organization ID and cluster ID. This tool works as a standard Unix filter.
anonymize_ccx_pipeline_log.py [-h] -s SALT < input.log > output.log
optional arguments:
-h, --help show this help message and exit
-s SALT, --salt SALT salt for hashing algorithm
anonymize_ccx_pipeline_log.py -s foobar < original.log > anonymized.log
These utilities are stored in the `anim` subdirectory.
That subdirectory contains tools to generate various animations of the Insights Results Aggregator, Insights Content Service, and Insights Results Smart Proxy architecture and data or command flows. These tools are invoked from the command line and do not accept any command line arguments (yet).
Creates an animation based on a static GIF image + a set of programmed rules. The animation displays the data flow for the whole external data pipeline.
Specialized utility used just to create the data flow animation for the whole external data pipeline.
go run anim_external_data_pipeline.go
Creates an animation based on a static GIF image + a set of programmed rules. The animation displays the data flow for the Insights Results Aggregator consumer service.
Specialized utility used just to create https://github.com/RedHatInsights/insights-results-aggregator/blob/master/docs/assets/anim_aggregator_consumer.gif
go run anim_aggregator_consumer.go
Creates an animation based on a static GIF image + a set of programmed rules. The animation displays the data flow between Insights Results Smart Proxy and other services (both internal and external).
Specialized utility used just to create https://redhatinsights.github.io/insights-content-service/architecture/architecture.gif
go run anim_smart_proxy.go
Creates an animation from a static GIF image + a set of programmed rules.
Specialized utility used just to create https://redhatinsights.github.io/insights-results-smart-proxy/io-pulling-only.gif animation
go run insights_operator_pull_only.go
Creates an animation based on a static GIF image + a set of programmed rules. The animation displays the data flow from the Insights Operator to the OCP WebConsole via Prometheus metrics.
Specialized utility used just to create https://redhatinsights.github.io/insights-results-smart-proxy/io-pulling-prometheus-anim.gif animation
go run insights_operator_prometheus.go
Creates an animation from a static GIF image + a set of programmed rules.
Specialized utility used just to create https://redhatinsights.github.io/insights-results-smart-proxy/io-pulling-prometheus-anim.gif animation
go run insights_operator_to_web_console.go
Simple checker of all Python sources in the given directory (usually a repository).
This script tries to find all files with the '*.py' extension in the current directory and its subdirectories. Then it checks all those files for style violations. Each violation is printed and the total number of errors is displayed as well.
To check all files in the current directory and all subdirectories:
python3 run_pycodestyle.py
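A similar check can be performed directly with the pycodestyle package, roughly:

```python
#!/usr/bin/env python3
"""Sketch of a style check over all *.py files using the pycodestyle package."""

import pathlib

import pycodestyle

# find all Python sources in the current directory and its subdirectories
sources = [str(path) for path in pathlib.Path(".").glob("**/*.py")]

# check_files prints each violation; total_errors holds the overall count
style = pycodestyle.StyleGuide()
report = style.check_files(sources)
print("Total errors:", report.total_errors)
```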
These utilities are stored in the `converters` subdirectory.
Converts structured data from JSON format into EDN format.
This script is based on the edn_format Python package, which needs to be installed using pip or pip3.
python3 json2edn.py input.json > output.edn
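The conversion itself is a thin wrapper around edn_format, roughly:

```python
#!/usr/bin/env python3
"""Sketch of the JSON -> EDN conversion based on the edn_format package."""

import json
import sys

import edn_format  # install with: pip3 install edn_format

with open(sys.argv[1], "r") as fin:
    data = json.load(fin)

# edn_format serializes Python dicts/lists/scalars into EDN notation
print(edn_format.dumps(data))
```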
Prepares a script to clean up old results from the database.
This script can be used to analyze data exported from the `report` table by the following command typed into the PSQL console:
\copy report to 'reports.csv' csv
The script retrieves all reports older than the specified amount of time, expressed in days. Then it creates an SQL script that can be run by an administrator against the selected database.
- https://redhatinsights.github.io/insights-results-aggregator-utils/packages/cleanup_old_results.html
How to connect to the PSQL console:
psql -h host
The password can be retrieved from the OpenShift console, for example from:
ccx-data-pipeline-qa/browse/secrets/ccx-data-pipeline-db
ccx-data-pipeline-prod/browse/secrets/ccx-data-pipeline-db
cleanup_old_results.py offset_in_days input_file.csv > cleanup.sql
Create a script to clean up all records older than 90 days:
cleanup_old_results.py 90 report.csv > cleanup.sql
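The generated SQL boils down to a set of DELETE statements. A rough sketch is shown below; the CSV column layout (org ID, cluster name, last-checked timestamp) and the `report` table column names are assumptions:

```python
#!/usr/bin/env python3
"""Sketch of generating a cleanup SQL script (CSV and column names are assumptions)."""

import csv
import sys
from datetime import datetime, timedelta

offset_in_days = int(sys.argv[1])  # e.g. 90
input_file = sys.argv[2]           # e.g. report.csv

threshold = datetime.now() - timedelta(days=offset_in_days)

with open(input_file, newline="") as fin:
    for row in csv.reader(fin):
        # assumed columns: org ID, cluster name, last-checked timestamp (ISO-like format)
        org_id, cluster_name, last_checked = row[0], row[1], row[2]
        if datetime.fromisoformat(last_checked) < threshold:
            # the real script escapes values properly; this only illustrates the idea
            print(f"DELETE FROM report WHERE org_id = {org_id} "
                  f"AND cluster = '{cluster_name}';")
```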