
Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

DBEDA - Database Experimental Data Analysis Framework

DBEDA is an experimental data analysis framework designed for database performance monitoring in a Jupyter environment. This framework combines server and client components to collect, visualize, and analyze performance data from integrated databases.


Environment Setup

  • Start the DBEDA server and client using Docker Compose: docker compose up

Server

  • Run these commands to set up the server component:
    service postgresql start
    cd /root/DBEDA/server
    pip install -r server_requirements.txt
    python3 server.py
    

Client

  • Run these commands to set up the client component:
    service postgresql start
    cd /root/DBEDA/client
    pip install -r client_requirements.txt
    jupyter lab --allow-root
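
Once the services are starting, it can help to confirm that the database port is reachable before running the notebook. The following is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper and not part of DBEDA, and the host name and port in the usage comment are taken from the `connect_db` example in this README.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; DBEDA does not ship this function.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing accepts connections yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (values from the connect_db call in this README):
# wait_for_port("dbeda-client", 5434)
```

A successful TCP connection only shows that something is listening on the port; it does not verify database credentials.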
    

Example Usage

In JupyterLab, open DBEDA.ipynb.

Register Database Configuration

Register the connection configuration of the database to monitor, then start collecting its performance data:

from client_side import *
config = connect_db(db_type='postgres', host='dbeda-client', database='test_cli', user='postgres', password='postgres', port='5434')
collect_performance_data(config)
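
The example above hard-codes the connection settings, including the password. One possible alternative is to read them from environment variables. This is only a sketch: `db_settings_from_env` and the `DBEDA_*` variable names are hypothetical and not part of DBEDA, and it assumes `connect_db` accepts the same keyword arguments shown above.

```python
import os


def db_settings_from_env(prefix: str = "DBEDA_") -> dict:
    """Build keyword arguments for connect_db from environment variables.

    The variable names (DBEDA_HOST, DBEDA_PORT, ...) are hypothetical.
    Defaults mirror the README example, except that the password has none.
    """
    defaults = {
        "db_type": "postgres",
        "host": "dbeda-client",
        "database": "test_cli",
        "user": "postgres",
        "port": "5434",
    }
    settings = {
        key: os.environ.get(prefix + key.upper(), value)
        for key, value in defaults.items()
    }
    # No default password: fail loudly rather than fall back to a hard-coded one.
    password = os.environ.get(prefix + "PASSWORD")
    if password is None:
        raise KeyError(f"{prefix}PASSWORD is not set")
    settings["password"] = password
    return settings


# In the notebook, assuming connect_db takes these keyword arguments:
# config = connect_db(**db_settings_from_env())
```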

Data Visualization

Execute a widget to visualize the collected performance data:

visualize(config)

The left side of the widget shows which table the collected performance data is currently stored in.

image

You can specify the performance table to visualize using the Tables widget and set the time interval with the Time Range widget.

In the Task widget, you can select various database performance analysis tasks. For basic performance metric charts, you can choose the 'metrics' task.

To visualize the performance data, select the data in the Data widget, specify the chart type, and click the Draw button. The selected chart is then added below the widgets.

image

The overall appearance of the visualization component is as follows:

image

Data Extraction

Extract the desired performance data:

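import pandas as pd
# Query the cpu_percent metric from the os_metric table for the last day
data = query_performance_data(config, table='os_metric', metrics='cpu_percent', task='metrics', recent_time_window='1 day')
df_metric = pd.DataFrame(data['metric'])
df_metric  # display the DataFrame in the notebook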

image

The collected data is displayed as a DataFrame, as in the image above.
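
After extraction, it is sometimes useful to narrow the rows further on the client side, using a window string in the same style as `recent_time_window='1 day'`. The sketch below shows one way to do that with the standard library. It does not describe how DBEDA itself interprets `recent_time_window`; `parse_window`, `rows_within`, and the `timestamp` key are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta

# Supported units, mapped to the matching timedelta keyword (an assumption).
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}


def parse_window(window: str) -> timedelta:
    """Turn a string such as '1 day' or '6 hours' into a timedelta."""
    amount, unit = window.split()
    unit = unit.lower().rstrip("s")  # accept both 'day' and 'days'
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit in window: {window!r}")
    return timedelta(**{_UNITS[unit]: float(amount)})


def rows_within(rows, window, now, time_key="timestamp"):
    """Keep only the rows whose timestamp lies within `window` before `now`."""
    cutoff = now - parse_window(window)
    return [row for row in rows if cutoff <= row[time_key] <= now]


# Illustrative data only; real rows come from query_performance_data.
now = datetime(2023, 5, 23, 12, 0, 0)
rows = [
    {"timestamp": now - timedelta(hours=30), "cpu_percent": 10.0},
    {"timestamp": now - timedelta(hours=2), "cpu_percent": 20.0},
]
recent = rows_within(rows, "1 day", now)  # keeps only the second row
```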

Model Training and Prediction

Train a model, retrieve the trained model, and make predictions. Here train_df is a DataFrame of training data, and path names a saved model file:

response = train(config, train_df, 'load prediction', pipeline='RNN')
get_trained_model(config, 'load prediction')
predicted = predict(config, 'load prediction', metric='tps', path="darts_TCN_20230523_150814.pickle")
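
The model file name in the example, `darts_TCN_20230523_150814.pickle`, appears to encode a library, a model type, and a save timestamp. Assuming that pattern holds (it is inferred from this single example, not documented), the sketch below parses such names and picks the most recently saved one. `parse_model_filename` and `latest_model` are hypothetical helpers, not part of DBEDA.

```python
import re
from datetime import datetime

# Assumed pattern: <library>_<model>_<YYYYMMDD>_<HHMMSS>.pickle
_MODEL_FILE = re.compile(
    r"^(?P<library>[^_]+)_(?P<model>.+)_(?P<stamp>\d{8}_\d{6})\.pickle$"
)


def parse_model_filename(name: str) -> dict:
    """Split a model file name into library, model type, and save time."""
    match = _MODEL_FILE.match(name)
    if match is None:
        raise ValueError(f"unexpected model file name: {name!r}")
    return {
        "library": match["library"],
        "model": match["model"],
        "saved_at": datetime.strptime(match["stamp"], "%Y%m%d_%H%M%S"),
    }


def latest_model(names):
    """Return the file name with the most recent timestamp."""
    return max(names, key=lambda n: parse_model_filename(n)["saved_at"])


info = parse_model_filename("darts_TCN_20230523_150814.pickle")
```

Parsing the name this way also makes it easy to check that the model type in the file matches the pipeline that was trained.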

Contributing

Contributions to the DBEDA framework are welcome. If you have suggestions or improvements, please feel free to open issues or submit pull requests.