Plotting: One plot script per benchmark program?

Question

pjattke opened this issue 4 years ago · 4 comments

One plot script per benchmark program?
- Idea: The script determines the latest workflow run folder, then retrieves all the CSV files for a specific eval program (e.g., cardio).
What should the plot look like?
- This depends on which information we include.
- Examples:
  - Bar plots grouped by encryption_t, computation_t, decryption_t; one bar for each tool.
    - Disadvantage: Total time difficult to compare.
  - Stacked + grouped plot: One bar for each tool, this bar is divided into encryption_t, computation_t, decryption_t.
    - As an example, see this or this
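
The "latest run folder, then all CSVs for one program" idea could be sketched roughly as follows. This is only an illustration: it assumes the keys follow the `<timestamp>/<Tool>/<tool>_<program>.csv` layout seen in the single path quoted later in this thread, and the function name `latest_run_csvs` is made up for this sketch. Listing the bucket itself is left out; the function works on any iterable of key strings.

```python
import re
from typing import Iterable, List

# Assumed key layout, inferred from the one path quoted in this thread:
#   <YYYYMMDD_HHMMSS>/<Tool>/<tool>_<program>.csv
RUN_FOLDER = re.compile(r"^\d{8}_\d{6}$")


def latest_run_csvs(keys: Iterable[str], program: str) -> List[str]:
    """Return the CSV keys for `program` from the most recent run folder.

    Hypothetical helper, not part of the repository.
    """
    keys = list(keys)
    # The first path component is the run folder; ignore anything else.
    runs = {k.split("/", 1)[0] for k in keys}
    runs = {r for r in runs if RUN_FOLDER.match(r)}
    if not runs:
        return []
    # Zero-padded timestamps sort chronologically as plain strings.
    latest = max(runs)
    suffix = f"_{program}.csv"
    return sorted(
        k for k in keys
        if k.startswith(latest + "/") and k.endswith(suffix)
    )
```

Called with the keys of two runs and `program="cardio"`, it returns only the cardio CSVs from the newer run, one per tool.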

Answer 1 · 2020-07-29T11:44:58.000Z

I like stacked bars, so that we can easily compare total times across tools while still getting a feeling of where overhead comes from.

Answer 2 · 2020-07-29T11:46:13.000Z

We require 10 runs for each benchmark program. Raw values are saved in the CSV file and median/avg is then to be computed during the plotting phase.
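
A minimal sketch of that aggregation step, assuming each CSV has one row per run and uses the phase columns that appear in plot_cardio.py further down (`t_keygen`, `t_input_encryption`, `t_computation`, `t_decryption`). The helper name `summarize_runs` is hypothetical.

```python
import pandas as pd

# Phase columns as used in plot_cardio.py in this thread.
PHASES = ["t_keygen", "t_input_encryption", "t_computation", "t_decryption"]


def summarize_runs(df: pd.DataFrame, how: str = "median") -> pd.Series:
    """Collapse the raw per-run rows of one tool's CSV into one value per phase.

    Hypothetical helper: `how` selects between median and mean.
    """
    if how not in ("median", "mean"):
        raise ValueError("how must be 'median' or 'mean'")
    # Aggregating column-wise yields a Series indexed by phase name.
    return df[PHASES].agg(how)
```

The median is less sensitive to a single slow outlier run than the mean, which may matter with only 10 runs per program.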

@AlexanderViand has existing matplotlib code from a paper for stacked and grouped bars that can be used as a basis for our plots.

Answer 3 · 2020-07-31T13:40:27.000Z

In the S3 directory <timestamp>/plot there should be files plot_<application>.py that define a function plot that takes a list of labels (i.e., tool names), a list of pandas dataframes (one per tool, loaded from that tool's *.csv) and optionally a matplotlib Figure object, and returns a Figure containing the desired plot.

As an example, here is plot_cardio.py:

from typing import List
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np



def plot(labels: List[str], pandas_dataframes: List[pd.DataFrame], fig=None) -> plt.Figure:
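    """
    Plot one stacked runtime bar per tool.

    :param labels: tool names, one per bar
    :param pandas_dataframes: one DataFrame per tool, loaded from that tool's *.csv
    :param fig: optional matplotlib Figure to draw into; a new one is created if None
    :return: the Figure containing the plot
    """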
    # Save current figure to restore later
    previous_figure = plt.gcf()

    # Set the current figure to fig
    if fig is None:
        fig = plt.figure()
    plt.figure(fig.number)

    # Setup Axis, Title, etc
    N = len(labels)
    plt.title('Runtime for Cardio')
    plt.ylabel('Time (ms)')
    ind = np.arange(N)  # the x locations for the groups
    plt.xticks(ind, labels)
    width = 0.35  # the width of the bars: can also be len(x) sequence

    # Plot Bars
    for i in range(N):
        df = pandas_dataframes[i]
        # Average each phase over all runs recorded in this tool's CSV;
        # each segment is stacked on top of the previous ones via `bottom`.
        d1 = df['t_keygen'].mean()
        p1 = plt.bar(ind[i], d1, width, color='red')
        d2 = df['t_input_encryption'].mean()
        p2 = plt.bar(ind[i], d2, width, bottom=d1, color='blue')
        d3 = df['t_computation'].mean()
        p3 = plt.bar(ind[i], d3, width, bottom=d1+d2, color='green')
        d4 = df['t_decryption'].mean()
        p4 = plt.bar(ind[i], d4, width, bottom=d1+d2+d3, color='cyan')

    # Add Legend
    plt.legend((p4[0], p3[0], p2[0], p1[0]), ('Decryption', 'Computation', 'Encryption', 'Key Generation'))

    # Restore current figure
    plt.figure(previous_figure.number)

    return fig


if __name__ == '__main__':
    print("Testing plotting with cardio example")
    data = [pd.read_csv('s3://sok-repository-eval-benchmarks/20200729_094952/Cingulata/cingulata_cardio.csv')]
    labels = ['Cingulata']
    plot(labels, data)

Answer 4 · 2020-07-31T13:54:56.000Z

There is currently still an issue with the HTML rendering of the plots on the Visualisation Website.
Apparently, mpld3 does not handle custom labels very well: mpld3/mpld3#360 (comment)