yardstiq/quantum-benchmarks

bin/plot fails on recently pushed data

Closed this issue · 2 comments

Hi,

I've noticed you pushed the benchmark results in the data/ directory. However, the bin/plot script does not generate the plots. Did I miss some preprocessing steps?

Best regards and stay safe,
Stefan

$ bin/plot 
Traceback (most recent call last):
  File "bin/plot", line 9, in <module>
    labels=['X', 'H', 'T', 'CNOT', 'Toffoli']
  File "/home/stefan/repos/quantum-benchmarks/bin/utils/plot_utils.py", line 89, in parse_data
    gate_data[each_package] = wash_benchmark_data(each_package, labels)
  File "/home/stefan/repos/quantum-benchmarks/bin/utils/plot_utils.py", line 44, in wash_benchmark_data
    with open(find_json(name)) as f:
  File "/home/stefan/repos/quantum-benchmarks/bin/utils/plot_utils.py", line 34, in find_json
    for each in os.listdir(benchmark_path):
NotADirectoryError: [Errno 20] Not a directory: '/home/stefan/repos/quantum-benchmarks/data/yao.csv'

This can be solved by removing the three non-directory files from the data/ directory, but bin/plot then fails with a new error:

Traceback (most recent call last):
  File "bin/plot", line 9, in <module>
    labels=['X', 'H', 'T', 'CNOT', 'Toffoli']
  File "/home/miyoshi/quantum-benchmarks/bin/utils/plot_utils.py", line 89, in parse_data
    gate_data[each_package] = wash_benchmark_data(each_package, labels)
  File "/home/miyoshi/quantum-benchmarks/bin/utils/plot_utils.py", line 57, in wash_benchmark_data
    return pd.DataFrame(data=dd)
  File "/home/miyoshi/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 435, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/home/miyoshi/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 254, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/home/miyoshi/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 64, in arrays_to_mgr
    index = extract_index(arrays)
  File "/home/miyoshi/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 365, in extract_index
    raise ValueError("arrays must all be same length")
ValueError: arrays must all be same length
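
The traceback shows pandas rejecting a dict whose columns have different lengths. A minimal sketch of that situation and one way around it, using only the standard library: the helper `pad_columns` and the sample timings are hypothetical and not part of plot_utils.py, and padding with `inf` assumes that a missing entry means a benchmark timed out.

```python
import math


def pad_columns(columns, fill=math.inf):
    """Pad every list in `columns` to the length of the longest one.

    Hypothetical helper for illustration; not part of plot_utils.py.
    """
    target = max((len(values) for values in columns.values()), default=0)
    return {
        key: list(values) + [fill] * (target - len(values))
        for key, values in columns.items()
    }


# Made-up timings: 'Toffoli' has one entry fewer than the other columns,
# which is the kind of mismatch that makes pd.DataFrame(data=...) fail.
raw = {'nqubits': [4, 5, 6], 'X': [1.0, 2.0, 3.0], 'Toffoli': [1.5, 2.5]}
padded = pad_columns(raw)
lengths = {key: len(values) for key, values in padded.items()}
```

After padding, every column has the same length, so a dict shaped like `padded` could be passed to `pd.DataFrame` without hitting the length check.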

Both problems can be fixed by editing bin/utils/plot_utils.py and replacing the functions below:

def find_json(name):
    """Find a JSON benchmark file whose name contains `name`.
    """
    benchmark_dir = os.path.join(ROOT_PATH, 'data')
    # Only subdirectories are considered; stray files such as data/yao.csv are skipped.
    sub_dirs = [f.path for f in os.scandir(benchmark_dir) if f.is_dir()]
    if not sub_dirs:
        raise FileNotFoundError('Did not find any directory within data/')
    elif len(sub_dirs) > 1:
        print('WARNING: Found more than one suitable subdir. Arbitrarily choosing {}'.format(sub_dirs[0]))
    benchmark_path = sub_dirs[0]  # f.path already includes benchmark_dir
    file_stack = []
    for each in os.listdir(benchmark_path):
        if name in each:
            file_stack.append(each)
    return os.path.join(benchmark_path, file_stack[-1])


def wash_benchmark_data(name, labels):
    """Process benchmark data, padding with `inf` where data is missing (this
    usually means a timeout during benchmarking), then return a pandas.DataFrame.
    """
    with open(find_json(name)) as f:
        data = json.load(f)

    cols = [each['params']['nqubits'] for each in data['benchmarks'] if each['group'] == labels[0]]
    dd = {'nqubits': cols}
    for lb in labels:
        time_data = [each['stats']['min']*1e9 for each in data['benchmarks'] if each['group'] == lb]
        if len(time_data) != len(cols):
            # extend instead of append, so the list is padded up to len(cols)
            time_data.extend([float('inf') for _ in range(len(cols) - len(time_data))])
        dd[lb] = time_data
    return pd.DataFrame(data=dd)
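
A quick way to check the directory filter in isolation is to run it against a temporary directory. This is only a sketch: `list_subdirs` and the subdirectory name `results` are made up for illustration, while `yao.csv` stands in for the stray file from the first traceback.

```python
import os
import tempfile


def list_subdirs(path):
    """Return the paths of immediate subdirectories, ignoring plain files."""
    return [entry.path for entry in os.scandir(path) if entry.is_dir()]


with tempfile.TemporaryDirectory() as data_dir:
    # One real subdirectory (hypothetical name) next to a stray CSV file.
    os.mkdir(os.path.join(data_dir, 'results'))
    open(os.path.join(data_dir, 'yao.csv'), 'w').close()

    subdirs = list_subdirs(data_dir)
    names = [os.path.basename(p) for p in subdirs]
    # entry.path already contains data_dir, so no further join is needed.
    all_prefixed = all(p.startswith(data_dir) for p in subdirs)
```

Only the subdirectory survives the filter, so a later `os.listdir` call is never pointed at a plain file.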

I'm currently integrating my simulator and will include this fix in the pull request.