Include CSV header in data files

Question


smarr opened this issue 3 years ago · 1 comment

The data files don't have a header line, which makes it a bit of a challenge to figure out which column is which.

We should probably also document the data format in the docs.

Answer 1 · 2023-07-04T13:25:55.000Z

The content of a data file currently looks something like:

#!/Users/smarr/Projects/ReBench/main.py -R -d rebench.conf progr-rep-mem
# Execution Start: 2023-04-07T17:08:32.416178+00:00
# Environment: {"userName": "smarr", "manualRun": true, "hostName": "A2", "osType": "Darwin", "memory": 103079215104, "denoise": {}, "cpu": "Apple M2 Max", "clockSpeed": 0, "software": [{"name": "kernel", "version": "Darwin Kernel Version 22.3.0"}, {"name": "kernel-release", "version": "22.3.0"}, {"name": "architecture", "version": "arm64"}]}
# Source: {"repoURL": "git@github.com:SOM-st/TruffleSOM.git", "branchOrTag": "master", "commitId": "9aeb6ba2ca872e6844cb4f7f6ba5db094f04204f", "commitMsg": "Blocks with a field read nested in a block are not yet supported (#181)\n\n", "authorName": "Stefan Marr", "committerName": "GitHub", "authorEmail": "git@stefan-marr.de", "committerEmail": "noreply@github.com"}
1	1	131552.000000	kb	MaxRSS	TestGC	TruffleSOM-native-interp-bc	progr-rep-mem	10	1	38		yuria
1	1	540.000000	ms	total	TestGC	TruffleSOM-native-interp-bc	progr-rep-mem	10	1	38		yuria

From the top, the lines encode:

How rebench was invoked
The start time
Data on the environment in which the results were recorded
The status of the source repository in which rebench was run
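As a rough sketch of how a reader could pick these leading lines apart, assuming exactly the layout above (a '#!' line with the command, then '# Key: value' lines whose Environment and Source values are JSON), something like the following could work. The function name parse_metadata is made up for illustration and is not part of ReBench.

```python
import json


def parse_metadata(lines):
    """Collect the leading '#' lines of a data file into a dict.

    Hypothetical helper. Assumes the layout shown in this issue: a '#!'
    line with the invocation, then '# Key: value' lines, where the
    Environment and Source values are JSON objects.
    """
    meta = {}
    for line in lines:
        if not line.startswith("#"):
            break  # metadata only appears before the first data line
        if line.startswith("#!"):
            meta["Invocation"] = line[2:].strip()
            continue
        # split at the first ': ' only, so timestamps and JSON stay intact
        key, sep, value = line[1:].strip().partition(": ")
        if not sep:
            continue  # ignore comment lines that are not 'Key: value'
        if key in ("Environment", "Source"):
            value = json.loads(value)
        meta[key] = value
    return meta
```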

After that come the tab-separated data lines, one measured value per line.
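A minimal sketch for reading the data lines, assuming only that they are tab-separated; the helper parse_data_lines is hypothetical and does not interpret what the columns mean. Splitting on every tab matters, because some columns can be empty, as in the "38<tab><tab>yuria" part of the sample.

```python
def parse_data_lines(lines):
    """Split the non-comment lines of a data file into lists of column values.

    Hypothetical helper; assumes one tab-separated record per line and
    '#' for metadata lines.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")  # strip only the line ending, not trailing tabs
        if not line or line.startswith("#"):
            continue  # skip metadata and blank lines
        # split on every tab so empty columns are kept as ''
        rows.append(line.split("\t"))
    return rows
```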

Off the top of my head, these lines are parsed by the from_str_list methods on the model classes and produced by the corresponding as_str_list methods.

So, after writing out the source info, I think we would want to call get_str_list_header or similar on the relevant model class. It would essentially do the same as as_str_list, but return readable column names instead of the values.
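To illustrate the idea, here is a hypothetical stand-in for a model class. DataPoint is not a ReBench class, and its fields only cover the incomplete column list proposed in this issue. Deriving both the values and the header from the same dataclass fields keeps them in the same order, so they cannot drift apart:

```python
from dataclasses import astuple, dataclass, fields


@dataclass
class DataPoint:
    """Hypothetical model class; fields follow the (incomplete) column list
    proposed in this issue, not ReBench's actual classes."""
    invocation: int
    iteration: int
    value: float
    unit: str
    criterion: str
    benchmark: str
    executor: str
    suite: str

    def as_str_list(self):
        # one string per column, in field declaration order;
        # floats use the fixed six-decimal format seen in the data files
        return [f"{v:f}" if isinstance(v, float) else str(v)
                for v in astuple(self)]

    @classmethod
    def get_str_list_header(cls):
        # same order as as_str_list, but the column names instead of values
        return [f.name for f in fields(cls)]
```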

See for instance:

https://github.com/smarr/ReBench/blob/master/rebench/model/run_id.py#L313-L319

So, I imagine the first line after # Source: would read something like invocation, iteration, value, unit, criterion, benchmark, executor, suite, ... (this isn't complete).
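A rough sketch of how writing and reading such a header could fit together, assuming the header goes on a plain, un-prefixed line right after the '#' metadata. HEADER, write_data_file and read_data_file are hypothetical names, and HEADER only holds the incomplete column list from above.

```python
import csv
import io

# Hypothetical column names, taken from the (incomplete) list in this issue.
HEADER = ["invocation", "iteration", "value", "unit", "criterion",
          "benchmark", "executor", "suite"]


def write_data_file(out, metadata_lines, rows):
    """Write '#' metadata lines, then one header line, then tab-separated rows."""
    for line in metadata_lines:
        out.write(line + "\n")
    out.write("\t".join(HEADER) + "\n")  # proposed header, right after the metadata
    for row in rows:
        out.write("\t".join(row) + "\n")


def read_data_file(inp):
    """Skip the '#' lines and let csv.DictReader use the header line as keys."""
    body = [line for line in inp if not line.startswith("#")]
    # the data files do not quote values, so disable quote handling
    return list(csv.DictReader(body, delimiter="\t", quoting=csv.QUOTE_NONE))


if __name__ == "__main__":
    buf = io.StringIO()
    write_data_file(buf, ["# Source: {}"],
                    [["1", "1", "540.000000", "ms", "total", "TestGC",
                      "TruffleSOM-native-interp-bc", "progr-rep-mem"]])
    buf.seek(0)
    for record in read_data_file(buf):
        print(record["criterion"], record["value"], record["unit"])
```

Running it prints "total 540.000000 ms". One design question this raises: a header line without a leading '#' would be read as a data row by any existing reader that simply skips '#' lines, so prefixing the header with '#' might be the more backwards-compatible choice. Also, with only eight names in HEADER, csv.DictReader collects any additional columns of a real data line in a list under the key None.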