Include CSV header in data files
smarr opened this issue · 1 comments
The data files don't have a header line, which makes it a bit of a challenge to guess which column is which.
We should probably also document the data format in the docs.
The content of a data file currently looks something like:
#!/Users/smarr/Projects/ReBench/main.py -R -d rebench.conf progr-rep-mem
# Execution Start: 2023-04-07T17:08:32.416178+00:00
# Environment: {"userName": "smarr", "manualRun": true, "hostName": "A2", "osType": "Darwin", "memory": 103079215104, "denoise": {}, "cpu": "Apple M2 Max", "clockSpeed": 0, "software": [{"name": "kernel", "version": "Darwin Kernel Version 22.3.0"}, {"name": "kernel-release", "version": "22.3.0"}, {"name": "architecture", "version": "arm64"}]}
# Source: {"repoURL": "git@github.com:SOM-st/TruffleSOM.git", "branchOrTag": "master", "commitId": "9aeb6ba2ca872e6844cb4f7f6ba5db094f04204f", "commitMsg": "Blocks with a field read nested in a block are not yet supported (#181)\n\n", "authorName": "Stefan Marr", "committerName": "GitHub", "authorEmail": "git@stefan-marr.de", "committerEmail": "noreply@github.com"}
1 1 131552.000000 kb MaxRSS TestGC TruffleSOM-native-interp-bc progr-rep-mem 10 1 38 yuria
1 1 540.000000 ms total TestGC TruffleSOM-native-interp-bc progr-rep-mem 10 1 38 yuria
From the top, the lines encode:
- How rebench was invoked
- The start time
- Data on the environment in which the results were recorded
- The status of the source repository in which rebench was run
Afterwards, we have tab-separated lines with values.
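To make the layout of the comment lines concrete, here is a minimal sketch of how they could be read back. It is not ReBench code: `parse_metadata` is a hypothetical helper, and it assumes that every metadata line starts with `#` and that the environment and source payloads are valid JSON, as in the sample above.

```python
import json


def parse_metadata(lines):
    """Hypothetical helper: collect the '#' comment lines into a dict."""
    meta = {}
    for line in lines:
        if line.startswith("#!"):
            # First line: how rebench was invoked.
            meta["invocation"] = line[2:].strip()
        elif line.startswith("# "):
            # "# Key: value" lines; split at the first ": " only.
            key, _, value = line[2:].partition(": ")
            value = value.strip()
            # Assumption: values starting with '{' are JSON objects.
            meta[key] = json.loads(value) if value.startswith("{") else value
    return meta


example = [
    "#!/Users/smarr/Projects/ReBench/main.py -R -d rebench.conf progr-rep-mem",
    "# Execution Start: 2023-04-07T17:08:32.416178+00:00",
    '# Environment: {"userName": "smarr", "osType": "Darwin"}',
]
meta = parse_metadata(example)
```

The example input is shortened from the sample file; the real environment and source lines carry more keys.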
Off the top of my head, these data lines are parsed by the from_str_list methods on the model classes, and they are produced by the corresponding as_str_list methods.
So, after writing out the source info, I would think we want to call get_str_list_header
or similar on the relevant model class, which basically does the same as as_str_list
but instead gives readable names for the columns.
See for instance:
https://github.com/smarr/ReBench/blob/master/rebench/model/run_id.py#L313-L319
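As a rough sketch of the idea, a header method could sit next to as_str_list and return names in the same column order. The `Measurement` class below is a hypothetical stand-in with only five fields, not the actual ReBench model class, and get_str_list_header is the name proposed in this issue rather than an existing method.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical stand-in for a model class; not the real ReBench one."""
    invocation: int
    iteration: int
    value: float
    unit: str
    criterion: str

    def as_str_list(self):
        # One string per column, in the order written to the data file.
        return [str(self.invocation), str(self.iteration),
                f"{self.value:f}", self.unit, self.criterion]

    @classmethod
    def get_str_list_header(cls):
        # Same column order as as_str_list, but with readable names.
        return ["invocation", "iteration", "value", "unit", "criterion"]


header = "\t".join(Measurement.get_str_list_header())
row = "\t".join(Measurement(1, 1, 540.0, "ms", "total").as_str_list())
```

Keeping both methods on the same class makes it harder for the header and the column order to drift apart when a field is added.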
So, I imagine the first line after # Source: would read something like invocation, iteration, value, unit, criterion, benchmark, executor, suite, ... (this isn't complete).
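With such a header in place, a consumer could read the file with the standard csv module. The sketch below assumes the header is written as a plain tab-separated line without a leading `#`, and it uses sample rows truncated to the eight columns named above; `read_rows` is a hypothetical helper, not part of ReBench.

```python
import csv
import io

# Sample truncated to the eight columns named in this issue.
SAMPLE = (
    "# Execution Start: 2023-04-07T17:08:32.416178+00:00\n"
    "invocation\titeration\tvalue\tunit\tcriterion\tbenchmark\texecutor\tsuite\n"
    "1\t1\t131552.000000\tkb\tMaxRSS\tTestGC\tTruffleSOM-native-interp-bc\tprogr-rep-mem\n"
    "1\t1\t540.000000\tms\ttotal\tTestGC\tTruffleSOM-native-interp-bc\tprogr-rep-mem\n"
)


def read_rows(text):
    """Hypothetical helper: return the data lines as dicts keyed by column name."""
    # Drop the '#' metadata lines; the first remaining line is the header.
    data_lines = (line for line in io.StringIO(text) if not line.startswith("#"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


rows = read_rows(SAMPLE)
```

Each row is then addressable by name, for example `rows[0]["criterion"]`, instead of by column position.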