IntelPython/bearysta

Integration with other tools

bibikar opened this issue

It would be useful to support integration with a number of other tools.

TeamCity

It would be useful to produce output that TeamCity can use for progress reporting.
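One way to do this is through TeamCity service messages: lines of the form `##teamcity[...]` written to stdout, which TeamCity parses while the build runs. Below is a minimal sketch, assuming the aggregator or benchmark runner simply prints such lines. The helper names are hypothetical; `progressMessage` and `buildStatisticValue` are standard TeamCity message types.

```python
# Minimal sketch: format TeamCity service messages that the aggregator (or a
# benchmark runner) could print to stdout. The helper names are hypothetical.


def escape_teamcity(value: str) -> str:
    """Escape a value for use inside a TeamCity service message attribute."""
    # '|' is TeamCity's escape character, so it must be escaped first.
    for old, new in [("|", "||"), ("'", "|'"), ("\n", "|n"),
                     ("\r", "|r"), ("[", "|["), ("]", "|]")]:
        value = value.replace(old, new)
    return value


def progress_message(text: str) -> str:
    """Line that makes TeamCity show `text` as the current build progress."""
    return f"##teamcity[progressMessage '{escape_teamcity(text)}']"


def statistic_value(key: str, value: float) -> str:
    """Line that records a numeric build statistic (e.g. a median time)."""
    return f"##teamcity[buildStatisticValue key='{escape_teamcity(key)}' value='{value}']"


if __name__ == "__main__":
    print(progress_message("Running benchmark 'Dot' (1/2)"))
    print(statistic_value("Dot.N=2.median", 1.26e-05))
```

Reporting statistics this way would also let TeamCity chart benchmark timings across builds, in addition to showing progress.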

Hyperfine

https://github.com/sharkdp/hyperfine

This is probably trivial: we can just write an aggregator recipe and a benchmark config for it.
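As a starting point, hyperfine can write its results with `--export-json`. Below is a minimal sketch of loading that file into a pandas DataFrame. It assumes the export has a top-level "results" list whose entries carry a "command" string and a per-run "times" list in seconds, as recent hyperfine versions write. The function name is hypothetical, and this is not an existing bearysta recipe.

```python
import json

import pandas as pd


def hyperfine_to_frame(text: str) -> pd.DataFrame:
    """Flatten hyperfine --export-json output into one row per timing run."""
    results = json.loads(text)["results"]
    rows = [
        # Copy the command onto every individual run time.
        {"command": r["command"], "run": i, "time": t}
        for r in results
        for i, t in enumerate(r["times"])
    ]
    return pd.DataFrame(rows, columns=["command", "run", "time"])
```

For example, after `hyperfine --export-json results.json 'python bench.py'`, calling `hyperfine_to_frame(open("results.json").read())` would give a long-format table that an aggregator recipe could group by command.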

airspeed velocity

It would be useful to accept airspeed velocity JSON as an input format for the aggregator. From older notes:

I can think of two approaches to generic JSON parsing.

First approach: user specifies JSON schema

Adding a user-specified JSON parser is a bit more difficult, because users will have to somehow specify the schema used for parsing. I have some ideas, but JSON outputs generally have a complex structure. For example, ibench outputs a map containing the prefix and a list of maps, each containing the problem, the problem size, and a list of the actual timings. Here's a simplified example of what that looks like:

{
  "name": "intelpython3",
  "date": "2019-06-20",
  "runs": [
    {
      "name": "Dot",
      "N": 2,
      "times": [
         0.04996657371520996,
         1.2636184692382812e-05,
         3.0994415283203125e-06
      ]
    }
  ]
}

To handle this, we would need:

  • a way to get an object arbitrarily deep in the hierarchy. We can use list/tuple-based indexing for this, where each element of the list indexes the next inner layer.
  • a way to specify how outer information is expanded onto multiple inner data points, across an arbitrary number of layers (e.g. copying "Dot" and 2 onto each of the three times, producing three rows).
  • note that certain JSON layouts can already be read directly by pandas: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html
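The first two points could look roughly like this. This is a minimal sketch that uses a hypothetical spec format of our own, not anything bearysta defines: a list of levels, outermost first, where each level names the fields to copy down (as list-based paths) and the path of the list to descend into.

```python
from functools import reduce
from typing import Any, Dict, List, Optional, Sequence

import pandas as pd


def get_path(obj: Any, path: Sequence[Any]) -> Any:
    """Index into nested JSON, one path element per layer (empty path -> obj)."""
    return reduce(lambda inner, key: inner[key], path, obj)


def expand(obj: Any, levels: List[Dict[str, Any]],
           row: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Flatten `obj` into rows, copying outer fields onto every inner row.

    Each level (outermost first) has "fields", mapping output column -> path
    relative to the current object, and optionally "records", the path of the
    list whose elements are handled by the next level.
    """
    row = dict(row or {})  # copy so sibling branches do not share state
    level = levels[0]
    for column, path in level["fields"].items():
        row[column] = get_path(obj, path)
    records_path = level.get("records")
    if records_path is None:  # innermost level: emit one row
        return [row]
    rows: List[Dict[str, Any]] = []
    for child in get_path(obj, records_path):
        rows.extend(expand(child, levels[1:], row))
    return rows


# Hypothetical spec for the simplified ibench output above.
levels = [
    {"fields": {"prefix": ["name"], "date": ["date"]}, "records": ["runs"]},
    {"fields": {"problem": ["name"], "N": ["N"]}, "records": ["times"]},
    {"fields": {"time": []}},  # each time is a scalar leaf
]

data = {
    "name": "intelpython3",
    "date": "2019-06-20",
    "runs": [{"name": "Dot", "N": 2,
              "times": [0.04996657371520996, 1.2636184692382812e-05,
                        3.0994415283203125e-06]}],
}

df = pd.DataFrame(expand(data, levels))
print(df)
```

Each level adds one layer of nesting to the spec, so the depth is arbitrary, and `get_path(data, ["runs", 0, "times", 1])` shows the plain list-based indexing on its own.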

Second approach: generic conversion of JSON to dataframe

If we want to convert an entire JSON file to a table, though, we could tabulate every layer, starting with the innermost one and working outward, using the following logic:

  • if the layer is a scalar, create a 1x1 table with the empty string as the column name and the scalar as its only value.
  • if the layer is a list, its elements have already been turned into tables, so concatenate them row-wise.
  • if the layer is a map, its values have already been turned into tables, so:
    • rename each value's table columns to key.columnname for each columnname (or just key if columnname is the empty string)
    • merge the tables as a cross join, pairing every row of each table with every row of the others (a map of scalars therefore becomes a single row).
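These rules map onto a short recursive function. This is a minimal sketch, not bearysta's implementation. It assumes pandas >= 1.2 for `DataFrame.merge(how="cross")`, the name `tabulate` is hypothetical, and it returns a new DataFrame at each level rather than replacing objects in place.

```python
import pandas as pd  # DataFrame.merge(how="cross") needs pandas >= 1.2


def tabulate(node):
    """Recursively turn a parsed JSON value into a DataFrame.

    Inner layers are tabulated first (the recursive calls), then combined
    according to the rules for scalars, lists and maps.
    """
    if isinstance(node, dict):
        table = None
        for key, value in node.items():
            inner = tabulate(value)
            # Prefix columns with the key; a bare "" column becomes just `key`.
            inner.columns = [key if col == "" else f"{key}.{col}"
                             for col in inner.columns]
            # Cross join: pair every row built so far with every inner row.
            table = inner if table is None else table.merge(inner, how="cross")
        # An empty map becomes a single row with no columns.
        return table if table is not None else pd.DataFrame(index=[0])
    if isinstance(node, list):
        # Stack the element tables; an empty list yields an empty table,
        # which removes its parent's rows in the cross join above.
        parts = [tabulate(item) for item in node]
        return pd.concat(parts, ignore_index=True) if parts else pd.DataFrame()
    # Scalar: a 1x1 table whose only column is named "".
    return pd.DataFrame({"": [node]})
```

Calling `tabulate(json.load(f))` on the example JSON in the next section should reproduce the 6x5 table of its last step. Because the cross join multiplies row counts, several sibling lists inside one map would produce every combination of their elements, which may or may not be what we want.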

Example

For example, these are the steps for transforming a fictitious JSON example into a table:

(As with matrices, an nxm table means a table with n rows and m columns.)

original JSON:

{
  "name": "intelpython3",
  "date": "2019-06-20",
  "runs": [
    {
      "name": "Dot",
      "N": 2,
      "times": [
         1, 2, 3
      ]
    },
    {
      "name": "Inv",
      "N": 2,
      "times": [
         4, 5, 6
      ]
    }
  ]
}

tabulating the innermost element:

{
  "name": "intelpython3",
  "date": "2019-06-20",
  "runs": [
    {
      "name": "Dot",
      "N": 2,
      "times": <3x1 table of times with empty string as column name>
    },
    {
      "name": "Inv",
      "N": 2,
      "times": <3x1 table of times with empty string as column name>
    }
  ]
}

tabulating the second innermost element:

{
  "name": "intelpython3",
  "date": "2019-06-20",
  "runs": [
    <3x3 table. name=Dot, N=2 for all, times=[1, 2, 3]>,
    <3x3 table. name=Inv, N=2 for all, times=[4, 5, 6]>
  ]
}

tabulating the third innermost element:

{
  "name": "intelpython3",
  "date": "2019-06-20",
  "runs": <6x3 table. name=[Dot, Dot, Dot, Inv, Inv, Inv], N=2, times=[1, 2, 3, 4, 5, 6]>
}

tabulating the fourth innermost element:

<6x5 table. name=intelpython3, date=2019-06-20, runs.name=[Dot, Dot, Dot, Inv, Inv, Inv], runs.N=2 for all, runs.times=[1, 2, 3, 4, 5, 6]>
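The shapes at each step follow from two counting rules implied by the procedure. These rules assume the elements of a list tabulate to tables with the same m columns: a list sums row counts, and a map (cross join) multiplies row counts and sums column counts.

```latex
\text{list of } n_i \times m \text{ tables} \;\to\; \Big(\textstyle\sum_i n_i\Big) \times m,
\qquad
\text{map of } n_k \times m_k \text{ tables} \;\to\; \Big(\textstyle\prod_k n_k\Big) \times \Big(\textstyle\sum_k m_k\Big)
```

For each run, the values are 1x1, 1x1 and 3x1, giving (1·1·3) x (1+1+1) = 3x3. The "runs" list then gives (3+3) x 3 = 6x3. The outer map's values are 1x1, 1x1 and 6x3, giving (1·1·6) x (1+1+3) = 6x5.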

Implementing this in Python is actually pretty easy: we parse the entire JSON input, then recursively follow this procedure, replacing objects in place with pandas DataFrames.