Robust handling of inconsistent TabularInput keys
Currently, CsvOutput emits a warning if the keys of a TabularInput change after the first call to logger.log(TabularInput). A new key that has not been seen before is ignored, and an old key that is absent is left blank. In other words, CsvOutput handles dynamic fieldnames conservatively.
This behaviour of CsvOutput makes it tricky to log the performance of multi- and meta-ML algorithms, where there are usually per-task fields but not every task appears in every iteration, so the logs for some tasks go missing.
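The effect can be reproduced with the standard library alone. The sketch below is not Dowel's implementation: it uses csv.DictWriter with a header frozen from the first row as a stand-in for the behaviour described above, and the key names (task_a/return, task_b/return) are invented for illustration. It also does not emit the warning that CsvOutput does.

```python
import csv
import io

# Two iterations of a hypothetical multi-task run; the key names are made up.
rows = [
    {"iteration": 0, "task_a/return": 1.5},
    {"iteration": 1, "task_b/return": 2.0},  # task_a absent, task_b is new
]

buffer = io.StringIO()
# The header is frozen from the first row, mimicking fixed fieldnames.
writer = csv.DictWriter(
    buffer,
    fieldnames=list(rows[0]),
    restval="",             # keys missing from a row become blank cells
    extrasaction="ignore",  # keys not in the header are silently dropped
    lineterminator="\n",
)
writer.writeheader()
for row in rows:
    writer.writerow(row)

output = buffer.getvalue()
print(output)
```

In the printed CSV, the second row has a blank task_a/return cell and the task_b/return value never appears, which is the loss of per-task logs described above.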
The desired behaviour for handling inconsistent keys is:
- When a new key is encountered:
  - Expand the header with the new key.
  - Expand old rows with empty cells for the new key.
- If the value of any key is missing, leave the cell blank.
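One possible way to get this behaviour is sketched below, again using only the standard library. ExpandingCsvWriter is a hypothetical helper, not part of Dowel, and the sketch assumes it is acceptable to keep every row in memory and rewrite the file whenever the header grows.

```python
import csv
import os
import tempfile


class ExpandingCsvWriter:
    """Hypothetical helper: a CSV writer whose header can grow over time."""

    def __init__(self, path):
        self.path = path
        self.fieldnames = []  # header columns, in first-seen order
        self.rows = []        # every row written so far, kept as dicts

    def write_row(self, row):
        new_keys = [k for k in row if k not in self.fieldnames]
        self.rows.append(dict(row))
        if new_keys or len(self.rows) == 1:
            # Header changed (or first row): rewrite the whole file so that
            # older rows gain empty cells under the new columns.
            self.fieldnames.extend(new_keys)
            self._rewrite()
        else:
            # Header unchanged: appending one line is enough.
            with open(self.path, "a", newline="") as f:
                self._writer(f).writerow(row)

    def _writer(self, f):
        # restval="" leaves a blank cell for any key missing from a row.
        return csv.DictWriter(f, fieldnames=self.fieldnames, restval="")

    def _rewrite(self):
        with open(self.path, "w", newline="") as f:
            writer = self._writer(f)
            writer.writeheader()
            writer.writerows(self.rows)


# Usage with made-up keys: task_b appears late and task_a is sometimes absent.
path = os.path.join(tempfile.mkdtemp(), "progress.csv")
out = ExpandingCsvWriter(path)
out.write_row({"iteration": 0, "task_a/return": 1.5})
out.write_row({"iteration": 1, "task_b/return": 2.0})
out.write_row({"iteration": 2, "task_a/return": 1.7})

with open(path, newline="") as f:
    result = list(csv.reader(f))
```

Rewriting the file keeps the output a plain rectangular CSV at all times, at the cost of memory and I/O proportional to the number of rows each time a new key appears. For long runs, a real fix might prefer reading the existing file back from disk instead of holding rows in memory.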
Introduction
Dowel is a tool that the Garage team uses for logging results from our various reinforcement learning experiments.
Dowel can log different types of data, such as floats or strings. Logs can be written to stdout (the console), CSV files, and TensorBoard.
You can check out an example of how Dowel is used here. In fact, almost all parts of the Dowel API are used in this example.
The problem
After statistics such as loss have been logged and logger.dump_all() has been called for the first time, new tabular data can’t be written to a CSV output. This is because data currently cannot be logged inconsistently to CSV: every call to dump_all must present the same logger keys. Data that is logged inconsistently will not appear in the CSV output. This is a design flaw that we have been able to work around, but it affects our workflows.
Your goal is to solve the problem and to add tests to our testing framework that verify your solution.
Some General Instructions
- Fork Dowel and install all necessary dependencies.
- Take a look at this toy example, which exposes the bug when run, and at the accompanying issue mentioned above.
- When you have finished writing your solution and tests, open a PR against your fork, not against the upstream repository.
- When you are done, email us back with the link to your pull request.
If you have any questions, open an issue in your fork and tag @avnishn and @zequnyu. Our preferred mode of communication for any questions is GitHub issues and pull requests, as this is how the Garage team generally communicates. For this reason, we won’t respond to direct emails asking for help with your project. We will, however, respond by email to any other questions that you have (interview scheduling, etc.).
Best of luck, and let us know about any issues as early as possible.