Better Documentation of Logging/Analysis

Question

batu opened this issue 4 years ago · 2 comments

First off thank you for this library!

I wanted to ask for your help in understanding the analysis and logging of the training.

During training a lot of information is dumped:

Trial 0 session 3 reinforce_cartpole_t0_s3 [eval_df metrics] final_return_ma: 167.6  strength: 145.74  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 2.22874e-05  training_efficiency: 0.000511296  stability: 0.935119

Several of these are new to me in the RL context. What are strength and stability, and how are sample_efficiency and training_efficiency calculated?

In the documentation, the metrics explanations are rather brief, describing them as "self-explanatory." And in the codebase itself the comments didn't seem very accessible to me.

Would it be possible for you to give me an overview of these terms, or point me to a resource that explains them? I couldn't find them in your book.

Additionally, if you can point me to the part of the code where the log formatting is done, I can spend some time improving it. The current single-line dump is hard to read.

Thank you very much!

Answer 1 · 2020-02-09T07:42:08.000Z

Hi @batu,

Thanks for your questions!

Often in RL a graph of agent returns vs. timesteps is provided because it is a rich source of information about agent performance. Using this graph we can see how fast performance improved or how stable returns were, as well as the highest and final returns achieved.

The additional metrics such as strength, stability, sample_efficiency, and training_efficiency are intended to represent the returns graph as a set of quantitative metrics that can be compared across agents.

strength measures the performance of an agent compared to a random baseline and is defined as follows:
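$str = \frac{1}{N} \sum_{i=0}^N \left( \overline{R}_i - \overline{R}_{rand} \right)$
where $i$ is the index of a checkpoint, $\overline{R}_i$ is the return at that checkpoint, and $\overline{R}_{rand}$ is the average return of a random agent. The strength at a single checkpoint is $str_i = \overline{R}_i - \overline{R}_{rand}$.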

If strength > 0, we know the agent is learning something. Three variants of the strength metric are reported; used together, they give an indication of the shape of the returns curve:

strength: average strength over all checkpoints.
max_strength: maximum strength out of all checkpoints.
final_strength: strength at the last checkpoint.
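As a rough illustration of the three variants, here is a minimal Python sketch. It is not SLM Lab's implementation: the function name `strength_metrics`, the sample returns, and the random baseline are all made up for this example, and the average is taken over the number of checkpoints.

```python
from statistics import mean


def strength_metrics(returns, random_baseline):
    """Return (strength, max_strength, final_strength) for checkpoint returns.

    Hypothetical helper for illustration; not part of SLM Lab.
    """
    # Strength at each checkpoint: return minus the random-agent baseline
    per_checkpoint = [r - random_baseline for r in returns]
    return mean(per_checkpoint), max(per_checkpoint), per_checkpoint[-1]


# Made-up checkpoint returns and random-agent baseline
returns = [30.0, 80.0, 150.0, 200.0]
strength, max_strength, final_strength = strength_metrics(returns, random_baseline=20.0)
print(strength, max_strength, final_strength)
```

In this example max_strength equals final_strength, which indicates that the agent was at its best at the last checkpoint; a final_strength well below max_strength would suggest the returns curve peaked earlier and then declined.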

stability measures the average ratio of strength lost between consecutive checkpoints. If returns are constant or monotonically increasing from checkpoint to checkpoint, the agent is considered perfectly stable and has a stability score of 1. Conversely, if strength has high variance, oscillating between low and high values, the agent has low stability, closer to 0. It is defined as follows:
$stability = 1 - \left| \frac{\sum_{i=0}^{N-1} \min(str_{i+1} - str_i, 0)}{\sum_{i=0}^{N-1} str_i} \right|$
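The formula can be sketched in a few lines of Python. This follows the definition as written here rather than the library's code; the function name `stability` and the strength values are hypothetical, and the sketch assumes the denominator is nonzero.

```python
def stability(strengths):
    """Stability of a sequence of per-checkpoint strengths.

    Hypothetical helper for illustration; assumes sum(strengths[:-1]) != 0.
    """
    # Only decreases between consecutive checkpoints count as lost strength
    drops = sum(min(nxt - cur, 0.0) for cur, nxt in zip(strengths, strengths[1:]))
    # Denominator: strengths at checkpoints 0..N-1 (all but the last)
    denom = sum(strengths[:-1])
    return 1.0 - abs(drops / denom)


# Made-up strength curves
monotonic = stability([10.0, 60.0, 130.0, 180.0])     # never drops
oscillating = stability([100.0, 50.0, 100.0, 50.0])   # drops every other step
print(monotonic, oscillating)
```

The monotonic curve has no drops, so it scores exactly 1. The oscillating curve loses 100 units of strength in total against a denominator of 250, which gives a lower score.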

sample_efficiency and training_efficiency are two ways of measuring how fast an agent achieved its current strength. They can be used to differentiate between two agents with the same strength at a given checkpoint.

Efficiency is defined as a weighted average of strength, with earlier steps weighted more heavily than later steps.
$efficiency = \frac{\sum_{i=0}^N \frac{1}{t_i} str_i}{\sum_{i=0}^N \frac{1}{t_i}}$

Time steps $t_i$ are measured in two ways.

Training frames, which yield the sample_efficiency metric:

$sample\_efficiency = \frac{\sum_{i=0}^N \frac{1}{frame_i} str_i}{\sum_{i=0}^N \frac{1}{frame_i}}$

Optimization steps, which yield the training_efficiency metric:

$training\_efficiency = \frac{\sum_{i=0}^N \frac{1}{optstep_i} str_i}{\sum_{i=0}^N \frac{1}{optstep_i}}$
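To make the weighting concrete, here is a minimal Python sketch of the efficiency formula as stated here. It is an illustration only and may not match the library's actual computation; the function name `efficiency` and all numbers are made up. Passing frame counts as `timesteps` corresponds to sample_efficiency, and passing optimization step counts corresponds to training_efficiency.

```python
def efficiency(strengths, timesteps):
    """Weighted average of strengths with weights 1 / t_i.

    Hypothetical helper for illustration; assumes every timestep is > 0.
    """
    weights = [1.0 / t for t in timesteps]
    weighted_sum = sum(w * s for w, s in zip(weights, strengths))
    return weighted_sum / sum(weights)


# Two made-up agents with the same final strength, evaluated at the same frames
frames = [1000, 2000]
fast_agent = efficiency([100.0, 100.0], frames)  # strong from the first checkpoint
slow_agent = efficiency([0.0, 100.0], frames)    # strong only at the last checkpoint
print(fast_agent, slow_agent)
```

Because the first checkpoint carries twice the weight of the second, the agent that reached its strength early scores higher even though both end at the same strength.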

We added some information on these metrics to our docs here. They aren't in the book because they are an experimental feature and still under development.

Finally, thanks for offering to take a look at the logging formatting. The code is here.

One thing to be aware of: the single-line dump is used so that reporting stays clear when multiple Sessions are running in parallel. If logging were multi-line, data from multiple sessions would likely end up mixed together and be hard to read. If you are interested in implementing multi-line logging, I suggest you start with a single Session. We're also open to suggestions for how to do this in the multi-session case.
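One possible direction for the multi-session case, offered only as a sketch: print one metric per line, but prefix every line with the session name so that interleaved output can still be attributed to its session. The function `format_metrics` below is hypothetical and does not use or reflect SLM Lab's logger; the metric values are copied from the log line in the question.

```python
def format_metrics(session_name, metrics):
    """Format metrics one per line, each line prefixed with the session name.

    Hypothetical helper for illustration; not part of SLM Lab.
    """
    # Pad metric names to a common width so the values line up
    width = max(len(name) for name in metrics)
    return "\n".join(
        f"[{session_name}] {name:<{width}} : {value:g}"
        for name, value in metrics.items()
    )


metrics = {
    "final_return_ma": 167.6,
    "strength": 145.74,
    "max_strength": 178.14,
    "final_strength": 178.14,
    "sample_efficiency": 2.22874e-05,
    "training_efficiency": 0.000511296,
    "stability": 0.935119,
}
formatted = format_metrics("reinforce_cartpole_t0_s3", metrics)
print(formatted)
```

Lines from different sessions could still interleave in the output, but each line would remain self-describing, so the log could be filtered by session prefix afterwards.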

Thanks!

Answer 2 · 2020-04-14T17:10:56.000Z

Closing as resolved