godatadriven/evol

Keeping track of the fittest individual

rogiervandergeer opened this issue · 9 comments

In addition to logging (as discussed in #15) and checkpointing (as discussed in #33) I think we need to keep track of the best-performing individual. I think we need to do this separately from logging for two reasons:

  • the best individual is something you always want to find, regardless of whether you want to log,
  • if logging happens at an interval other than once every iteration, there is a chance that you never log the best-performing individual, because it mutates before the logging moment.

We could implement this by storing a single individual inside the Population. I would suggest making this part of the evaluate call: whenever an individual is evaluated (whether the call is lazy or not) and its fitness score is better than that of the current best, it replaces the current best.
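A minimal sketch of that idea, not evol's actual code. The `Individual` and `Population` classes below are simplified stand-ins, and the `best` attribute, the `_is_better` helper and the `lazy` / `maximize` parameters are assumptions made for illustration only:

```python
from copy import deepcopy


class Individual:
    def __init__(self, chromosome):
        self.chromosome = chromosome
        self.fitness = None  # None means "not evaluated yet"


class Population:
    """Simplified stand-in for illustration; not the evol Population."""

    def __init__(self, chromosomes, eval_function, maximize=True):
        self.individuals = [Individual(c) for c in chromosomes]
        self.eval_function = eval_function
        self.maximize = maximize
        self.best = None  # hypothetical attribute holding the best individual seen

    def _is_better(self, a, b):
        return a > b if self.maximize else a < b

    def evaluate(self, lazy=False):
        for individual in self.individuals:
            if lazy and individual.fitness is not None:
                continue  # lazy: skip individuals that already have a score
            individual.fitness = self.eval_function(individual.chromosome)
            if self.best is None or self._is_better(individual.fitness, self.best.fitness):
                # Store a copy, so later mutations cannot alter the stored best.
                self.best = deepcopy(individual)
        return self


pop = Population([1, 5, 3], eval_function=lambda x: -(x - 4) ** 2)
pop.evaluate()
best_after_first = pop.best.chromosome

# Simulate a mutation that makes the best individual worse, then re-evaluate.
pop.individuals[1].chromosome = 100
pop.evaluate()
best_after_mutation = pop.best.chromosome
```

Storing a copy rather than a reference is the important design choice here: the stored best survives even if the individual in the population mutates afterwards.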

Naturally the result of this is nonsense for the ContestPopulation, as there the fitness depends on the rest of the population too. In this case it would only make sense to store the entire population together with the fittest individual, otherwise it would be impossible to recompute the result. Of course we cannot store populations inside populations, and this would be a typical case to be solved by logging.

Even in a normal Population, for stochastic evaluation functions the 'best individual' may of course be a lucky shot; but I think this is not a problem for us to solve.

Technically speaking, to me it feels like we want to store the best individual in a variable named something like best or historical_best. Currently we have min_individual and max_individual - and I don't recall why we didn't implement a current_best (or best). @koaning do you remember the reason?

Checkpointing will probably also be something you do not do every iteration, and something you usually set up manually. This means that even with checkpointing, you can still lose your best candidate.

A best candidate is even more tricky though because we usually do not know which candidate was best until we run .evaluate(). It is possible that we generate a child without evaluation, mutate it [and make it worse by accident] and never keep track of the true best candidate simply because we did not evaluate it. Technically this means that evaluate does not only exist to keep things lazy, but it will also exist in order to even keep track of the best candidate.

I think we added min_individual and max_individual such that a user does not need to specify maximize=True. Can't think of a good reason to keep both though.

Ok. So we get rid of min_individual and max_individual.

Of course we can't keep track of individuals we haven't evaluated. I don't think evaluating everything is an option (if you want that you should try a grid search). So is keeping track of the best individual every time evaluate is called the right thing to do?

Do you have preference for naming of the properties / variables?

Current best: best vs current_best
Best ever: historical_best vs best vs best_ever

Or any other name?

It may be sensible to be explicit. best < current_best < generation_best

For best ever ... maybe historical_best? It feels like a thesaurus needs to be consulted ... documented_best?

I would say that current_best > generation_best, as we've never properly defined a generation. For the best ever, documented_best sounds as if there are also undocumented individuals... which is true, come to think of it. Do you prefer documented_best over historical_best?

I guess it is rather easy to implement these, but I think we should wait for #37 to be merged, as that changes the base on which we need to implement this change.

The reason why I like documented_best has to do with the fact that we don't document all individuals and their scores. I am perfectly fine with current_best as long as we are more explicit than just best.

Okay, we'll go with documented_best and current_best.

Just to confirm the behavior. These best individuals, do we determine/update them during the .evaluate() step?

See #44, where I've implemented current_best as a property which simply finds the best individual among those that have been evaluated. The documented_best is updated each time evaluate is called.
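To make the difference between the two concrete, here is a rough sketch. It is not the code from #44: the classes are simplified stand-ins, and the `mutate` method, the `_is_better` helper and the `lazy` / `maximize` parameters are assumptions for illustration.

```python
from copy import deepcopy


class Individual:
    def __init__(self, chromosome):
        self.chromosome = chromosome
        self.fitness = None  # None means "not evaluated yet"


class Population:
    """Simplified stand-in for illustration; not the evol Population."""

    def __init__(self, chromosomes, eval_function, maximize=True):
        self.individuals = [Individual(c) for c in chromosomes]
        self.eval_function = eval_function
        self.maximize = maximize
        self.documented_best = None  # best individual seen in any evaluate call

    def _is_better(self, a, b):
        return a > b if self.maximize else a < b

    @property
    def current_best(self):
        # Computed on demand, looking only at individuals that have a fitness.
        evaluated = [i for i in self.individuals if i.fitness is not None]
        if not evaluated:
            return None
        pick = max if self.maximize else min
        return pick(evaluated, key=lambda i: i.fitness)

    def evaluate(self, lazy=False):
        for individual in self.individuals:
            if individual.fitness is None or not lazy:
                individual.fitness = self.eval_function(individual.chromosome)
        best = self.current_best
        if best is not None and (
            self.documented_best is None
            or self._is_better(best.fitness, self.documented_best.fitness)
        ):
            self.documented_best = deepcopy(best)  # copy survives later mutation
        return self

    def mutate(self, func):
        # Hypothetical helper: mutation invalidates the old fitness.
        for individual in self.individuals:
            individual.chromosome = func(individual.chromosome)
            individual.fitness = None
        return self


pop = Population([1, 4, 6], eval_function=lambda x: -(x - 4) ** 2)
pop.evaluate()
first_current = pop.current_best.chromosome

pop.mutate(lambda x: x + 10)
unevaluated_current = pop.current_best  # nothing is evaluated at this point

pop.evaluate()
second_current = pop.current_best.chromosome
documented = pop.documented_best.chromosome
```

In this sketch current_best always reflects the population as it is now (and is None right after a mutation), while documented_best only changes inside evaluate and never gets worse.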

Why can't you update the current_best as well?