MDCHAMP/FreeLunch

Better minimum perfomance testing

MDCHAMP opened this issue · 35 comments

Currently the minimum performance testing is on Exponential and Ackley functions from 1-4 dimensions with (pretty arbitrary) telerances on the objective functions.

Running all these tests is also slowing down the CI by quite a bit.

While I think that guarenteeing minimum performances for each optimiser (with default hyperparemters) is a good idea I think a slightly more principled approach is worth investing some time into.

Oviously there is no free lunch so its not possible to enusre optimal performance on different optimisation scenarios. Even so I think that given some fairly reasonable objective functions and tolerances it should be possible to enusre minimum performance.

Optimisation scenarios

  • Convex

Can use the Exponential function in 1, 2, 5 dimensions. Can use a 0.1% tolerance on the minima given default bounds.

  • Multi-modal

Can use the Ackley function in 1, 2, 5 dimensions. Can determine fitness tollerences based on the height of the 2nd deepest minima (ensure within the golobal minima rather than specific fitness target).

  • High-dimensional

Not clear on the best approach for this, exponential is already quite easy. Might leave this out for now.

You could integrate into humpday's Elo ratings for optimizers (or I could get to it).
https://github.com/microprediction/humpday

Terribly sorry for the ahem delay in getting back to you ... but that actually sounds great! what kind of API integration would be required to make this happen? I can't imagine that these implementations will fare well against some scipy or optuna ones (extreme computational efficiency is not really our aim) but might be cool to see how they do on a per function evaluation basis.

In the short/med term I think that it is still usefull to test minimum performance on commits and so on (mostly so that I don't accidentally break something). @TimothyRogers what are your thoughts?

Integrating with humpday ELO sounds like a very good solution for benchmarking. Although as you say the aim of freelunch was simplicity rather than computational performance so I also feel it will fare poorly.

@MDCHAMP this raises an important issue we have been discussing which is separating the unit tests from the performance benchmarking. I think a big overhaul of the tests is needed to check minimum functionality separately from the optimisation performance then integrate with humpday for ranking.

Hi. Humpday's benchmarking is limited to continuous domains. With that limitation mind, it is a matter of creating something akin to dlib_cube which expects an objective function, maximum number of trials, and dimension and returns the best objective value, best value, and the number of actual function evaluations. It assumes the domain is the hypercube.

    `  def dlib_cube(objective ,n_trials, n_dim, with_count):
       
             ...

            (best_val, best_x, feval_count) if with_count else (best_val, best_x)
    `

If you get me close I can do the rest. I cross-ref'd microprediction/humpday#11

Ok so I think that with our current API, we are already very close to the required form.

To be clear, with_count here is the number of independent runs? (and not a hard limit on f_evals?)

If that is the case then the required form will be something like (for the example of DE)

import freelunch

runs = freelunch.DE(objective, [[-1, 1]]*n_dim)(nruns=withcount, full_output=True) # instance and run

best_val = runs['scores'][0] # all obj scores are sorted low to high
best_x = runs['solutions'][0] # corresponding inputs
feval_count = runs['nfe'] # function evaluations

I guess that you guys will want the compatibility layer to be on your end and I think that something like the above should get you there. Please let us know if you need any more details!

Too easy! Thanks, I'll get it in.

Just double checking that with_count is the number of runs (and not a limited budget of f_evals)?

If it is the latter then some slight tuning of the hyperparameters will be required, for example;

DE = freelunch.DE(objective, [[-1, 1]]*n_dim)

DE.hypers['N'] = n_pop # population size
DE.hypers['G'] = n_gens # number of generations 

n_pop * (n_gens + 1) = feval_budget # recover the f_eval budget

It's just a flag set to True. The n_trials argument is supposed to set an approximate limit on the number of evaluations, though it doesn't have to be a hard limit.

Ah ok, in that case I suggest something like the above to distribute f_evals between the number of generations and the population size.

Yes, I figured. Thanks. The Elo system allows for only approximate compliance to n_trials, as it seems rarely to be the case that optimizers implement a hard constraint. Maybe I'll create a couple of versions with different pop/generation ratios.

That sounds like a good solution, will you do the same for the several other optimisers we have implemented here?

With absolutely no justification... my hunch is that smaller populations for more generations should be better and also will show the difference between the optimisers better. The limit being that for a budget of 150, if you set pop=150, gen=1 that's basically random search. Again with nothing to suggest why, I would generally go about 1:10 from population size to generations. @MDCHAMP any insight?

Sure happy to. All similar to DE I assume?

With absolutely no justification... my hunch is that smaller populations for more generations should be better and also will show the difference between the optimisers better. The limit being that for a budget of 150, if you set pop=150, gen=1 that's basically random search. Again with nothing to suggest why, I would generally go about 1:10 from population size to generations. @MDCHAMP any insight?

Well I mean famously there is no such thing as.... 😉

My go-to heuristic for this sort of thing is to increase the (relative size of the) population depending on how multimodal I believe the objective function to be, for exactly the same reasons as you go into. That said, in the domain of optimisation benchmarks there are both very multimodal and very convex problems so I think a wide range is probably a good idea.

@microprediction the API is the same for all optimisers yes.

@microprediction the latest merge should resolve all the issues with builds. I can notify once there is a new release? or do you guys build the packages from source?

It's going pretty well. Climbing higher and sa_21 seems to be best so far for my portfolio problems.

Looks great! is this on v0.0.11? or more recent? Some bugfixes over the last few days should fix a few things for some optimisers on some platforms. We are also hoping to add 2-3 more optimisers this week.

I've just had a quick look at the implementation in humpday to pull in the optimisers and I am wondering if the example code below might be a bit more neat. I think it retains the same functionality but I haven't extensively tested.

Am I right in thinking that humpday uses the __name__ attribute to track the different optimisers? If so I have dynamically built this from the list of optimisers pulled from freelunch. If we keep following the same pattern with respect to the hyperparameters (hypers['N'] and hypers['G] effectively controlling the number of fevals) this could allow new algorithms to be pulled in without needed new code in humpday.

import math
import inspect
import itertools

try:
    import freelunch
    using_freelunch = True
except ImportError:
    using_freelunch = False

if using_freelunch:

    class freelunch_optimiser():
        
        def __init__(self, method, n_pop=None):
            self.n_pop = n_pop
            self.__name__ = 'freelunch_'+method[0].lower()+'_'+str(n_pop)+'_cube'
            self.optimizer = method[1]

        def __call__(self, objective, n_trials, n_dim, with_count):
            global feval_count
            feval_count = 0

            def _objective(x) -> float:
                global feval_count
                feval_count += 1
                return objective(x)

            optimizer = self.optimizer(obj=_objective, bounds=[[0, 1]] * n_dim)

            if self.optimizer.__name__.lower() == 'sa':
                n_gens = int(math.ceil(n_trials / self.n_pop))
                optimizer.hypers['N'] = self.n_pop  # population size
                optimizer.hypers['K'] = n_gens  # number of generations
            else:
                n_gens = int(math.ceil(n_trials / self.n_pop))
                optimizer.hypers['N'] = self.n_pop   # population size
                optimizer.hypers['G'] = n_gens  # number of generations

            runs = optimizer(nruns=1, full_output=True)  # instance and run
            best_val = runs['scores'][0]  # all obj scores are sorted low to high
            best_x = runs['solutions'][0]  # corresponding inputs
            #feval_count_comparison = runs['nfe']  # function evaluations UNUSED

            return (best_val, best_x, feval_count) if with_count else (best_val, best_x)

    optimisers = [m for m in inspect.getmembers(freelunch, inspect.isclass)\
         if issubclass(m[1],freelunch.continuous_space_optimiser) and\
              m[0] != 'continuous_space_optimiser']
    
    pop_sizes = [3,8,21]
    FREELUNCH_OPTIMIZERS = [freelunch_optimiser(m,p) for m, p in itertools.product(optimisers, pop_sizes)]

else:
    FREELUNCH_OPTIMIZERS = []


if __name__ == '__main__':
    from humpday.objectives.classic import CLASSIC_OBJECTIVES

    for objective in CLASSIC_OBJECTIVES:
        print(' ')
        print(objective.__name__)
        for optimizer in FREELUNCH_OPTIMIZERS:
            print(optimizer.__name__)
            print(optimizer(objective, n_trials=50, n_dim=3, with_count=True))

Looks pretty good to me.

Side question. Do you have any algos implemented that respect constraints? In particular, I'm looking to optimize portfolio weights w subject to the constraint sum( |w_i| )=1

The easier case on the simplex is also interesting.

Or I suppose I could write one and include it in freelunch. Seems like DE slightly modified might do the trick.

This is actually really interesting, Tim and I have been in conversation recently about better ways to handle bounding strategies and I think exposing a class attribute from the base optimiser class might be a good solution.

Practically this would involve setting some method like DE.bounder = my_bounding_strategy
where my_bounding_strategy(population) -> population is a callable that could implement some (arbitrary) bounding strategy that could handle constraints as well.

I don't think that this would take too much modification at this point... thus far every algo just calls tech.sticky_bounds.

What do you think @TimothyRogers ?

Would an instance of

  my_bounding_strategy(population) -> population

comprise a map from R^n to the simplex? (or perhaps w -> [ wi/sum(np.abs(w)) for wi in w] )

I definitely think that introducing the boundary functions as a property of the base optimiser is a big improvement, it would make it very simple to change to (e.g.) elastic bounds or whatever you wanted. So this should be a relatively high priority feature see #28

Constraints is a tricky one and maybe a slightly different problem but it could be handled in a similar way or within the bounding as Max says. My worry is that the nature of these unconstrained algorithms will mean that they regularly violate and much of the optimisation effort is pushed onto how do you move from the unconstrained space to the constrained space. If we are not careful we may end up writing a very convoluted random search inside the constrained space!

Would an instance of

  my_bounding_strategy(population) -> population

comprise a map from R^n to the simplex? (or perhaps w -> [ wi/sum(np.abs(w)) for wi in w] )

There are a number of ways that this could be implemented but I think it would inevitably involve a little bit of manipulation of the freelunch internals.

At present, the bounding functions all look something like

def apply_bounds(sol.dna,  optimiser.bounds):
    for i, chromosone, bound in zip(range(D), sol.dna, optimiser.bounds):

        # check each population member for bounds violation
        if chromesone > bound: 
            
            # logic to return the solution to the bounded region
            sol.dna[i] = bounded_parameter
   
 return sol.dna

It would be simple to modify something like the above to include basic constraint handling. ie

if norm:=np.linalg.norm(sol.dna) < r:  sol.dna = sol.dna * (r/norm)-jitter

for the unit circle.

for ease of use it would be possible to write a wrapper for simple maps like you say but you might not want regions in R^n that are also within the simplex to be perturbed by such a map.

For the example of R^n and S^n there are homeomorphisms that preserve topology... (locally at least) and so the map could just be folded into the objective function. This definitely doesn't hold for all constrained problems though..

Oh, I would love a good map from the cube to the probability simplex. I don't suppose you have one lying around that I can stick in https://github.com/microprediction/embarrassingly ? I agree that is simpler if we can modify the objective function rather than the optimizers.

It seems I have slightly misread what it is that you would like to achieve. For L2 norm bounding something like my previous comment is suitable.

For an orthogonal projection from R_n down onto the probability simplex (topologically a (n-1)-ball) there is a quadratic programming problem for which I have only been able to find algorithmic solutions...

For example: https://eng.ucmerced.edu/people/wwang5/papers/SimplexProj.pdf contains a neat O(n log n) solution which would be very easy to put inside an objective function.

Hopefully that helps!

@TimothyRogers The above could come in handy for the F1 financial stage no?

Thanks. Though this map is from R^D to D-1 dimensional simplex. What I'm really looking for is a map from any bounded domain (supported by most optimizers) to the probability simplex of the same dimension, so as to be able to better trick the optimizer into doing a good job. Projection will leave it wandering in a pointless co-dimension 1, which is what I sometimes do.

ah ok, so this is an interesting problem for sure (you would not believe how many people in our dept. got distracted by this yesterday...). I think I have a nice solution for a latent R^(n-1)+ onto the unit simplex in n-1 dimensions by a nonlinear scaling the L1 norm of a latent R^(n-1) vector and then adding a constrained extra dimension to get to the probability simplex. Its not exactly a bounded search though.. perhaps there is some clever L1 scaling to go from the unit cube to the unit simplex? Or an orthogonal projection trick @TimothyRogers?

def _to_simplex(latent_vector):
    L1 = np.tanh(sum(a *latent_vector)) # nonlinear scaling factor
    r_n_simplex_solution = np.zeros((latent_vector.shape[0]+1,)) 
    r_n_simplex_solution[:-1] = latent_vector * (L1/sum(latent_vector)) 
    r_n_simplex_solution[-1] = 1  - sum(r_n_simplex_solution[:-1]) # extra dimension satisfies the prob. simplex
    return r_n_simplex_solution

where a is a hyperparameter of the map.

plotting this in 2D with a=1 clearly shows the missed regions..

image

Increasing to a=2 gives better coverage but with more bias at the simplex edge..

image

This really is one of those problems that is hard to put down...

ok... final one of these...

def cube_to_simplex_embedding(latent_point_in_cube, a=3, b=3):
    N = latent_point_in_cube.shape[0]
    L1 = np.norm(latent_point_in_cube, 1)
    L2 = np.norm(latent_point_in_cube, 2)
    scale = np.tanh(a*L2^b)/L1

    point_on_prob_simplex = np.empty((N+1, ))
    point_on_prob_simplex [:-1] = latent_point_in_cube * scale
    point_on_prob_simplex [-1]  = 1 - sum(point_on_prob_simplex [:-1])
    return  point_on_prob_simplex 

with a=3 and b=3;

image

The mapping is nonuniform but locally continuous, and so should be a good fit for the bounded -> constrained problem

This is seriously cool. I'm going to set up a few benchmark objective functions on the simplex.

So after another day of playing I have a really nice nonparametric, continuous (almost uniform) map..

def cube_to_simplex_embedding(latent_point_in_cube):
    n = latent_point_in_cube.shape[0]
    L1 = np.norm(latent_point_in_cube, 1)
    L2 = np.norm(latent_point_in_cube, 2)

    scale = (1/sqrt(2*n)) + (L2/(sqrt(2) * L1)) * sqrt(1 - (L1/(sqrt(n) * L2))**2)

    point_on_prob_simplex = np.empty((n+1, ))
    point_on_prob_simplex [:-1] = latent_point_in_cube * scale
    point_on_prob_simplex [-1]  = 1 - sum(point_on_prob_simplex [:-1])
    return  point_on_prob_simplex 

image

The geometric interpretation is (I think) really nice! First we scale the strictly positive part of the unit cube onto the strictly positive part of the unit ball using the cosine of the angle between our vector and the normal vector of the simplex. Then scale like a/b where A is the L2 norm (which is now uniform in scale) and b is the L1 norm (so we are within the simplex). A bit of algebra to get rid of the cosines and arcosines and we have it!

Plus no hyperparameters and I think that it works in any dimension!

Thinking that this can be closed now that the humpday implementation is chugging along.