dariogoetz/keyboard_layout_optimizer

Filtering out >99.99% of layouts before even testing them


First the basic idea, then the implementation. Of course, as with all other evaluation modes, this should be optional and configurable.

With a problem space of ~10^35 layouts, we can't rely solely on the fact that we use one of the fastest programming languages there is (and a great project structure, I've been told). Optimization algorithms are great and help a lot, but even they are bound by the time it takes to test layouts. Hence the idea:

Idea

Add a mechanism that checks a layout against minimum requirements. Do this using only unigrams, and only use criteria that are true or false (instead of producing a cost). This check should be very fast compared to the bigram & trigram metrics currently in use.

Only if the generated layout meets those basic requirements should it be accepted for regular evaluation.

Implementation

It might be tempting to use this method inside the regular evaluation, i.e. to stop evaluating once we see that costs are getting too high midway through. I think this is not optimal, mainly because it might mess up the algorithms: stopping evaluation early wouldn't absolve us of returning a total_cost(), which we couldn't compute precisely after having preemptively stopped. This would impact any algorithm that requires precise comparisons of layout costs (SA, possibly ABC and Genevo).

Instead, I propose a function meets_min_requirements that might be used like this: evaluator.meets_min_requirements(&layout). Calls to it would be placed inside layout_generator.generate_random(), common::perform_n_swaps() and common::switch_n_keys().

The goal is to make those functions check if their generated layout meets our minimum requirements, and if it doesn't, try another swap/switch/random layout.

Basic example:

```rust
/// Takes a layout, switches `nr_switches` keys in it, then returns the result.
/// A layout, in this case, is a `Vec<usize>` permutation.
pub fn perform_n_swaps(&self, permutation: &[usize], nr_switches: usize, evaluator: &Evaluator) -> Vec<usize> {
    // get the indices-vec

    loop {
        // 1. swap indices of the original indices-vec
        // 2. let meets_min_requirements = evaluator.meets_min_requirements(&swapped_indices);
        // 3. if meets_min_requirements { break; }
    }

    // return the swapped indices
    indices
}
```

Minimum requirements

Of course, those requirements should be chosen carefully and conservatively. The ones I propose as a starting point are the following ("top keys" refers to the most frequent letters; "top positions" are read from key_costs):

```yaml
key_costs:
  - [80,   70, 60, 50, 50, 50, 60,   60, 50, 50, 50, 60, 70, 80]
  - [24,      16, 10,  5, 12, 17,   20, 13,  5,  9, 11, 20, 36]
  - [ 9,         5,  3,  3,  3,  6,    6,  3,  3,  3,  5,  9, 30,  6]
  - [20,   16,    19, 24, 20,  9, 30,   10,  8, 22, 22, 17, 19]
  - [ 0,  0,  0,              3,               7,  0,  0,  0]
```

Key Costs

  1. top 2 keys are on the top 16 positions
  2. top 6 keys are on the top 20 positions
  3. bottom 6 keys are NOT on the top 6 positions

This would only leave approximately (16/32)^2 * (20/32)^4 * (16/32)^6 ≈ 0.0006 = 0.06% of layouts.
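As an illustration of how such a boolean check might look (a sketch with hypothetical names, not actual project code), the "top keys on top positions" criteria could be tested by ranking the flattened key_costs matrix:

```rust
/// Hypothetical check: are the `n` most frequent letters all placed on the
/// `m` cheapest positions? `letter_positions[i]` is the index (into the
/// flattened key_costs matrix) of the key holding the i-th most frequent letter.
fn top_keys_on_top_positions(letter_positions: &[usize], key_costs: &[f64], n: usize, m: usize) -> bool {
    // rank all positions by ascending cost
    let mut by_cost: Vec<usize> = (0..key_costs.len()).collect();
    by_cost.sort_by(|&a, &b| key_costs[a].partial_cmp(&key_costs[b]).unwrap());
    let top_positions = &by_cost[..m];
    // every one of the `n` most frequent letters must sit on a top position
    letter_positions[..n].iter().all(|pos| top_positions.contains(pos))
}
```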

Hand disbalance, Hand switching

  1. top 4 keys are not all on the same hand

This would probably (I'm not 100% sure about the math) leave approximately 2 * (0.5^4) = 0.125 = 12.5% of layouts.

Finger loads

  1. Each finger's load is less than twice and more than half the optimal load

This is a good one; I'm not sure how to calculate its advantage, though.
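For illustration, a minimal sketch of that check (hypothetical names; both slices are indexed by finger and normalized to sum to 1.0):

```rust
/// Hypothetical check: each finger's unigram load must stay within
/// [0.5, 2.0] times its optimal load.
fn finger_loads_ok(loads: &[f64], optimal_loads: &[f64]) -> bool {
    loads
        .iter()
        .zip(optimal_loads)
        .all(|(&load, &optimal)| load > 0.5 * optimal && load < 2.0 * optimal)
}
```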

Badly positioned shortcut keys

(This is a controversial one! Remember, it should be configurable!)

  1. All the desired shortcut keys should be placed on the [x] leftmost columns of the keyboard

This would only leave approximately 0.5^4 = 0.0625 ≈ 6% of layouts.
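A sketch of this check, assuming we know each symbol's column index (all names hypothetical):

```rust
use std::collections::HashMap;

/// Hypothetical check: every desired shortcut key must sit within the
/// `max_col` leftmost columns of the keyboard.
fn shortcuts_in_left_columns(column_of: &HashMap<char, usize>, shortcut_keys: &[char], max_col: usize) -> bool {
    shortcut_keys
        .iter()
        .all(|key| column_of.get(key).map_or(false, |&col| col < max_col))
}
```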


All of this would come down to 0.0006 * 0.125 * 0.?? * 0.0625 ≈ 0.0000047. I don't want to count what percentage of layouts that leaves, but it's helpful, to say the least.

EDIT: It's about 1 in 200_000.

Feedback?

There's one small logical caveat that we need to consider when implementing this idea, but that's a question for later. What are your thoughts on the proposal, @dariogoetz?

Caveats:

  1. If only about 1 in 200_000 (randomly generated) layouts fits the criteria, we need to test how long the layout_generator.generate_random() function takes to generate this many layouts.
  2. When starting with a really bad layout (QWERTZ?), it's possible to never arrive at a layout that meets the minimum requirements by modifying the input layout with single key-swaps. In those cases, the loop would never break. Thus, we need to amend the functions common::perform_n_swaps() and common::switch_n_keys() as follows (notice prev_layout_meets_requirements):
```rust
/// Takes a layout, switches `nr_switches` keys in it, then returns the result.
/// A layout, in this case, is a `Vec<usize>` permutation.
pub fn perform_n_swaps(&self, permutation: &[usize], nr_switches: usize, evaluator: &Evaluator, prev_layout_meets_requirements: bool) -> Vec<usize> {
    // get the indices-vec

    if prev_layout_meets_requirements {
        loop {
            // swap indices of the original indices-vec
            // let meets_min_requirements = evaluator.meets_min_requirements(&swapped_indices);
            // if meets_min_requirements { break; }
        }
    } else {
        // swap indices of the original indices-vec
    }

    // return the swapped indices
    indices
}
```

Thanks for the detailed suggestions. I like the idea of being able to quickly discard a layout that is "obviously" insufficient.

Keep in mind, though, that most of the computational time for a layout evaluation goes into splitting symbols of a higher layer into the corresponding modifiers. This has to be done for unigram, bigram, and trigram metrics (not layout metrics). In particular, it is required to compute hand balances or finger loads. The speedup would be relatively small (only one order of magnitude or so) in that case.

I see the perform_n_swaps function more as a utility function that should not contain too much logic. Also, it is not used by all optimization algorithms, so implementing the logic there would not benefit e.g. the genetic algorithm.

I suggest instead supporting an option to "exit early" from the evaluation. The process of generating and then evaluating a layout would remain identical. After evaluating the layout metrics (and before splitting modifiers), there would be an "early exit" condition that could avoid the subsequent expensive steps.

Such a condition could be "layout metric costs exceed a threshold" (the conditions you list above would then be implemented as layout metrics that return a cost higher than the early-exit threshold), or one could give each metric the option to "vote" for an early exit, with the process skipping the rest as soon as one metric votes positive.
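To make the "voting" variant concrete, here is a rough sketch (the trait, names, and penalty constant are made up for illustration and use the crate's Layout type; this is not the actual interface):

```rust
/// Hypothetical metric interface with an early-exit vote.
trait VotingMetric {
    fn cost(&self, layout: &Layout) -> f64;
    /// Returns true if this metric considers the layout hopeless.
    fn votes_for_early_exit(&self, layout: &Layout) -> bool {
        false
    }
}

const EARLY_EXIT_PENALTY: f64 = 1_000_000.0; // well above any "normal" cost

/// Hypothetical evaluation with a decision point after the cheap layout
/// metrics and before the expensive modifier splitting.
fn evaluate(layout_metrics: &[Box<dyn VotingMetric>], layout: &Layout) -> f64 {
    let mut total = 0.0;
    for metric in layout_metrics {
        total += metric.cost(layout);
        if metric.votes_for_early_exit(layout) {
            // skip modifier splitting and the ngram metrics entirely
            return total + EARLY_EXIT_PENALTY;
        }
    }
    // ... otherwise continue with modifier splitting and unigram/bigram/trigram metrics
    total
}
```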

One could also consider having such an early-exit decision point after each class of metrics (one would only save relatively little on the later metric classes, though).

> Keep in mind, though, that most of the computational time for a layout evaluation goes into splitting symbols of a higher layer into the corresponding modifiers. This has to be done for unigram, bigram, and trigram metrics (not layout metrics). In particular, it is required to compute hand balances or finger loads. The speedup would be relatively small (only one order of magnitude or so) in that case.

I forgot to mention it, but the idea was that only layer-1 symbols (= lowercase letters) would be used, hence no expansion. Without this restriction, this wouldn't make much sense.

> I suggest instead supporting an option to "exit early" from the evaluation. The process of generating and then evaluating a layout would remain identical. After evaluating the layout metrics (and before splitting modifiers), there would be an "early exit" condition that could avoid the subsequent expensive steps.

Honestly, I don't think this would be optimal either. It would interfere with algorithms that require the precise cost of the layouts that are actually tested in order to function properly (SA and ABC for sure, no idea about genetic).

  • SA needs the precise cost to accept layouts with a certain probability. It's not a greedy algorithm.
  • ABC, if I understand correctly, distributes resources based on how good certain layouts are.
  • (I think genetic allows layouts to have a certain number of "offspring" according to how good the layout is.)

Additionally, this would force us to include the higher-layer-splitting part, which again would make this functionality only slightly useful.

> I see the perform_n_swaps function more as a utility function that should not contain too much logic. Also, it is not used by all optimization algorithms, so implementing the logic there would not benefit e.g. the genetic algorithm.

My original proposal was based on the assumption that the genetic algorithm also uses either common::perform_n_swaps() or common::switch_n_keys().
How does the genetic algorithm modify the layouts?

> I forgot to mention it, but the idea was that only layer-1 symbols (= lowercase letters) would be used, hence no expansion. Without this restriction, this wouldn't make much sense.

I am not sure how that can work. In particular, if one condition involves finger loads, the expansion is necessary. A large part of the pinky finger's load comes from hitting the Shift key, which only comes into play with the modifier expansion. If the corpus contains, for instance, a bigram aB, would you discard it? Or only consider ab?

> Honestly, I don't think this would be optimal either. It would interfere with algorithms that require the precise cost of the layouts that are actually tested in order to function properly (SA and ABC for sure, no idea about genetic).
>
> • SA needs the precise cost to accept layouts with a certain probability. It's not a greedy algorithm.

Does it need the precise cost? Or just an indication of whether the layout's cost is larger than the previous one? In the latter case, a suitable cost estimate could be given by a corresponding layout metric.

> • ABC, if I understand correctly, distributes resources based on how good certain layouts are.

A layout whose evaluation is exited early will be a bad layout (high cost) irrespective of whether it was exited early or not.

> • (I think genetic allows layouts to have a certain number of "offspring" according to how good the layout is.)

Same here: if an early exit is performed, the cost will be high.

> My original proposal was based on the assumption that the genetic algorithm also uses either common::perform_n_swaps() or common::switch_n_keys(). How does the genetic algorithm modify the layouts?

In the genetic algorithm case, the mutation is performed by the SwapOrderMutator (https://docs.rs/genevo/latest/genevo/mutation/order/struct.SwapOrderMutator.html).

In general, I would like to avoid enlarging the Evaluator object (e.g. by adding a new class of "minimum-requirement metrics") and its interface too much. I would rather try to use the existing interface to accomplish the desired speedup. If the existing one proves insufficient, I am willing to add complexity, but only if the gain is substantial enough.

That being said, I could see the option of adding an evaluate_layout_metrics method to the Evaluator that only evaluates the layout metrics. If the layout-metrics value exceeds a certain threshold, one would say that the layout does not meet the minimum requirements.
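Usage would then be roughly the following (a sketch; the method and the threshold are hypothetical at this point):

```rust
// Hypothetical feasibility gate built on the proposed method:
let layout_metric_cost = evaluator.evaluate_layout_metrics(&layout);
let meets_min_requirements = layout_metric_cost <= threshold;
```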

> I am not sure how that can work. In particular, if one condition involves finger loads, the expansion is necessary. A large part of the pinky finger's load comes from hitting the Shift key, which only comes into play with the modifier expansion. If the corpus contains, for instance, a bigram aB, would you discard it? Or only consider ab?

  1. I don't see any reason to use bigrams at all in those metrics. Unigrams contain all the information we need, as far as I am aware.
  2. I think you might be right, though: if the shifts are important for calculating finger costs, we should split them. That being said, we could easily be crafty and take some shortcuts to speed up this part. For example, we could:
    1. Get (and save) the total amount of non-letter pinky usage (Shift, M3, M4, …)
    2. Treat all uppercase letters as lowercase ones
    3. Divide the non-letter pinky usage evenly across the two pinkies.

(I also think that, as an additional speed-up, we could ignore symbols entirely. They make up too small a fraction to make a difference in this minimum-requirements check.)
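A hedged sketch of the shortcut described in point 2 above (illustrative only; `unigrams` stands in for whatever (symbol, weight) list the corpus provides):

```rust
use std::collections::HashMap;

/// Fold uppercase letters into their lowercase counterparts ONCE, recording
/// the frequency mass that would have gone to Shift (to be divided evenly
/// across the two pinkies later).
fn fold_uppercase(unigrams: &[(char, f64)]) -> (HashMap<char, f64>, f64) {
    let mut folded: HashMap<char, f64> = HashMap::new();
    let mut shift_frequency = 0.0;
    for &(symbol, weight) in unigrams {
        if symbol.is_uppercase() {
            shift_frequency += weight;
        }
        for lower in symbol.to_lowercase() {
            *folded.entry(lower).or_insert(0.0) += weight;
        }
    }
    (folded, shift_frequency)
}
```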

> Does it need the precise cost? Or just an indication of whether the layout's cost is larger than the previous one? In the latter case, a suitable cost estimate could be given by a corresponding layout metric.

Yes, it needs to be the precise cost. This is an important part of Simulated Annealing, in general as well as in this specific implementation. From the argmin codebase:

```rust
// Solutions worse than the previous one are accepted with a probability given as:
//
// `1 / (1 + exp((next_cost - prev_cost) / current_temperature))`,
//
// which will always be between 0 and 0.5.
let prob: f64 = self.rng.gen();
let prob = F::from_f64(prob).unwrap();
let accepted = (new_cost < prev_cost)
    || (F::from_f64(1.0).unwrap()
        / (F::from_f64(1.0).unwrap() + ((new_cost - prev_cost) / self.cur_temp).exp())
        > prob);
```

> A layout whose evaluation is exited early will be a bad layout (high cost) irrespective of whether it was exited early or not.

The problem is that most algorithms require the precise cost. Otherwise, all we can do is implement a mix of hill climbing & random walk.

Do the genetic algorithm and the ABC algorithm both only use boolean comparisons like let accept = new_cost < old_cost? I'd be really surprised if that were the case.
(I tried to find the part where costs are compared in ABC, but I wasn't able to find anything.)

> 1. I don't see any reason to use bigrams at all in those metrics. Unigrams contain all the information we need, as far as I am aware.

The argument is the same for unigrams, e.g. B. Discard it? Or only use b?

> 2. I think you might be right, though: if the shifts are important for calculating finger costs, we should split them. That being said, we could easily be crafty and take some shortcuts to speed up this part. For example, we could:
>
>    1. Get (and save) the total amount of non-letter pinky usage (Shift, M3, M4, …)
>    2. Treat all uppercase letters as lowercase ones
>    3. Divide the non-letter pinky usage evenly across the two pinkies.

These sound like hacks to me. I would prefer a "clean" solution that does not unnecessarily increase code complexity.

> (I also think that, as an additional speed-up, we could ignore symbols entirely. They make up too small a fraction to make a difference in this minimum-requirements check.)

Do uppercase letters count as symbols?

> Does it need the precise cost? Or just an indication of whether the layout's cost is larger than the previous one? In the latter case, a suitable cost estimate could be given by a corresponding layout metric.

> Yes, it needs to be the precise cost. This is an important part of Simulated Annealing, in general as well as in this specific implementation. From the argmin codebase:
>
> ```rust
> // Solutions worse than the previous one are accepted with a probability given as:
> //
> // `1 / (1 + exp((next_cost - prev_cost) / current_temperature))`,
> //
> // which will always be between 0 and 0.5.
> let prob: f64 = self.rng.gen();
> let prob = F::from_f64(prob).unwrap();
> let accepted = (new_cost < prev_cost)
>     || (F::from_f64(1.0).unwrap()
>         / (F::from_f64(1.0).unwrap() + ((new_cost - prev_cost) / self.cur_temp).exp())
>         > prob);
> ```

> A layout whose evaluation is exited early will be a bad layout (high cost) irrespective of whether it was exited early or not.

> The problem is that most algorithms require the precise cost. Otherwise, all we can do is implement a mix of hill climbing & random walk.

I am not sure whether the "precise" cost is really necessary. The cases we are discussing are those where the optimization is still in its beginning stages and layouts are still mostly "random". When a layout leads to an early exit, this means that it is "not feasible" in the context of your "minimum requirements". In that case, it should be sufficient to know whether a new solution got better or not. If the costs for early-exit layouts are sufficiently larger than for "normal" layouts, they will have a small probability of being chosen. If both the old and the new layout are "early exit", it does not matter much which one is chosen (both are so bad as to be "infeasible").

> Do the genetic algorithm and the ABC algorithm both only use boolean comparisons like let accept = new_cost < old_cost? I'd be really surprised if that were the case. (I tried to find the part where costs are compared in ABC, but I wasn't able to find anything.)

I don't know, to be honest.

If one implements your ideas as a layout metric and exits early, one can also assign a cost to the "strength of infeasibility", thus allowing a hill climb. If one uses a loop in perform_n_swaps as you suggested, it is possible to never escape it if the infeasibility condition is strict.
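For instance (a sketch, not actual code), such a metric could scale its cost with the number of violated requirements, so the optimizer can still climb out of infeasible regions:

```rust
/// Hypothetical: grade the infeasibility instead of returning a plain bool.
fn infeasibility_cost(violated_requirements: usize) -> f64 {
    // base penalty chosen well above any realistic "normal" layout cost
    const PENALTY_PER_VIOLATION: f64 = 1_000_000.0;
    PENALTY_PER_VIOLATION * violated_requirements as f64
}
```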

I just realized that splitting unigram modifiers isn't actually that expensive; it's the trigrams that are expensive. The splitting of uni-, bi-, and trigrams happens separately.

So having unigram metrics should be fine.

This was not correct.

I'm not sure how to reply to your first points, as I'm not sure how to interpret your last message.
In general, though, I feel like I have already replied to many of these concerns. There might be some confusion going on. For example:

> The argument is the same for unigrams, e.g. B. Discard it? Or only use b?

This was answered in point 2, with the plan of collecting the shift frequency and dividing it across the pinkies.

> If one uses a loop in perform_n_swaps as you suggested, it is possible to never escape it if the infeasibility condition is strict.

You are right, this is an issue. However, I mentioned this problem (and a feasible solution) in my second comment.

> These sound like hacks to me. I would prefer a "clean" solution that does not unnecessarily increase code complexity.

In my opinion, this doesn't sound dirty but rather like an efficient mechanism. That being said, you have better knowledge of the optimizer and its inner workings.
To me personally (if we went with the initial proposal of a separate min-requirements struct), this seems doable:

```rust
struct MinimumRequirements {
    /// All uppercase letters are turned into lowercase ones ONCE. Their frequencies get added.
    letters_with_frequencies: FxHashMap<char, f64>, // or another map type
    /// The frequency mass captured while un-capitalizing uppercase letters.
    shift_frequency: f64,
}

impl MinimumRequirements {
    pub fn new() -> Self { todo!() }
    pub fn meets_min_requirements(&self, layout: &Layout) -> bool { todo!() }
}
```

> I am not sure whether the "precise" cost is really necessary. The cases we are discussing are those where the optimization is still in its beginning stages and layouts are still mostly "random". When a layout leads to an early exit, this means that it is "not feasible" in the context of your "minimum requirements". In that case, it should be sufficient to know whether a new solution got better or not. If the costs for early-exit layouts are sufficiently larger than for "normal" layouts, they will have a small probability of being chosen. If both the old and the new layout are "early exit", it does not matter much which one is chosen (both are so bad as to be "infeasible").

You may be right. Let me summarize the pros and cons of this "stop evaluation early" idea as I understand it, so that we may come to a conclusion:
Pros:

  • Easier to implement
  • No need to think about & wrestle with new rules for splitting unigrams
  • Applicable to all algorithms, including the genetic one
  • Allows implementing cut-off measures not just in unigram metrics but also for bigrams & trigrams

Cons:

  • May mess with algorithms that require precise cost comparisons
  • Requires more maintenance & testing (no clear boolean rules; instead, certain costs need to be used as cut-off criteria)
    -> Doesn't allow for some of the very specific rules mentioned in the first post.

(Of course, the pros & cons of the initial idea are the inverse.)
Would you agree that this is a fair summary? Anything to add?

If you want separate feasibility checks (rather than an early exit as part of the evaluation), I can see the following way forward:

  1. Add a UnigramMapper trait to ngram_mapper.rs in the layout_evaluation crate that contains a function

     ```rust
     fn mapped_unigrams<'s>(&self, layout: &'s Layout) -> (Vec<(&'s LayerKey, f64)>, f64, f64);
     ```

     where the latter two f64s are the found and not-found weights. This allows mapping only the unigrams, which is much faster than mapping bi- or trigrams and a lot "cleaner" than distributing shifts across both hands. Implement the trait for the OnDemandNgramMapper that we use.

  2. Introduce a separate module feasibility.rs in the layout_optimization_common crate that contains

    • a trait FeasibilityCondition containing a function
      fn check_feasibility(&self, layout: &Layout, unigrams: &[(&LayerKey, f64)]) -> bool
    • a struct FeasibilityChecker that holds a Box<dyn UnigramMapper> and a Vec<Box<dyn FeasibilityCondition>> and has a method
      fn check_feasibility(&self, layout: &Layout) -> bool

    that generates the mapped unigrams with its unigram_mapper and calls the check_feasibility functions of its feasibility conditions. That method can then be called by the optimizer when trying to generate a new layout.

  3. Introduce a directory feasibility_conditions parallel to feasibility.rs that contains implementations of the various feasibility checks you have in mind, each in a separate module.

  4. Add a default_conditions method to the FeasibilityChecker that adds all the feasibility conditions to its vec.

This structure mirrors the one used for the layout Evaluator in evaluation.rs. You could find inspiration there.
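Roughly, the pieces would fit together like this (a sketch using the names from the steps above; the bodies are illustrative, not an actual implementation):

```rust
pub trait FeasibilityCondition {
    fn check_feasibility(&self, layout: &Layout, unigrams: &[(&LayerKey, f64)]) -> bool;
}

pub struct FeasibilityChecker {
    unigram_mapper: Box<dyn UnigramMapper>,
    conditions: Vec<Box<dyn FeasibilityCondition>>,
}

impl FeasibilityChecker {
    pub fn check_feasibility(&self, layout: &Layout) -> bool {
        // map only the unigrams (cheap), then ask every condition
        let (unigrams, _found_weight, _not_found_weight) =
            self.unigram_mapper.mapped_unigrams(layout);
        self.conditions
            .iter()
            .all(|condition| condition.check_feasibility(layout, &unigrams))
    }
}
```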

Regarding the question of where to call the check_feasibility function, I would prefer keeping it outside of perform_n_swaps and instead looping the call to perform_n_swaps. Something like the following inside optimization.rs in the SA crate:

```rust
/// Modify param (~ the layout).
fn modify(&self, param: &Self::Param, _temp: f64) -> Result<Self::Param, Error> {
    let mut layout = param.to_vec();
    loop {
        // if we are unlucky (and the feasibility check is very strict), we can remain within this loop forever
        layout = self.layout_generator.perform_n_swaps(&layout, self.key_switches);
        if self.feasibility_checker.check_feasibility(&layout) {
            break;
        }
    }
    Ok(layout)
}
```
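To defuse the infinite-loop caveat in the comment above, the loop could also be capped (MAX_ATTEMPTS is a hypothetical constant; on exhaustion the last candidate is returned even if infeasible):

```rust
/// Variant with a bounded number of attempts:
fn modify(&self, param: &Self::Param, _temp: f64) -> Result<Self::Param, Error> {
    const MAX_ATTEMPTS: usize = 10_000;
    let mut layout = param.to_vec();
    for _ in 0..MAX_ATTEMPTS {
        layout = self.layout_generator.perform_n_swaps(&layout, self.key_switches);
        if self.feasibility_checker.check_feasibility(&layout) {
            break; // feasible candidate found
        }
    }
    Ok(layout)
}
```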
> 1. top 4 keys are not all on the same hand

Mmm, I might disagree on that. For example, my Poqtea layout has TEAOI all on the right hand, yet the hand balance is 52% left:
https://www.keyboard-design.com/letterlayout.html?layout=poqtea-qp.en.ansi

Unless you are counting Space as one of the top 4.

What you are proposing is basically what I did last year: I generated millions of layouts that met certain criteria similar to your proposals. Now I need to evaluate them. I did try my own "quick eval" routine, but the results were not reliably good. Poqtea was one of the better ones, but not the best according to the Quick Eval code. Since then I've tried Opt and CarpalX and https://github.com/ChenDanCMU/Keyboard-Layout-Search, but none work reliably.