/dicio-evaluation

A testing workbench to choose an algorithm to match sentences for the Dicio assistant

Primary LanguageKotlinGNU General Public License v3.0GPL-3.0

Dicio evaluation

This repository contains code and data to evaluate various algorithms that compare a user sentence to a set of reference sentences, giving a score to each comparison in order to find the best match. The purpose of this repository is to choose one such algorithm and (incrementally) tune the parameters for use in the Dicio assistant.

Definitions

User sentence

A user sentence is a sentence that a user may provide as input to an assistant, e.g. "What's the weather like?".

Reference sentence

A reference sentence is a sentence whose "meaning" is known, and can therefore be used to compare against the user sentence. Actually, a reference sentence is not really just a sentence, but is actually a Component that takes care of matching.

Component

A component is the basic block of reference sentences. Its purpose is to hold some data (e.g. a reference word), and compare this data to a substring of the user sentence, providing one or more MatchResults that can be used for scoring. Components are also supposed to extract data the user sentence (e.g. parse "one minute thirty seconds" into a duration), but that's not implemented in this repo.

The components available in this repo are only the most basic ones, namely:

  • a WordComponent which just matches a word
  • a CapturingComponent which can match any number of words or characters
  • a CompositeComponent which takes care of matching multiple other components one after the other

MatchResult

A MatchResult is made of 4 parameters:

  • userMatched: approximately counts the number of words in the user sentence that were matched correctly. Should roughly be increased by the same amount userWeight is increased by, every time a word from the user sentence is consumed and matches.
  • userWeight: approximately counts the total number of words in the user sentence. Should roughly be increased by 1 every time a word from the user sentence is consumed.
  • refMatched: approximately counts the number of words in the reference sentence that were matched correctly. Should roughly be increased by the same amount refWeight is increased by, every time a word from the reference sentence is consumed and matches.
  • refWeight: approximately counts the total number of words in the reference sentence. Should roughly be increased by 1 every time a word from the reference sentence is consumed.

The terms "approximately" and "rougly" were used a lot here, because:

  • characters can be matched too, and not only words, although a smaller user weight is assigned to them
  • some important words may be assigned a user weight > 1, while less important words (e.g. "a", "the") may be assigned a user weight < 1
  • some Components are not really words, e.g. the CapturingComponent, but still have a reference weight

The algorithm

The main part of the algorithm is implemented in CompositeComponent, because it's that component's job to find the best way to match the components it manages, to the user sentence. The basic algorithm is quite dumb, as it just tries to match each sub-component to every possible substring of the user sentence. All of the possible ways to match each sub-component are combined in a list of MatchResults by summing related fields in sub-MatchResults, and userWeight is further increased in case of skipped characters. A layer of memoization and caching is applied to the algorithm to make it more efficient; moreover, MatchResults have the end and canGrow fields that help avoid recalculations. In any case, the complexity of the algorithm is obviously quite bad, since the algorithm generates (u + 1) * binom(u + r, r) possible matches, where u is the number of characters in the user sentence, while r is the number of sub-components (assuming each sub-component only generates one MatchResult).

Strategies

The algorithm's purpose is just to try explore all possible ways to arrange components and generate a list of MatchResults. However, there still isn't anything that calculates a score! This is where Strategyies come into play. A strategy is made of a scoring function and a pruning function.

Scoring function

A scoring function takes a MatchResult as input and produces a scalar Float score as output. Higher scores mean "better match", lower scores mean "worse match". Scores will be used when comparing MatchResults, and in particular when choosing the best MatchResult returned by the algorithm.

Pruning function

A pruning function is used to make the algorithm more efficient, at the cost of possibly preventing the optimal solution from being discovered. Optimal solution means "MatchResult with the highest score". The pruning function is greedy, since it tries to discard intermediate solutions which it believes are not going to contribute to the final optimal solution.

When pruning is safe (linear scoring functions)

If the scoring function $f: matchresult \to score$ respects the following property, the pruning function will be able to make the optimal choice at each step, making the algorithm fast without sacrificing the optimal solution.

$$f(m_1) > f(m_2) \implies f(m_3 + m_1) > f(m_3 + m_2) \space\space \forall m_1, m_2, m_3$$

(the sum between two MatchResults is defined as the pairwise sum between corresponding fields)

This property is always true for linear functions, and more generally in functions of the form $g(f(m))$ where $f$ is linear with respect to MatchResult fields, and $g: R \to R$ is monotone. I don't know if there are other functions that respect this property, but all the non-linear scoring functions I came up with while testing don't respect it.