This library allows users to find sets of equal elements through an iterative, manual process.
To illustrate its usage, we use a running example: working through a set of questions we have received from different people and figuring out which of them are the same, so that we don't answer any question twice.
Example: the questions {"When were you born?", "What's your birthdate?", "What inspired you to become a professional author?", "Could you share your inspiration for becoming a professional author?"} will undergo an iterative process, arriving at the sets of equal questions {"When were you born?", "What's your birthdate?"} and {"What inspired you to become a professional author?", "Could you share your inspiration for becoming a professional author?"}, allowing us to halve the number of questions we need to answer.
`EqualityFilter` assumes that the relation the user is looking for is indeed an equivalence relation, i.e. that it is reflexive, symmetric and transitive. Note that transitivity is not always clear-cut: subtleties in certain domains (the English language in our example) might mean that while `x` is "pretty much the same" as `y`, and `y` is "pretty much the same" as `z`, `x` is different enough from `z` that the two are no longer "pretty much the same".
To start, create an `EqualityFilter` object and pass it the array of elements you wish to analyze:
```java
// Java has no [...] array literal; build the String array explicitly.
var filter = new EqualityFilter<String>(new String[] {
        "When were you born?",
        "What's your birthdate?",
        "What inspired you to become a professional author?",
        "Could you share your inspiration for becoming a professional author?"});
```
Then iteratively refine the filter's information by specifying, for a given element, which elements are equal to it and which aren't. The Java Swing example provided in this repository does this by repeatedly popping a question off our list of questions, then asking the user which of the other questions (filtered using some simplifications) are equal to the current question and which aren't:
```java
// questions: a Deque<String> (e.g. an ArrayDeque) holding the questions still to process
while (!questions.isEmpty()) {
    String currQ = questions.pop(); // removes the current question from questions
    Set<String> possiblyEqualQuestions = filter.getPossiblyEqual(currQ);
    if (possiblyEqualQuestions.isEmpty()) continue; // optimization: nothing left to decide for currQ
    // The manual part: user-supplied helpers that ask a human to split the candidates
    Set<String> equalQuestions = manuallyFindOutEqualQuestions(currQ, possiblyEqualQuestions);
    Set<String> notEqualQuestions = manuallyFindOutNotEqualQuestions(currQ, possiblyEqualQuestions);
    // Feed the answers back so the filter can infer further relations
    filter.setEqual(currQ, equalQuestions);
    filter.setNotEqual(currQ, notEqualQuestions);
}
return filter.getGroups();
```
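To make the bookkeeping behind this loop concrete, here is a minimal self-contained sketch. It is not `EqualityFilter`'s implementation: `MiniEqualityFilter` is a hypothetical class that only mirrors the method names used above. It assumes equal elements can be merged with a union-find structure (which is where transitivity comes from) and records "not equal" facts between groups. Its `main` method replaces the human with a hypothetical oracle (`topic`) and counts how many manual partitions the loop asks for.

```java
import java.util.*;

// Hypothetical, minimal stand-in for the bookkeeping an equality filter needs.
// NOT the EqualityFilter library's implementation; it only mirrors its method names.
public class MiniEqualityFilter<T> {
    // Union-find forest: each element points towards its group's representative.
    private final Map<T, T> parent = new LinkedHashMap<>();
    // Recorded "not equal" facts, stored as element pairs and resolved to groups on demand.
    private final List<List<T>> notEqualPairs = new ArrayList<>();

    public MiniEqualityFilter(List<T> elements) {
        for (T e : elements) parent.put(e, e); // every element starts in its own group
    }

    // Returns the representative of e's group.
    private T find(T e) {
        T root = e;
        while (!parent.get(root).equals(root)) root = parent.get(root);
        return root;
    }

    // True if a recorded "not equal" fact links the group of a with the group of b.
    private boolean knownNotEqual(T a, T b) {
        T ra = find(a), rb = find(b);
        for (List<T> pair : notEqualPairs) {
            T r0 = find(pair.get(0)), r1 = find(pair.get(1));
            if ((r0.equals(ra) && r1.equals(rb)) || (r0.equals(rb) && r1.equals(ra))) return true;
        }
        return false;
    }

    // Merges groups, so equality propagates transitively.
    public void setEqual(T x, Set<T> equal) {
        for (T e : equal) parent.put(find(e), find(x));
    }

    public void setNotEqual(T x, Set<T> notEqual) {
        for (T e : notEqual) notEqualPairs.add(List.of(x, e));
    }

    // Elements whose relation to x is not yet implied by the recorded facts.
    public Set<T> getPossiblyEqual(T x) {
        Set<T> result = new LinkedHashSet<>();
        for (T e : parent.keySet()) {
            if (!find(e).equals(find(x)) && !knownNotEqual(x, e)) result.add(e);
        }
        return result;
    }

    // One set per group, in insertion order.
    public Collection<Set<T>> getGroups() {
        Map<T, Set<T>> groups = new LinkedHashMap<>();
        for (T e : parent.keySet()) groups.computeIfAbsent(find(e), k -> new LinkedHashSet<>()).add(e);
        return groups.values();
    }

    public static void main(String[] args) {
        // Hypothetical oracle standing in for the human: same topic means equal.
        Map<String, String> topic = new LinkedHashMap<>();
        topic.put("When were you born?", "birth");
        topic.put("What's your birthdate?", "birth");
        topic.put("What inspired you to become a professional author?", "inspiration");
        topic.put("Could you share your inspiration for becoming a professional author?", "inspiration");

        List<String> all = new ArrayList<>(topic.keySet());
        MiniEqualityFilter<String> filter = new MiniEqualityFilter<>(all);
        Deque<String> questions = new ArrayDeque<>(all);
        int partitions = 0;
        while (!questions.isEmpty()) {
            String currQ = questions.pop();
            Set<String> possiblyEqual = filter.getPossiblyEqual(currQ);
            if (possiblyEqual.isEmpty()) continue;
            partitions++; // one manual partition
            Set<String> equal = new LinkedHashSet<>();
            Set<String> notEqual = new LinkedHashSet<>();
            for (String q : possiblyEqual) {
                if (topic.get(q).equals(topic.get(currQ))) equal.add(q); else notEqual.add(q);
            }
            filter.setEqual(currQ, equal);
            filter.setNotEqual(currQ, notEqual);
        }
        System.out.println(partitions + " manual partitions: " + filter.getGroups());
    }
}
```

For the four example questions, this sketch asks for two manual partitions and ends with the two sets from the introduction. The linear scan over the "not equal" facts keeps the code short rather than fast.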
`getPossiblyEqual` returns all questions whose relation to `currQ` we don't know yet. For example, if we already know that `currQ` = "What's your birthdate?" is equal to "When were you born?", and that "When were you born?" is not equal to "What inspired you to become a professional author?", then we know that `currQ` can't possibly be equal to "What inspired you to become a professional author?", so we don't need to determine that manually. In this case, `getPossiblyEqual(currQ)` would only return "Could you share your inspiration for becoming a professional author?", provided we don't know yet that the two inspiration questions are equal. If we knew they were equal, the process would already be finished.
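The inference in this example needs only symmetry and transitivity. Writing ~ for the user's relation, with a = "What's your birthdate?", b = "When were you born?" and c = "What inspired you to become a professional author?":

```math
a \sim b \ \text{and}\ b \not\sim c \;\Longrightarrow\; a \not\sim c,
\qquad\text{because}\qquad
a \sim c \;\Longrightarrow\; b \sim a \ (\text{symmetry}) \;\Longrightarrow\; b \sim c \ (\text{transitivity}),
```

which contradicts b ≁ c.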
Next, we pass the manually determined equal and not-equal questions to the filter via `setEqual` and `setNotEqual`.
Finally, once we've processed all questions, we're done and can return the sets of equal questions using `filter.getGroups()`.
Using the Java Swing application to find the sets in this example, we need only two manual partitions, which is the minimum: each of the two resulting sets contains more than one question, and the filter only learns that two questions are equal from a partition centered on a question in that set, so each set requires at least one partition of its own. While one could also find the equal sets manually in one go, it is easier to decide, for just one question at a time, which questions are equal to it and which aren't.