Generating search result phrases from match locations
markwriter opened this issue · 3 comments
I've got the following code working and I'm able to display "matching phrases" in a popup that appears over any given search result. It is a distinct list of the terms and phrases found. Searching for "toy store" in a topic that includes the full search term plus separate instances of both words would result in the list "toy store, toy, store". I'm using stemming, so it can also look like "toy store, toys, toy, store, storage". This is fine with me. It lets the user know at a glance which terms will be in a topic before clicking and navigating to that topic.
I just wanted to double-check that I'm using LIFTI properly. If you could skim this at your convenience, it should hopefully take only a minute or two to confirm that I'm using the library correctly. I don't intend to ask for a full code review, just a right track/wrong track check (though any comments are welcome).
First is a summary of my logic, followed by the code for reference.
- Get searchResults from Lifti
- For each search result, look up its business object by Id.
- For each indexed field on the business object ("TopicName" and "Content"), find the locations of each match in the field text.
- If the token indexes of any match locations are sequential, infer that this is a phrase and add the phrase to a list by taking a substring from the start of the first item through the end of the last sequential item.
- If a match is not part of a sequence, just add it to the list on its own.
- The distinct set of items will be the matching terms/phrases eventually displayed to the user.
    searchResults = _index.Search(searchTerm);
    foreach (var searchResult in searchResults)
    {
        var topic = allVms.FirstOrDefault(t => t.TopicId == searchResult.Key);
        if (topic != null)
        {
            var matchPhrases = new List<string>();
            foreach (var match in searchResult.FieldMatches)
            {
                if (match.FoundIn == "TopicName")
                    matchPhrases.AddRange(LiftiUtils.MakePhrases(topic.TopicContent.TopicName, match.Locations));
                if (match.FoundIn == "Content")
                    matchPhrases.AddRange(LiftiUtils.MakePhrases(topic.TopicContent.Content, match.Locations));
            }
            topic.Phrases = matchPhrases.Distinct().ToArray();
            topic.Score = searchResult.Score;
        }
    }
    public static List<string> MakePhrases(string text, IReadOnlyList<TokenLocation> matchLocations)
    {
        var phrases = new List<string>();
        if (matchLocations.Count == 0) return phrases;

        // Number of consecutive tokens in the run that ends at index i - 1.
        var runLength = 1;
        text = text.ToLower();
        for (var i = 1; i <= matchLocations.Count; i++)
        {
            // A run ends at the last location, or when token indexes stop being consecutive.
            if (i == matchLocations.Count || matchLocations[i].TokenIndex - matchLocations[i - 1].TokenIndex != 1)
            {
                var first = matchLocations[i - runLength];
                var last = matchLocations[i - 1];
                // Character span from the start of the run's first token to the end of its last token.
                var length = last.Start + last.Length - first.Start;
                phrases.Add(text.Substring(first.Start, length));
                runLength = 1;
            }
            else
            {
                runLength++;
            }
        }

        // Distinct phrases (case-insensitive), most frequent first.
        return phrases.GroupBy(x => x, StringComparer.InvariantCultureIgnoreCase)
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key)
            .ToList();
    }
Thanks @markwriter. I've taken the liberty of changing the title of this issue because I think it captures the essence of what's being discussed here.
At a cursory glance, it all looks fine - I don't think there's a particularly right or wrong way to do this sort of thing - as things stand LIFTI gives you the search results and it's then up to you to make of it what you will. If it's doing what you want, you can safely assume it's the right thing for you!
That said, the MakePhrases method does look like it would make quite a nice extension method on IReadOnlyList<TokenLocation> - I can see something like it being rolled back into the core library so it's available out-of-the-box.
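For illustration, the extension-method shape suggested above might look roughly like the sketch below. It is not part of LIFTI: the name ToPhrases is hypothetical, and the TokenLocation record is a stand-in declared only so the snippet compiles on its own (with LIFTI referenced, the library's own type would be used instead). It assumes the locations are ordered by token index and that their offsets refer to the text passed in.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for LIFTI's TokenLocation so this sketch is self-contained.
// Assumption: only TokenIndex, Start and Length are needed here.
public record TokenLocation(int TokenIndex, int Start, int Length);

public static class TokenLocationExtensions
{
    // Hypothetical helper (not a LIFTI API): merges runs of consecutive
    // token indexes into one phrase cut from the original text.
    // Assumes locations are sorted by TokenIndex.
    public static IEnumerable<string> ToPhrases(
        this IReadOnlyList<TokenLocation> locations, string text)
    {
        if (locations.Count == 0) yield break;

        var runStart = locations[0];  // first token of the current run
        var previous = locations[0];  // most recent token in the current run

        for (var i = 1; i < locations.Count; i++)
        {
            var current = locations[i];
            if (current.TokenIndex - previous.TokenIndex != 1)
            {
                // Gap in token indexes: emit the finished run and start a new one.
                yield return Slice(text, runStart, previous);
                runStart = current;
            }
            previous = current;
        }

        // Emit the final run.
        yield return Slice(text, runStart, previous);
    }

    // Substring from the start of the first token to the end of the last token.
    private static string Slice(string text, TokenLocation first, TokenLocation last) =>
        text.Substring(first.Start, last.Start + last.Length - first.Start);
}
```

With something like this in place, the calling code could read match.Locations.ToPhrases(topic.TopicContent.Content), leaving the de-duplication and ordering to the caller. Unlike MakePhrases, this sketch does not lower-case the text, so phrases keep their original casing.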
Thank you - this is what I was looking for.
Oops, closing this now.