rluiten/elm-text-search

Leaving field empty when using listFields returns results with score equal to NaN

Closed this issue · 5 comments

I am indexing documents with this type:

type alias RecipeName = String

type alias RecipeBody = String

type alias FeuilleLiaison =
    { date : Date
    , filename : String
    , wholeBody : String
    , recipes : Dict RecipeName RecipeBody
    }

I am using two indexes, one for whole text search, one for the recipes only.
As each document can contain zero, one or more recipes I thought of setting up the index like this:

recipeConfig =
    { indexType = "ElmTextSearch - Customized French Stop Words"
    , ref = Date.toIsoString << .date
    , fields =
        []
    , listFields = [ ( Dict.keys << .recipes, 5.0 ), ( Dict.values << .recipes, 1.0 ) ]
    , initialTransformFactories = Index.Defaults.defaultInitialTransformFactories
    , transformFactories = [ (\func index -> ( index, func )) (FrenchStemmer.stemmer True) ]
    , filterFactories = [ createFilterFunc frenchStopWords ]
    }

The search is working fine and both recipes names and bodies seem to be indexed. However the scores associated with the results are all equal to NaN, so I cannot sort or filter the results.
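As a stopgap for the sorting and filtering problem, NaN scores can be dropped before ordering the results. This is only a sketch: it assumes the result list has the shape `List ( String, Float )` (document ref paired with score), and the module and function names are made up for illustration.

```elm
module ScoreFilter exposing (dropNaNScores, sortByScoreDesc)

-- Hypothetical helpers, not part of elm-text-search.
-- Assumes search results are (document ref, score) pairs.


{-| Remove every result whose score is NaN.
-}
dropNaNScores : List ( String, Float ) -> List ( String, Float )
dropNaNScores results =
    List.filter (\( _, score ) -> not (isNaN score)) results


{-| Drop NaN scores, then sort from highest score to lowest.
-}
sortByScoreDesc : List ( String, Float ) -> List ( String, Float )
sortByScoreDesc results =
    results
        |> dropNaNScores
        |> List.sortBy (\( _, score ) -> negate score)
```

This hides the symptom only. If every score is NaN, as reported here, the filtered list is empty, so it does not replace a fix in the index configuration or the engine.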

Putting ( always "", 1 ) in fields seems to remove the NaNs, but I do not know whether it affects the search in any way.

Am I using the listFields parameter wrong?

I realize now I would be better off with a separate Recipe type referencing FeuilleLiaison. In this case I won't be using the listFields parameter.
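A rough sketch of what that separate type could look like. The `Recipe` record, its field names, and `fromFeuille` are hypothetical, and `Date` is assumed to come from a package that provides `Date.toIsoString` (for example justinmimbs/date).

```elm
module Recipe exposing (FeuilleLiaison, Recipe, fromFeuille)

import Date exposing (Date)
import Dict exposing (Dict)


type alias FeuilleLiaison =
    { date : Date
    , filename : String
    , wholeBody : String
    , recipes : Dict String String
    }


{-| One recipe per record (hypothetical shape), pointing back at its
FeuilleLiaison through the ISO date used as that document's ref.
-}
type alias Recipe =
    { feuilleRef : String
    , name : String
    , body : String
    }


{-| Flatten the recipes of one FeuilleLiaison into standalone records.
Dict.toList yields the entries in key order.
-}
fromFeuille : FeuilleLiaison -> List Recipe
fromFeuille feuille =
    let
        feuilleRef =
            Date.toIsoString feuille.date
    in
    Dict.toList feuille.recipes
        |> List.map
            (\( name, body ) ->
                { feuilleRef = feuilleRef, name = name, body = body }
            )
```

With records like these, the recipe index could use plain fields such as ( .name, 5.0 ) and ( .body, 1.0 ), with a ref that combines feuilleRef and name so that it stays unique per recipe.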

My first thought on hearing about scores of NaN is that there is a bug.
It might be a bug in that we accept a config that doesn't behave well, in which case we could improve configuration validation.
Or it might be a bug in the engine.

I am trying to think whether I have a test for the case of no fields and only listFields defined.
It will be at least a week before I can look into this further.

Creating a Pull Request with a new test case for elm-text-search using your setup and a minimal set of data might help figure things out, and would let me get a better look at your exact setup and the results.
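Such a test case might look roughly like the sketch below. It assumes elm-explorations/test, and that `ElmTextSearch.newWith`, `ElmTextSearch.add` and `ElmTextSearch.search` have the shapes used here; the `Doc` type, its fields, and the sample data are invented for illustration, and the default English transforms and filters stand in for the French ones.

```elm
module ListFieldsOnlyTest exposing (suite)

import ElmTextSearch
import Expect
import Index.Defaults
import Test exposing (Test, test)


-- Hypothetical minimal document: a ref plus a list of strings.
type alias Doc =
    { cid : String
    , tags : List String
    }


-- Index configured with no `fields`, only `listFields`.
index : ElmTextSearch.Index Doc
index =
    ElmTextSearch.newWith
        { indexType = "ElmTextSearch - listFields only test"
        , ref = .cid
        , fields = []
        , listFields = [ ( .tags, 1.0 ) ]
        , initialTransformFactories = Index.Defaults.defaultInitialTransformFactories
        , transformFactories = Index.Defaults.defaultTransformFactories
        , filterFactories = Index.Defaults.defaultFilterFactories
        }


suite : Test
suite =
    test "scores are not NaN when only listFields is configured" <|
        \_ ->
            index
                |> ElmTextSearch.add { cid = "doc1", tags = [ "apple pie", "banana bread" ] }
                |> Result.andThen (ElmTextSearch.search "apple")
                -- Keep only the result list, then check for any NaN score.
                |> Result.map (Tuple.second >> List.any (Tuple.second >> isNaN))
                |> Expect.equal (Ok False)
```

Comparing against `Ok False` makes the test fail both when a score is NaN and when adding or searching returns an `Err`, so either problem shows up in the test output.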

It is quite possible that the test would not even require the French stop word list or the French stemmer, if the bug is in elm-text-search itself.

I fixed the issue on my current project by organizing the data differently; it turned out I did not need listFields after all. So no hurry.
I will try to do a PR with a test case demonstrating the problem, and another one with an example of using the French stemmer and word list. Probably won't have time before this weekend though.

Good to hear you have resolved the problem for now.

I will be happy to get a failing test PR, it might help me figure out what is happening.
I will also be happy to get a small example that shows it running using your French stemmer.

I probably won't be able to look at the failing example in less than a week anyway, as the coming weekend is busy for me.

This is fixed in releases 5.0.2 and 5.1.0.
I made a mistake in v5.0.2: I didn't expose the new functions for another feature correctly, so I released v5.1.0 quickly to fix that.