rluiten/elm-text-search

Retrieving documents based on search results

Is there a recommended method to retrieve the actual documents after you get the results? I always struggled with this in lunr.js as well. When I do a search, the IDs in the results aren't easily mapped back to the documents they represent, since the original documents are in a list rather than a Dict.

Obviously you can do a map over the results then filter the original list to pull out each document, but this seems inefficient.
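For reference, the map-then-filter approach described above might look like the following sketch. It assumes the search results arrive as a list of (ref, score) pairs and that each document carries its ref in an `id` field; the `Doc` alias and the function name are made up for illustration.

```elm
type alias Doc =
    { id : String, title : String, body : String }


-- For every (ref, score) pair, scan the whole document list for matching ids.
-- With n documents and m results this does on the order of n * m comparisons.
naiveDocsForResults : List ( String, Float ) -> List Doc -> List Doc
naiveDocsForResults results docs =
    List.concatMap
        (\( ref, _ ) -> List.filter (\doc -> doc.id == ref) docs)
        results
```

Each result triggers a full pass over the list, which is the cost the question is worried about.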

Great port btw! Working great, just thought I'd post up with this question!

The solution is meant to be independent of your document storage API.

In cases where you have a lot of documents, you would not even hold them in your local browser; you would fetch them from your web service once you have the matches. That is why you can store and retrieve the index: you don't have to load all your documents just to create an index to enable search.

However, there are good use cases where you would drive elm-text-search from a list of documents.

In cases where you hold the documents anyway, I would create a dictionary for the documents so that lookup is simpler. I'd do this once, either at the same time you add them to the index or lazily, once you get the first search results and need the dictionary. I would probably create the index from my list of documents, then create the dictionary from the list. As long as no other process depends on the list, you can then discard it, since your documents will be held in the dictionary.

Isolated example of creating the dictionary. My excuse for this example is I have not touched Elm in a few weeks so mainly did this to use Elm for a bit.

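import Dict exposing (Dict)


-- Sample documents, each with a unique id.
docs =
  [ { id = "a1", title = "title words 1", body = "body words 1" }
  , { id = "a2", title = "title words 2", body = "body words 2" }
  ]


-- Fold the list into a Dict keyed by each document's id.
docsDict =
  List.foldr
    ( \({id} as doc) dict ->
      Dict.insert id doc dict
    )
    Dict.empty
    docs


-- Bound to a name because current Elm does not accept a top-level `_ =`.
-- Use `loggedDict` in place of `docsDict` to have it logged to the console.
loggedDict =
  Debug.log "docsDict" docsDict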
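Once the dictionary exists, turning search results into documents is a single pass over the results. The sketch below assumes the results are a list of (ref, score) pairs whose ref matches the dictionary key; `Doc`, `exampleDict` and `docsForResults` are illustrative names, not part of the library.

```elm
import Dict exposing (Dict)


type alias Doc =
    { id : String, title : String, body : String }


-- A small dictionary keyed by document id, standing in for the real one.
exampleDict : Dict String Doc
exampleDict =
    Dict.fromList
        [ ( "a1", { id = "a1", title = "title words 1", body = "body words 1" } )
        , ( "a2", { id = "a2", title = "title words 2", body = "body words 2" } )
        ]


-- Resolve each (ref, score) pair to its document, keeping the result order.
-- Refs with no entry in the dictionary are dropped by List.filterMap.
docsForResults : Dict String Doc -> List ( String, Float ) -> List Doc
docsForResults docsDict results =
    List.filterMap (\( ref, _ ) -> Dict.get ref docsDict) results
```

For example, `docsForResults exampleDict [ ( "a2", 0.9 ), ( "zz", 0.1 ) ]` yields only the "a2" document, since "zz" has no entry.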

I am considering whether it's worth extending elm-text-search to keep the documents as well, and then enhancing the search result to give you the documents back.

Some thoughts

  • The documents' content would not be stored or retrieved with an index.
    • I could only store and retrieve the document fields you register as the ref and as indexed fields. The index would not know about any fields you choose not to index, so I could not return the full document unless you indexed every field, and for any non-trivial document model that is not a good idea.
  • A better thought: create a module, ElmDocumentSearch, that wraps ElmTextSearch and manages that dictionary, turning the results of a search into a list of documents for you. This could then be treated as a document storage model rather than a full-text index.
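To make the wrapper idea concrete, here is a rough sketch of its shape. It is only an illustration under assumptions: `DocumentSearch`, `fromDocs` and `searchDocs` are hypothetical names, and the `search` field stands in for whatever runs a query against the index and yields (ref, score) pairs, rather than the library's real signature. It also ignores error handling and any index updates a real wrapper would need to carry along.

```elm
import Dict exposing (Dict)


-- Hypothetical wrapper: a document dictionary paired with a search function.
type alias DocumentSearch doc =
    { docs : Dict String doc
    , search : String -> List ( String, Float )
    }


-- Build the wrapper from a ref extractor, a search function and the documents.
fromDocs : (doc -> String) -> (String -> List ( String, Float )) -> List doc -> DocumentSearch doc
fromDocs toRef search docs =
    { docs = Dict.fromList (List.map (\doc -> ( toRef doc, doc )) docs)
    , search = search
    }


-- Run a query and return the matching documents together with their scores.
-- Refs that are missing from the dictionary are skipped.
searchDocs : String -> DocumentSearch doc -> List ( doc, Float )
searchDocs query store =
    List.filterMap
        (\( ref, score ) ->
            Maybe.map (\doc -> ( doc, score )) (Dict.get ref store.docs)
        )
        (store.search query)
```

Keeping the search function as a parameter leaves the document store separate from the index, so the same wrapper shape would work whether the index is built locally or loaded from storage.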

I am happy to get feedback on these thoughts.

I am glad you like the library.

Hey, sorry for the late reply, been busy! Thanks for responding and being so detailed. I ended up going with the Dict approach and it worked out very well :).

Since that was the case, I think your second option might be valuable for folks as a very lightweight wrapper. After reading your post it was easy to implement myself, but it seems to me that this covers the large majority of cases where you'd want to use text search. For large numbers of records, if I have to go to the backend, I'd usually prefer a full-fledged solution like Solr.

Leaving the document store up for debate still certainly makes sense, because maybe I'm wrong about Solr, but I see no reason a nice light wrapper wouldn't be helpful!

Thanks a lot for doing this port and being so helpful, great library!