- Rescore Solr results with Clojure functions
- Connect to nREPL for a fast development cycle
- Use the whole Clojure ecosystem while rescoring
- Build Solr plugins without repackaging jars and restarting Solr all the time
You can write Clojure functions to rescore the Solr results.
Only the top n results of the query are rescored; the `top` parameter defines n.
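Conceptually, rescoring only the top n hits can be pictured as below. This is a hypothetical sketch, not the plugin's own code: `rescore-top-n` is an illustrative name, and it assumes that hits beyond the top n simply keep their original order.

```clojure
(ns example.top-n)

(defn rescore-top-n
  "Conceptual sketch (not the plugin's code): apply rescore-fn to the first n
  hits, re-sort those by new score (highest first) and leave the remaining
  hits in their original order. Returns Lucene ids in final rank order."
  [rescore-fn n hits]
  (let [[head tail] (split-at n hits)
        ;; rescore-fn receives [score lucene-id doc] triples and
        ;; returns [new-score lucene-id] pairs.
        rescored    (sort-by first > (rescore-fn head))]
    (concat (map second rescored) (map second tail))))
```

In this sketch a document outside the top n can never move into it, which is why a larger `top` value gives the rescorer more room to work, at the cost of more work per query.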
Getting started should be quick:
- Check out this project and run `lein uberjar`
- Copy the uberjar from the target dir onto the Solr classpath
- If you put the jar in solr/lib, add `startup="lazy"` to your request handler
- I normally put the jar in core-dir/lib (you will have to create this directory)
- If you use the Solr config below, you will be up and running with the default rescorer, which rescores randomly
- Create your own Leiningen project containing the new rescore function and add it to the Solr classpath, or just keep working in this Leiningen project
- Update the `require` and `function` parameters in the Solr config so they point to the new rescore function
If you think creating a Leiningen project is overkill, you can instead use the `load-file` parameter, which should point to an absolute file path.
The plugin will call `load-file` on this file at startup.
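As a sketch, a file passed through `load-file` could look like the one below: a namespace plus a rescore function that keeps the original scores, which makes a convenient starting template. The file path, the namespace `my.rescorers` and the function name are hypothetical; presumably the `function` parameter would then name `my.rescorers/rescore`, but check the plugin source for exactly how `load-file` interacts with `require` and `function`.

```clojure
;; Hypothetical contents of an absolute path such as /opt/solr/rescorers/my_rescorers.clj
(ns my.rescorers)

(defn rescore
  "Starting template (hypothetical): returns every hit with its original
  score, as [score lucene-id] pairs, ignoring the document itself."
  [score-list]
  (map (fn [[old-score lucene-id _solr-doc]]
         [old-score lucene-id])
       score-list))
```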
```xml
<searchComponent name="cselect" class="clojureranker.Rescorer">
  <lst name="defaults">
    <bool name="start-nrepl">true</bool>
    <str name="searchComponentName">cselect</str>
    <str name="require">clojureranker.test</str>
    <str name="function">clojureranker.test/rescore</str>
    <int name="top">30</int>
  </lst>
</searchComponent>
```
Then add these lines to your request handler to activate the component:
```xml
<arr name="last-components">
  <str>cselect</str>
</arr>
```
Note:
- You need to repeat the search component name in the `searchComponentName` default (as above)
- Start the REPL with the `start-nrepl` parameter. Only one REPL will be started per Solr instance
- You can have different search components if you need different rescore functions on different cores
Here is an example of a rescore function:
```clojure
(defn rescore
  "This is only a test rescore function."
  [score-list]
  (map (fn [[_old-score lucene-id solr-doc]]
         ;; Pin one specific document to the top; every other hit gets a random score.
         (let [new-score (if (= (.get solr-doc "id") "055357342X") 1 (rand))]
           [new-score lucene-id]))
       score-list))
```
The input to the rescore function is a list of lists like this:
`[[score lucene-id solr-doc] [score lucene-id solr-doc] [score lucene-id solr-doc] ...]`
The function must return a list of this shape:
`[[new-score lucene-id] [new-score lucene-id] [new-score lucene-id] ...]`
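To make the input/output contract concrete, here is a hypothetical rescorer that doubles the score of hits whose `category` field equals `"books"` (both the field name and the value are assumptions). The usage form uses `java.util.HashMap` as a stand-in for the document object the plugin passes in, since the rescorer only calls `.get` with a field name on it; this lets you try it in a plain REPL without Solr.

```clojure
(ns example.category-boost)

(def boosted-category
  "Hypothetical category value to promote; \"category\" is an assumed field name."
  "books")

(defn rescore
  "Keeps the original score, doubling it for hits whose \"category\" field
  equals boosted-category. Returns [new-score lucene-id] pairs."
  [score-list]
  (map (fn [[old-score lucene-id solr-doc]]
         (let [boost (if (= boosted-category (.get solr-doc "category")) 2 1)]
           [(* boost old-score) lucene-id]))
       score-list))

;; Plain-REPL usage with HashMaps standing in for the documents:
(comment
  (rescore [[1.0 7 (java.util.HashMap. {"category" "books"})]
            [1.5 3 (java.util.HashMap. {"category" "music"})]]))
```

Note that the output drops the document and only carries the new score and the Lucene id, as the contract above requires.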
Notes:
- Sorting is handled by the framework; you just provide the new score
- All Solr fields are available through `.get`, as shown above
- In the example above, all hits get a random score except the document with id 055357342X, which gets a score of 1. Since `rand` always returns a value below 1, this document should always end up on top.
The REPL is started on port 7888. Connect with your favorite editor, then recompile and test on the fly. There are no long restart and packaging cycles, but when you add new dependencies to the project file you will have to rebuild and restart Solr.
The REPL should of course only be run in debug environments, as it is a loaded gun :)
It is pretty fast: I can hardly notice the difference between a normal Solr query and a rescored one.
However, if you do heavy work, such as fetching data through HTTP requests and/or heavy vector calculations, response times will probably rise.
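If a rescorer needs an expensive per-document lookup, one common mitigation (not something the plugin itself provides) is to cache the results with `clojure.core/memoize`. In this sketch `slow-lookup` is a hypothetical stand-in for the expensive call, such as an HTTP request; it only counts its invocations so the caching effect is visible. Keep in mind that `memoize` never evicts entries, so a long-running Solr instance may need a bounded cache instead.

```clojure
(ns my.cached-rescorer)

(def lookup-calls
  "Counts how often the (hypothetical) expensive lookup actually runs."
  (atom 0))

(defn slow-lookup
  "Hypothetical stand-in for an expensive call, e.g. an HTTP request to an
  external service. Returns a score multiplier for a document id."
  [doc-id]
  (swap! lookup-calls inc)
  (if (= doc-id "popular") 2.0 1.0))

;; Cache multipliers per document id; memoize keeps every entry forever.
(def cached-lookup (memoize slow-lookup))

(defn rescore
  "Multiplies each hit's score by the cached multiplier for its id field."
  [score-list]
  (map (fn [[old-score lucene-id solr-doc]]
         [(* old-score (cached-lookup (.get solr-doc "id"))) lucene-id])
       score-list))
```

Repeated queries hitting the same documents then pay for the lookup only once per id.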
Contributions are of course welcome. Just create a pull request and drop me a note.
My company, Sannsyn, is working on a plugin called TellusR to do stuff like this in Solr:
- AB-testing directly in Solr
- Boosting and tuning based on AI
- Personalization based on semantic and/or click/purchase data
- Statistics to see how the search is used:
  - Most used terms
  - Trending stuff
  - Which items convert best to clicks/purchases
  - Find articles that are never shown in hit lists
  - Find articles that are shown but do not convert
  - Number of zero-hit queries, how they trend and which terms they contain
  - Average hits per day, distribution over time and so on
  - Response times
  - Request times
- We use smart algorithms and anomaly detection to warn you about trouble
- GUI for synonyms, elevation and advanced boosting rules
- More features coming :)
We also adapt the plugin for larger customers if needed.
Parts of this will be open source. Stay tuned, or if you are interested, just drop us a line to get some early info.
This line cost me my last non-grey hair, but it made it possible to embed and bootstrap the Clojure runtime from Solr:
```java
Thread.currentThread().setContextClassLoader(this.getClass().getClassLoader());
```
I mention it here specifically, as it might save someone else some work.
Drop me a line if you have an alternative approach.
This plugin is compiled against solr-core 8.4.1. Chances are good that it will work out of the box with newer and older versions as well.
But if you would like to be certain, check out the project, change 8.4.1 in the project file to your Solr version and then run:
`lein uberjar`
The new jar to add to Solr will be in the target dir.
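In Leiningen, a dependency is written as a vector of an artifact symbol and a version string. Assuming the project file depends on the `org.apache.solr/solr-core` artifact (inferred from the version mentioned above, not confirmed here), the entry to edit would look roughly like this fragment; the surrounding entries in the real `project.clj` may differ.

```clojure
;; project.clj excerpt (hypothetical), inside the :dependencies vector:
[org.apache.solr/solr-core "8.4.1"] ; replace "8.4.1" with your Solr version
```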
This plugin is loosely based on info in this article
Thanks for open sourcing!
Copyright © 2020 Petter Egesund and Sannsyn
This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.
This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.