A SolrCloud RDF store that finds entity relations via a custom streaming expression on Solr 6.6.2

Primary language: Java

rdf-graph-search-with-solr-custom-streaming-expression

  • Given an RDF graph indexed in Solr 6.6.2, a custom streaming expression, paths(), was implemented to find connections between any two entities in the graph.
  • A paths(from, to) query returns all the paths connecting 'from' and 'to' according to the Solr index.
  • Results come back in under a second.
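
To make the idea behind paths(from, to) concrete, here is a minimal in-memory sketch in Java. It is not the repository's implementation, which runs as a streaming expression against the Solr index. The class PathsSketch, the Edge type, and the sample edges in main are hypothetical and only illustrate enumerating every path between two nodes up to a depth limit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for what paths(from, to) computes; not the Solr code.
class PathsSketch {

    // One RDF-style edge, mirroring the src_s, relation_s and dst_s fields.
    static final class Edge {
        final String src, relation, dst;
        Edge(String src, String relation, String dst) {
            this.src = src; this.relation = relation; this.dst = dst;
        }
        @Override public String toString() {
            return src + " -" + relation + "-> " + dst;
        }
    }

    // Returns every cycle-free path from 'from' to 'to' using at most maxDepth edges.
    static List<List<Edge>> findPaths(List<Edge> edges, String from, String to, int maxDepth) {
        Map<String, List<Edge>> outgoing = new HashMap<>();
        for (Edge e : edges) {
            outgoing.computeIfAbsent(e.src, k -> new ArrayList<>()).add(e);
        }
        List<List<Edge>> results = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        visited.add(from);
        walk(from, to, maxDepth, outgoing, new ArrayDeque<Edge>(), visited, results);
        return results;
    }

    // Depth-first walk that records the current edge path whenever it reaches 'to'.
    private static void walk(String node, String to, int remaining,
                             Map<String, List<Edge>> outgoing, Deque<Edge> path,
                             Set<String> visited, List<List<Edge>> results) {
        if (!path.isEmpty() && node.equals(to)) {
            results.add(new ArrayList<>(path));
            return;
        }
        if (remaining == 0) {
            return; // depth budget exhausted
        }
        for (Edge e : outgoing.getOrDefault(node, Collections.<Edge>emptyList())) {
            if (!visited.add(e.dst)) {
                continue; // node already on the current path, skip to avoid cycles
            }
            path.addLast(e);
            walk(e.dst, to, remaining - 1, outgoing, path, visited, results);
            path.removeLast();
            visited.remove(e.dst);
        }
    }

    public static void main(String[] args) {
        // Made-up edges for illustration only; not the repository's sample data.
        List<Edge> edges = Arrays.asList(
            new Edge("billgates", "founderOf", "microsoft"),
            new Edge("billgates", "knows", "personX"),
            new Edge("personX", "worksAt", "microsoft"));
        for (List<Edge> p : findPaths(edges, "billgates", "microsoft", 2)) {
            System.out.println(p);
        }
    }
}
```

The depth limit plays the same role as the maxDepth parameter of the query: it bounds how far the search may wander from the starting node.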

1. Demo Time

  1. Let's index some sample data about billgates and microsoft.
  2. Once the index is ready, run a paths() query to find all the paths connecting these two entities in the graph.

Let me walk you through it step by step, from indexing to querying to viewing the results.

1.1 Sample Data

1.2 Sample Query

Find all connections between BillGates and Microsoft

      paths(rdf, 
      from="src_s->billgates",
      to="dst_s->microsoft",
      fl="src_s,dst_s,relation_s")

1.3 Sample Query Results

The query found 4 different paths connecting billgates and microsoft.

2. Architecture and Features:

  • Merges search with parallel computing (the query is computed in parallel on all shards and the results are merged).
  • Fully Streaming.
  • SolrCloud aware.
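
The "compute on every shard in parallel, then merge" pattern can be sketched in plain Java as follows. This is only an illustration of the pattern under stated assumptions: ShardMergeSketch and the queryShard callback are hypothetical names, the sketch collects results into a list rather than streaming them, and it does not show how the actual expression talks to SolrCloud.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical illustration of fan-out and merge; not the repository's code.
class ShardMergeSketch {

    // Runs queryShard once per shard in parallel, then merges results in shard order.
    static <T> List<T> queryAllShards(List<String> shards,
                                      Function<String, List<T>> queryShard) {
        List<CompletableFuture<List<T>>> futures = new ArrayList<>();
        for (String shard : shards) {
            // Each shard query is submitted to the common fork-join pool.
            futures.add(CompletableFuture.supplyAsync(() -> queryShard.apply(shard)));
        }
        List<T> merged = new ArrayList<>();
        for (CompletableFuture<List<T>> future : futures) {
            merged.addAll(future.join()); // blocks until that shard has answered
        }
        return merged;
    }
}
```

A caller would pass the shard names and a function that queries one shard, for example `queryAllShards(shards, shard -> fetchEdges(shard))`, where fetchEdges is whatever per-shard lookup the caller provides.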

2.1 Query syntax:

paths(<collectionName>, 
      from="fromField->fromNode",
      to="toField->toNode",
      fl="<csv of fields to return per matching document>",
      maxDepth="<maximum depth to go searching for toNode starting from fromNode>")
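
If you would rather send the query from code than from the admin UI, the expression string can be assembled and URL-encoded as in the sketch below. It assumes the collection exposes Solr's standard /stream request handler and reuses the host and port from the curl examples in this README. PathsQueryBuilder and its method names are hypothetical helpers, not part of this repository.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds a paths() expression and a /stream request URL.
class PathsQueryBuilder {

    // Assembles the expression following the query syntax shown above.
    static String buildExpression(String collection,
                                  String fromField, String fromNode,
                                  String toField, String toNode,
                                  String fl, int maxDepth) {
        return "paths(" + collection + ", "
            + "from=\"" + fromField + "->" + fromNode + "\", "
            + "to=\"" + toField + "->" + toNode + "\", "
            + "fl=\"" + fl + "\", "
            + "maxDepth=\"" + maxDepth + "\")";
    }

    // Builds the GET URL; solrBase is assumed to look like http://localhost:8983/solr
    static String buildStreamUrl(String solrBase, String collection, String expr) {
        try {
            return solrBase + "/" + collection + "/stream?expr="
                + URLEncoder.encode(expr, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String expr = buildExpression("rdf", "src_s", "billgates",
            "dst_s", "microsoft", "src_s,dst_s,relation_s", 3);
        System.out.println(expr);
        System.out.println(buildStreamUrl("http://localhost:8983/solr", "rdf", expr));
    }
}
```

The resulting URL can be fetched with curl or any HTTP client. The maxDepth value of 3 is an arbitrary choice for the example.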

3. Compile and Run

3.1 Generate jar file

Running mvn clean package generates the jar file.

3.2 Create .system collection

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=.system'
curl http://localhost:8983/solr/.system/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'

3.3 Upload jar file to .system collection

curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @rdf-graph-search-with-solr-custom-streaming-expression-1.0-SNAPSHOT.jar 'http://localhost:8983/solr/.system/blob/test'

3.4 Verify that this jar is uploaded under the name 'test' in .system collection

curl 'http://localhost:8983/solr/.system/blob?omitHeader=true'

3.5 Add our jar as runtime-lib to test collection

curl 'http://localhost:8983/solr/test/config' -H 'Content-type:application/json' -d '{   "add-runtimelib": { "name":"test", "version":1 }}'

3.6 Register our paths() custom streaming expression with test collection

curl 'http://localhost:8983/solr/test/config' -H 'Content-type:application/json' -d '{
  "create-expressible": {
    "name": "paths",
    "class": "com.solr.custom.streaming.PathsStreamingExpression",
    "runtimeLib": true
  }
}'

That's it! Now navigate to the "stream" tab of the test collection in the Solr Admin UI and run your query.

4. How to update Solr after changing the jar

4.1 Upload new jar with changes again

curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @rdf-graph-search-with-solr-custom-streaming-expression-1.0-SNAPSHOT.jar 'http://localhost:8983/solr/.system/blob/test'

4.2 Check the new version number.

The jar's version number should now have increased by 1, say from 1 to 2.

curl 'http://localhost:8983/solr/.system/blob?omitHeader=true'

4.3 Update 'test' collection to use version 2 of our jar

curl 'http://localhost:8983/solr/test/config' -H 'Content-type:application/json' -d '{   "update-runtimelib": { "name":"test", "version":2 }}'

There is no need to restart Solr. Your changes are picked up, and you can start querying Solr to test them.

4.4 How to remove the custom expression from a collection

curl 'http://localhost:8983/solr/test/config' -H 'Content-type:application/json' -d '{
  "delete-expressible": "paths"
}'