/elasticsearch-mocksolrplugin

Use Solr clients/tools with ElasticSearch

Primary LanguageJava

ElasticSearch Mock Solr Plugin

Use Solr clients/tools with ElasticSearch

This plugin will allow you to use tools that were built to
interact with Solr with ElasticSearch.

The idea for this plugin came when I wanted to use Nutch with
ElasticSearch. Instead of extending Nutch itself,
I thought it would be nice to use any Solr clients with
ElasticSearch. Some projects we can now use are
Nutch, Apache ManifoldCF, and any tool using SolrJ. It
should be possible to use non-java tools that write to
Solr using the XML update and request handlers as well.

Supported Solr features

  • Update handlers
    • XML Update Handler (ie. /update)
    • JavaBin Update Handler (ie. /update/javabin)
  • Search handler (ie. /select)
    • Basic lucene queries using the q paramter
    • start, rows, and fl parameters
    • sorting
    • filter queries (fq parameters)
    • hit highlighting (hl, hl.fl, hl.snippets, hl.fragsize, hl.simple.pre, hl.simple.post)
    • faceting (facet, facet.field, facet.query, facet.sort, facet.limit)
  • XML and JavaBin request and response formats

How do you build this plugin?

Start by specifying the ElasticSearch and solr versions in
the pom.xml. Make sure the solr version matches the version
of lucene used in ElasticSearch.

...
<elasticsearch.version>0.18.6</elasticsearch.version>
<solr.version>3.5.0</solr.version>
...

Use maven to build the package

mvn package

Then install the plugin

# if you've built it locally
$ES_HOME/bin/plugin -url file:./target/releases/elasticsearch-mocksolrplugin-1.1.1.zip -install mocksolrplugin

# if you just want to install the pre-built package from github
$ES_HOME/bin/plugin install mattweber/elasticsearch-mocksolrplugin/1.1.1

How to use this plugin.

Just point your Solr client/tool to your ElasticSearch instance and appending
/_solr to the url.

http://localhost:9200/${index}/${type}/_solr

${index} – the ES index you want to index/search against. Default “solr”.
${type} – the ES type you want to index/search against. Default “docs”.

Example paths:


// Will search/index against index “solr” and type “docs”
http://localhost:9200/_solr

// Will search/index against index “testindex” and type “docs”
http://localhost:9200/testindex/_solr

// Will search/index against index “testindex” and type “testtype”
http://localhost:9200/testindex/testtype/_solr

Use the client/tool as you would with Solr.

Example SolrJ Indexing

    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:9200/testindex/testtype/_solr");
    server.setRequestWriter(new BinaryRequestWriter());
    // we support both xml and SolrBin response writers
    //server.setParser(new XMLResponseParser());
    
    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField( "id", "id1", 1.0f );
    doc1.addField( "name", "doc1", 1.0f );
    doc1.addField( "price", 10 );

    SolrInputDocument doc2 = new SolrInputDocument();
    doc2.addField( "id", "id2", 1.0f );
    doc2.addField( "name", "doc2", 1.0f );
    doc2.addField( "price", 20 );
    
    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    docs.add( doc1 );
    docs.add( doc2 );
    
    server.add( docs );
    server.commit();

    // deletes work as well
    //server.deleteById("id2");
    //server.commit();

Perform a search and verify the documents were indexed.

Example SolrJ Searching

    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:9200/testindex/testtype/_solr");

    String qstr = "id:[* TO *]";
    SolrQuery query = new SolrQuery();
    query.setQuery(qstr);

    QueryResponse response = server.query(query);
    for (SolrDocument doc : response.getResults()) {
        for (String field : doc.getFieldNames()) {
            System.out.println(field + " = " + doc.getFieldValue(field));
        }
        System.out.println();
    }

Example using Nutch

At a minimum, use the following type mapping for ElasticSearch.

curl -XPUT 'http://localhost:9200/testindex'
curl -XPUT 'http://localhost:9200/testindex/testtype/_mapping' -d '{
    "testtype" : {
        "properties" : {
            "id" : {
                "type" : "string",
                "store": "yes"
            },
            "digest" : {
                "type" : "string",
                "store" : "yes",
                "index" : "no"
            },
            "boost" : {
                "type" : "float",
                "store" : "yes",
                "index" : "no"
            },
            "tstamp" : {
                "type" : "date",
                "store" : "yes",
                "index" : "no"
            }
        }
    }
}'

Follow the nutch tutorial at http://wiki.apache.org/nutch/NutchTutorial

  • Follow steps 1 though 3.1
  • For step 3.1 use:
bin/nutch crawl urls -solr http://localhost:9200/testindex/testtype/_solr -depth 3 -topN 5

Notes

ElasticSearch does not require a schema and all the data you send to Solr will be indexed by default. You
Can use the ElasticSearch PUT Mapping API to define your field types, what should be stored, analyzed, etc.
All data that is indexed via the mock XML Update Handler will most likely be detected by ElasticSearch as
strings, thus it is a good idea to mimic your Solr schema with an ElasticSearch type mapping.