o2r-finder

Implementation of search features and the endpoint /api/v1/search for the o2r API.

Architecture

The finder utilizes Elasticsearch to provide means for

  • A simple auto-suggest search functionality,
  • spatial search,
  • temporal search,
  • and other Elasticsearch queries.

Such auto-suggest search is not readily available in MongoDB (though it does offer full-text search).

Since we don't want to worry about keeping things in sync, the finder simply re-indexes the whole database at startup and then subscribes to changes in MongoDB, using node-elasticsearch-sync for both steps.

The /api/v1/search endpoint allows two types of queries:

  1. Simple queries via GET: as an Elasticsearch query string

  2. Complex queries via POST: using the Elasticsearch Query DSL

For more details and examples see the Search API documentation.
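As a minimal sketch of the two query types (the base URL and a local deployment on port 8084 are assumptions of this example; see the Search API documentation for authoritative details):

```javascript
// Sketch: assembling the two query types against a hypothetical local
// deployment; the helper names are made up for this example.
const baseUrl = 'http://localhost:8084/api/v1/search';

// 1. Simple query via GET: the search terms go URL-encoded into the q parameter
function simpleQueryUrl(terms) {
  return baseUrl + '?q=' + encodeURIComponent(terms);
}

// 2. Complex query via POST: the request body is an Elasticsearch Query DSL document
function complexQueryBody(terms) {
  return JSON.stringify({
    query: { query_string: { default_field: '_all', query: terms } }
  });
}

console.log(simpleQueryUrl('reproducible research'));
// → http://localhost:8084/api/v1/search?q=reproducible%20research
```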

Special characters

The finder supports searching with special characters in these fields:

  • metadata.o2r.identifier.doi
  • metadata.o2r.identifier.doiurl

To support additional fields with special characters, the mapping in config/mapping.js has to be updated so that these fields are copied into the grouping field _special.

  • When doing a simple query via a query string, both the _special and the _all fields are searched:

/api/v1/search?q=10.1006%2Fjeem.1994.1031

  • When doing a complex query, the user has control over which fields are searched. To search both fields nest the queries like this:
"query": {
    "bool": {
        "should" : [
            {"query_string": {"default_field": "_all", "query": [...]}},
            {"query_string": {"default_field": "_special", "query": [...]}},
        ]
    }
}
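To illustrate both approaches with the DOI from the simple-query example above, a small Node.js sketch (variable names are made up for this example):

```javascript
// Percent-encoding a DOI for the simple GET query: the '/' must become %2F
const doi = '10.1006/jeem.1994.1031';
const simpleQuery = '/api/v1/search?q=' + encodeURIComponent(doi);
// → /api/v1/search?q=10.1006%2Fjeem.1994.1031

// Body for the complex POST query searching both the _all and _special
// fields, mirroring the nested bool/should structure shown above
const body = {
  query: {
    bool: {
      should: [
        { query_string: { default_field: '_all', query: doi } },
        { query_string: { default_field: '_special', query: doi } }
      ]
    }
  }
};

console.log(simpleQuery);
```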

Other Elasticsearch query types can also be used to search both fields.

Indexed information

  • the whole database muncher (corresponds to a cluster or instance in Elasticsearch)
    • all compendia (a collection in MongoDB, an index in Elasticsearch)
      • text documents (detected via the MIME type of the files) become fields in Elasticsearch
    • all jobs (a collection in MongoDB, an index in Elasticsearch)

Compendia

The MongoDB id is stored as the entry id to allow deletion in Elasticsearch when an element is removed from MongoDB.

The "public" ID for the compendium is stored in compendium_id.

Example:

(...)
"hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
        {
            "_score": 1,
            "_source": {
                "user": "0000-0001-6230-4374",
                "metadata": {},
                "jobs": [],
                "created": "2017-08-21T14:31:27.376Z",
                "files": {},
                "compendium_id": "mQryh"
            }
        },
        {
            "_score": 1,
            "_source": {
                "user": "0000-0001-6230-4374",
                "metadata": {},
                "jobs": [],
                "created": "2017-08-21T14:31:47.623Z",
                "files": {},
                "compendium_id": "Ks1Bc"
            }
        }
        (...)
    ]
}
(...)

Note: If you update the metadata structure of compendia or jobs that are already indexed in Elasticsearch, you have to drop the Elasticsearch o2r index via

curl -XDELETE 'http://172.17.0.3:9200/o2r'

Otherwise, new compendia will not be indexed anymore.

Requirements

  • Elasticsearch server
  • Docker
  • Node.js
  • MongoDB, running with a replica set (!)

Dockerfile

This project includes a Dockerfile, which can be built and run as follows. This is not a complete configuration and is useful for testing only.

docker build -t finder .

# start databases in containers (optional)
docker run --name mongodb -d mongo:3.4 mongod --replSet rso2r --smallfiles
docker exec $(docker ps -qf "name=mongodb") bash -c "sleep 5; mongo --verbose --host mongodb --eval 'printjson(rs.initiate()); printjson(rs.conf()); printjson(rs.status()); printjson(rs.slaveOk());'"
docker run --name es -d -e ES_JAVA_OPTS="-Xms512m -Xmx512m" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:5.6.3

docker run -it --link mongodb --link es -e ELASTIC_SEARCH_URL=es:9200 -e FINDER_MONGODB=mongodb://mongodb -e MONGO_OPLOG_URL=mongodb://mongodb/muncher -e MONGO_DATA_URL=mongodb://mongodb/muncher -e DEBUG=finder -p 8084:8084 finder

The image can then be configured via environment variables.

Available environment variables

  • FINDER_PORT Required Port for HTTP requests, defaults to 8084.
  • FINDER_MONGODB Required Location for the mongo db. Defaults to mongodb://localhost:27017/. You will very likely need to change this (and maybe include the MongoDB port).
  • FINDER_MONGODB_DATABASE Which database inside the mongo db should be used. Defaults to muncher.
  • FINDER_MONGODB_COLL_COMPENDIA Name of the MongoDB collection for compendia, default is compendia.
  • FINDER_MONGODB_COLL_JOBS Name of the MongoDB collection for jobs, default is jobs.
  • FINDER_MONGODB_COLL_SESSION Name of the MongoDB collection for session information, default is sessions (must match other microservices).
  • FINDER_ELASTICSEARCH_INDEX_COMPENDIA Name of the Elasticsearch index for compendia, default is compendia.
  • FINDER_ELASTICSEARCH_INDEX_JOBS Name of the Elasticsearch index for jobs, default is jobs.
  • SESSION_SECRET Secret used for session encryption, must match other services, default is o2r.
  • FINDER_STATUS_LOGSIZE Number of transformation results in the status log, default is 20.
  • node-elasticsearch-sync parameters
    • ELASTIC_SEARCH_URL Required, default is http://localhost:9200.
    • MONGO_OPLOG_URL Required, defaults to FINDER_MONGODB + FINDER_MONGODB_DATABASE, e.g. mongodb://localhost/muncher.
    • MONGO_DATA_URL Required, defaults to FINDER_MONGODB + FINDER_MONGODB_DATABASE, e.g. mongodb://localhost/muncher.
    • BATCH_COUNT Required, defaults to 20.
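As an illustration only (variable names and defaults are taken from the list above; the finder's actual config file may structure this differently), the configuration with its fallbacks could be derived like this:

```javascript
// Sketch: reading the environment variables listed above, falling back to
// the documented defaults when a variable is unset.
const mongodb = process.env.FINDER_MONGODB || 'mongodb://localhost:27017/';
const database = process.env.FINDER_MONGODB_DATABASE || 'muncher';

const config = {
  port: parseInt(process.env.FINDER_PORT, 10) || 8084,
  elasticsearch: process.env.ELASTIC_SEARCH_URL || 'http://localhost:9200',
  // MONGO_OPLOG_URL and MONGO_DATA_URL default to FINDER_MONGODB + database
  oplogUrl: process.env.MONGO_OPLOG_URL || mongodb + database,
  dataUrl: process.env.MONGO_DATA_URL || mongodb + database,
  batchCount: parseInt(process.env.BATCH_COUNT, 10) || 20
};

console.log(config.dataUrl);
```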

Development

Start an Elasticsearch instance, exposing the default port on the host:

docker run -it --name elasticsearch -d -e ES_JAVA_OPTS="-Xms512m -Xmx512m" -e "xpack.security.enabled=false" -p 9200:9200 docker.elastic.co/elasticsearch/elasticsearch:5.6.3

Important: Starting with Elasticsearch 5, virtual memory configuration of the system (and in our case the host) requires some configuration, particularly of the vm.max_map_count setting, see https://www.elastic.co/guide/en/elasticsearch/reference/5.0/vm-max-map-count.html

You can then explore the state of Elasticsearch, e.g. list the indices with

curl http://localhost:9200/_cat/indices?v

Start finder (potentially adjust Elasticsearch container's IP, see docker inspect elasticsearch)

npm install
DEBUG=finder FINDER_ELASTICSEARCH=localhost:9200 npm start;

You can set DEBUG=* to see MongoDB oplog messages.

Now check out the transferred documents, e.g.

curl 'http://localhost:9200/o2r/_search?q=*'

Delete the index with

curl -XDELETE 'http://172.17.0.3:9200/o2r/'

Local test proxy

If you run the web service proxy from the project o2r-platform, you can run queries directly at the o2r API:

http://localhost/api/v1/search?q=*

Local container testing

The following code assumes the Docker host is available under IP 172.17.0.1 within the container.

 docker run -it -e DEBUG=finder -e FINDER_MONGODB=mongodb://172.17.0.1 -e ELASTIC_SEARCH_URL=http://172.17.0.1:9200 -p 8084:8084 finder

Tests

Running instances of Elasticsearch, MongoDB, and the o2r-finder (as described above) are required.

To run the included tests, execute

npm test

License

o2r-finder is licensed under Apache License, Version 2.0, see file LICENSE.

Copyright (C) 2017 - o2r project.