# o2r-finder

Implementation of search features for the o2r API, providing the endpoint `/api/v1/search`.
## Architecture

The finder utilizes Elasticsearch to provide

- a simple auto-suggest search functionality,
- spatial search,
- temporal search,
- and other Elasticsearch queries.

Auto-suggest search is not readily available in MongoDB (though it does offer full-text search). Since we don't want to worry about keeping things in sync, the finder simply re-indexes the whole database at startup and then subscribes to changes in MongoDB, using node-elasticsearch-sync for both steps.
The `/api/v1/search` endpoint allows two types of queries:

- Simple queries via GET, passed as an Elasticsearch query string
- Complex queries via POST, using the Elasticsearch Query DSL

For more details and examples see the Search API documentation.
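As a sketch, the two query types could be assembled like this in Node.js (the host, port, and search term are assumptions for illustration, not part of the API specification):

```javascript
// Sketch of assembling both query types for the /api/v1/search endpoint.
// Host, port, and search term are assumptions for illustration.
const base = 'http://localhost:8084/api/v1/search';

// Simple query via GET: the term is passed as an Elasticsearch query
// string in the `q` parameter and must be URL-encoded.
const term = 'reproducible research';
const simpleUrl = base + '?q=' + encodeURIComponent(term);

// Complex query via POST: the request body is an Elasticsearch Query DSL object.
const complexBody = {
  query: {
    query_string: {
      default_field: '_all',
      query: term
    }
  }
};

console.log(simpleUrl);
console.log(JSON.stringify(complexBody));
```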
## Special characters

The finder supports searching for special characters in these fields:

- `metadata.o2r.identifier.doi`
- `metadata.o2r.identifier.doiurl`

To support additional fields containing special characters, the mapping in `config/mapping.js` has to be updated to copy those fields into the group field `_special`.
- When doing a simple query via a query string, both the `_special` and the `_all` fields are searched:

  `/api/v1/search?q=10.1006%2Fjeem.1994.1031`
- When doing a complex query, the user has control over which fields are searched. To search both fields, nest the queries like this:

  ```json
  "query": {
    "bool": {
      "should": [
        {"query_string": {"default_field": "_all", "query": [...]}},
        {"query_string": {"default_field": "_special", "query": [...]}}
      ]
    }
  }
  ```
Other Elasticsearch query types, such as `multi_match` with multiple fields or `simple_query_string`, can also be used to search both fields.
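For illustration, a complete request body following the nested `bool`/`should` pattern could be built like this (the DOI is the example value from the simple query above; treat this as a sketch, not the service's canonical request):

```javascript
// Sketch: build the nested bool/should query that searches both the
// `_all` and `_special` fields for a DOI (example value from above).
const doi = '10.1006/jeem.1994.1031';

const body = {
  query: {
    bool: {
      should: [
        { query_string: { default_field: '_all', query: doi } },
        { query_string: { default_field: '_special', query: doi } }
      ]
    }
  }
};

console.log(JSON.stringify(body, null, 2));
```

Note that characters such as `/` are reserved in the Elasticsearch query string syntax and may need escaping depending on the query type used.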
## Indexed information

- whole database muncher (a cluster or instance of Elasticsearch)
  - all compendia (collection in MongoDB, an index in Elasticsearch)
    - text documents (detected via the mime type of the files) as fields in Elasticsearch
  - all jobs (collection in MongoDB, an index in Elasticsearch)
## Compendia

The MongoDB id is stored as the entry id to allow deletion in Elasticsearch when an element is removed from MongoDB. The "public" ID for the compendium is stored in `compendium_id`.
Example:

```json
(...)
"hits": {
  "total": 6,
  "max_score": 1,
  "hits": [
    {
      "_score": 1,
      "_source": {
        "user": "0000-0001-6230-4374",
        "metadata": {},
        "jobs": [],
        "created": "2017-08-21T14:31:27.376Z",
        "files": {},
        "compendium_id": "mQryh"
      }
    },
    {
      "_score": 1,
      "_source": {
        "user": "0000-0001-6230-4374",
        "metadata": {},
        "jobs": [],
        "created": "2017-08-21T14:31:47.623Z",
        "files": {},
        "compendium_id": "Ks1Bc"
      }
    }
  ]
  (...)
}
(...)
```
Note: If you update the metadata structure of compendia or jobs and have already indexed these in Elasticsearch, you have to drop the Elasticsearch `o2r` index via

```bash
curl -XDELETE 'http://172.17.0.3:9200/o2r'
```

Otherwise, new compendia will not be indexed anymore.
## Requirements

- Elasticsearch server
- Docker
- Node.js
- MongoDB, running with a replica set (!)
## Dockerfile

This project includes a Dockerfile which can be built and run as follows. This is not a complete configuration and is useful for testing only.

```bash
docker build -t finder .

# start databases in containers (optional)
docker run --name mongodb -d mongo:3.4 mongod --replSet rso2r --smallfiles
docker exec $(docker ps -qf "name=mongodb") bash -c "sleep 5; mongo --verbose --host mongodb --eval 'printjson(rs.initiate()); printjson(rs.conf()); printjson(rs.status()); printjson(rs.slaveOk());'"
docker run --name es -d -e ES_JAVA_OPTS="-Xms512m -Xmx512m" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:5.6.3

docker run -it --link mongodb --link es -e ELASTIC_SEARCH_URL=es:9200 -e FINDER_MONGODB=mongodb://mongodb -e MONGO_OPLOG_URL=mongodb://mongodb/muncher -e MONGO_DATA_URL=mongodb://mongodb/muncher -e DEBUG=finder -p 8084:8084 finder
```
The image can then be configured via environment variables.
## Available environment variables

- `FINDER_PORT`: Required. Port for HTTP requests, defaults to `8084`.
- `FINDER_MONGODB`: Required. Location of the MongoDB. Defaults to `mongodb://localhost:27017/`. You will very likely need to change this (and maybe include the MongoDB port).
- `FINDER_MONGODB_DATABASE`: Which database inside the MongoDB should be used. Defaults to `muncher`.
- `FINDER_MONGODB_COLL_COMPENDIA`: Name of the MongoDB collection for compendia, default is `compendia`.
- `FINDER_MONGODB_COLL_JOBS`: Name of the MongoDB collection for jobs, default is `jobs`.
- `FINDER_MONGODB_COLL_SESSION`: Name of the MongoDB collection for session information, default is `sessions` (must match other microservices).
- `FINDER_ELASTICSEARCH_INDEX_COMPENDIA`: Name of the Elasticsearch index for compendia, default is `compendia`.
- `FINDER_ELASTICSEARCH_INDEX_JOBS`: Name of the Elasticsearch index for jobs, default is `jobs`.
- `SESSION_SECRET`: Secret used for session encryption, must match other services, default is `o2r`.
- `FINDER_STATUS_LOGSIZE`: Number of transformation results in the status log, default is `20`.
- node-elasticsearch-sync parameters:
  - `ELASTIC_SEARCH_URL`: Required, default is `http://localhost:9200`.
  - `MONGO_OPLOG_URL`: Required, defaults to `FINDER_MONGODB` + `FINDER_MONGODB_DATABASE`, e.g. `mongodb://localhost/muncher`.
  - `MONGO_DATA_URL`: Required, defaults to `FINDER_MONGODB` + `FINDER_MONGODB_DATABASE`, e.g. `mongodb://localhost/muncher`.
  - `BATCH_COUNT`: Required, defaults to `20`.
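The way the sync URLs default to the combination of the two MongoDB variables can be sketched as follows (the helper name is hypothetical, for illustration only):

```javascript
// Sketch: MONGO_OPLOG_URL and MONGO_DATA_URL default to the concatenation
// of FINDER_MONGODB and FINDER_MONGODB_DATABASE (helper name is hypothetical).
function defaultSyncUrl(finderMongodb, finderDatabase) {
  return finderMongodb + finderDatabase;
}

console.log(defaultSyncUrl('mongodb://localhost/', 'muncher')); // mongodb://localhost/muncher
```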
## Development

Start an Elasticsearch instance, exposing the default port on the host:

```bash
docker run -it --name elasticsearch -d -e ES_JAVA_OPTS="-Xms512m -Xmx512m" -e "xpack.security.enabled=false" -p 9200:9200 docker.elastic.co/elasticsearch/elasticsearch:5.6.3
```
Important: Starting with Elasticsearch 5, the virtual memory settings of the system (in our case, the host) require some configuration, particularly the `vm.max_map_count` setting, see https://www.elastic.co/guide/en/elasticsearch/reference/5.0/vm-max-map-count.html
You can then explore the state of Elasticsearch, e.g.
- http://localhost:9200/
- http://localhost:9200/_nodes
- http://localhost:9200/_cat/health?v
- http://localhost:9200/_cat/indices?v
Start finder (potentially adjust the Elasticsearch container's IP, see `docker inspect elasticsearch`):

```bash
npm install
DEBUG=finder FINDER_ELASTICSEARCH=localhost:9200 npm start
```

You can set `DEBUG=*` to see MongoDB oplog messages.
Now check out the transferred documents:

- http://localhost:9200/o2r
- http://localhost:9200/o2r/compendia/_search?q=*&pretty
- http://localhost:9200/o2r/compendia/57b2eabfa0cd335b5d1192cc (use an ID from before)

Looking at the last response, you can also see the `_version` field, which is increased every time you restart finder (and full batch processing takes place) or a document is changed.
Delete the index with

```bash
curl -XDELETE 'http://172.17.0.3:9200/o2r/'
```
## Local test proxy

If you run the web service proxy from the project o2r-platform, you can run queries directly against the o2r API:

http://localhost/api/v1/search?q=*
## Local container testing

The following code assumes the Docker host is available at IP `172.17.0.1` within the container.

```bash
docker run -it -e DEBUG=finder -e FINDER_MONGODB=mongodb://172.17.0.1 -e ELASTIC_SEARCH_URL=http://172.17.0.1:9200 -p 8084:8084 finder
```
## Tests

Running instances of Elasticsearch, MongoDB, and o2r-finder are required, as described above. To run the included tests, execute

```bash
npm test
```
## License

o2r-finder is licensed under Apache License, Version 2.0, see file LICENSE.

Copyright (C) 2017 - o2r project.