/mongo-connector

MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)

Primary LanguagePythonApache License 2.0Apache-2.0

mongo-connector

The mongo-connector project originated as a MongoDB mongo-labs project and is now community-maintained under the custody of YouGov, Plc.

View build status

For complete documentation, check out the Mongo Connector Wiki.

System Overview

mongo-connector creates a pipeline from a MongoDB cluster to one or more target systems, such as Solr, Elasticsearch, or another MongoDB cluster. It synchronizes data in MongoDB to the target then tails the MongoDB oplog, keeping up with operations in MongoDB in real-time. Detailed documentation is available on the wiki.

Getting Started

mongo-connector supports Python 3.4+ and MongoDB versions 3.4 and 3.6.

Installation

To install mongo-connector with the MongoDB doc manager suitable for replicating data to MongoDB, use pip:

pip install mongo-connector

The install command can be customized to include the Doc Managers and any extra dependencies for the target system.

Target System Install Command
MongoDB pip install mongo-connector
Elasticsearch 1.x pip install 'mongo-connector[elastic]'
Amazon Elasticsearch 1.x Service pip install 'mongo-connector[elastic-aws]'
Elasticsearch 2.x pip install 'mongo-connector[elastic2]'
Amazon Elasticsearch 2.x Service pip install 'mongo-connector[elastic2-aws]'
Elasticsearch 5.x pip install 'mongo-connector[elastic5]'
Solr pip install 'mongo-connector[solr]'

You may have to run pip with sudo, depending on where you're installing mongo-connector and what privileges you have.

System V Service

Mongo Connector provides support for installing and uninstalling itself as a service daemon under System V Init on Linux. Following install of the package, install or uninstall using the following command:

$ python -m mongo_connector.service.system-v [un]install

Development

You can also install the development version of mongo-connector manually:

git clone https://github.com/yougov/mongo-connector.git
pip install ./mongo-connector

Using mongo-connector

mongo-connector replicates operations from the MongoDB oplog, so a replica set must be running before startup. For development purposes, you may find it convenient to run a one-node replica set (note that this is not recommended for production):

mongod --replSet myDevReplSet

To initialize your server as a replica set, run the following command in the mongo shell:

rs.initiate()

Once the replica set is running, you may start mongo-connector. The simplest invocation resembles the following:

mongo-connector -m <mongodb server hostname>:<replica set port> \
                -t <replication endpoint URL, e.g. http://localhost:8983/solr> \
                -d <name of doc manager, e.g., solr_doc_manager>

mongo-connector has many other options besides those demonstrated above. To get a full listing with descriptions, try mongo-connector --help. You can also use mongo-connector with a configuration file.

If you want to jump-start into using mongo-connector with a another particular system, check out:

Doc Managers

Elasticsearch 1.x: https://github.com/yougov/elastic-doc-manager

Elasticsearch 2.x and 5.x: https://github.com/yougov/elastic2-doc-manager

Solr: https://github.com/yougov/solr-doc-manager

The MongoDB doc manager comes packaged with the mongo-connector project.

Troubleshooting/Questions

Having trouble with installation? Have a question about Mongo Connector? Your question or problem may be answered in the FAQ or in the wiki. If you can't find the answer to your question or problem there, feel free to open an issue on Mongo Connector's Github page.