This is a Singer tap that produces JSON-formatted data following the Singer spec from a MongoDB source.
This is a Proof of Concept and may have limited utility.
The Singer.io core team welcomes proposals regarding how this tap should work, especially in terms of filling in known limitations, but no promises are made in terms of timeliness of responses.
git clone git@github.com:singer-io/tap-mongodb.git # Clone this Repo
mkvirtualenv -p python3 tap-mongodb # Create a virtualenv
source tap-mongodb/bin/activate # Activate the virtualenv
pip install -e .
{
"host": "localhost",
"port": "27017",
"user": "user",
"password": "pass",
"dbname": "<name of database>"
}
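Before running the tap, it can help to confirm that config.json contains every key shown above. The following is a minimal sketch, not part of the tap: the helper name `missing_config_keys` is hypothetical, and treating all five keys as required is an assumption based only on the example config.

```python
import json

# Keys taken from the example config above. Treating all of them as
# required is an assumption; the tap itself may accept fewer.
EXPECTED_KEYS = ("host", "port", "user", "password", "dbname")


def missing_config_keys(config):
    """Return the expected keys that are absent or empty in `config`."""
    return [
        key for key in EXPECTED_KEYS
        if key not in config or config[key] in ("", None)
    ]


def check_config_file(path):
    """Load a JSON config file and raise if any expected key is missing."""
    with open(path) as handle:
        config = json.load(handle)
    missing = missing_config_keys(config)
    if missing:
        raise ValueError("config is missing keys: " + ", ".join(missing))
    return config
```

Calling `check_config_file("config.json")` returns the parsed config, or raises a `ValueError` naming the keys that still need to be filled in.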
tap-mongodb --config config.json --discover # Should dump a Catalog to stdout
tap-mongodb --config config.json --discover > catalog.json # Capture the Catalog
Each entry under the Catalog's "streams" key will need the following metadata:
{
  "streams": [
    {
      "stream_name": "people",
      "metadata": [{
        "breadcrumb": [],
        "metadata": {
          "selected": true,
          "replication-method": "FULL_TABLE",
          "custom-select-clause": "name,age,birthday,address,city,state,zip"
        }
      }]
    }
  ]
}
A stream needs top-level (empty breadcrumb) metadata that describes the following:
- replication-method
  - LOG_BASED: will use Mongo's Oplog
  - FULL_TABLE: will sync the entire table on every tap run
- custom-select-clause
  - a comma-delimited list of columns in the table's data that will be selected and output during the run
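Editing the metadata by hand gets tedious for catalogs with many streams. The sketch below shows one way to set the top-level metadata described above on every stream of a discovered catalog. It is illustrative only: the function names `select_streams` and `annotate_catalog_file` are hypothetical, and applying the same replication method and column list to every stream is an assumption made for brevity.

```python
import json


def select_streams(catalog, replication_method="FULL_TABLE", columns=None):
    """Set top-level (empty breadcrumb) metadata on every stream in `catalog`.

    The catalog dict is modified in place and also returned.
    """
    for stream in catalog.get("streams", []):
        entries = stream.setdefault("metadata", [])
        # Reuse the existing top-level entry if discovery already wrote one.
        top = next((e for e in entries if e.get("breadcrumb") == []), None)
        if top is None:
            top = {"breadcrumb": [], "metadata": {}}
            entries.append(top)
        values = top.setdefault("metadata", {})
        values["selected"] = True
        values["replication-method"] = replication_method
        if columns:
            # The tap expects a single comma-delimited string of columns.
            values["custom-select-clause"] = ",".join(columns)
    return catalog


def annotate_catalog_file(path, replication_method="FULL_TABLE", columns=None):
    """Read a catalog file, add the metadata, and write it back."""
    with open(path) as handle:
        catalog = json.load(handle)
    select_streams(catalog, replication_method, columns)
    with open(path, "w") as handle:
        json.dump(catalog, handle, indent=2)
```

For example, `annotate_catalog_file("catalog.json", "FULL_TABLE", ["name", "age"])` would mark every stream as selected for a full-table sync of the `name` and `age` columns.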
tap-mongodb --config config.json --properties catalog.json
The tap writes bookmarks to stdout. These can be captured and passed back to the tap
on the next sync through the optional --state state.json parameter.
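In the Singer spec, bookmarks are emitted as STATE messages: JSON lines whose "type" is "STATE" and whose "value" holds the state. A minimal sketch for capturing the most recent one follows. The helper names `last_state` and `save_state` are hypothetical, and the bookmark contents used in any example are made up; the actual shape of this tap's state is not documented here.

```python
import json


def last_state(lines):
    """Return the value of the last Singer STATE message in `lines`, or None."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a JSON message (e.g. stray log output).
            continue
        if isinstance(message, dict) and message.get("type") == "STATE":
            state = message.get("value")
    return state


def save_state(lines, path="state.json"):
    """Write the last STATE value found in `lines` to `path`, if there is one."""
    state = last_state(lines)
    if state is not None:
        with open(path, "w") as handle:
            json.dump(state, handle)
    return state
```

Calling `save_state(sys.stdin)` in a small script that receives the tap's output on stdin would leave a state.json ready to pass as `--state state.json` on the next run. Only the last STATE message is kept because each one supersedes the previous.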
Copyright © 2019 Stitch