The mongodb-updater is a dockerised application that will update a MongoDB database on a regular basis using JSON data from a URL.
The file specified in the `JSON_FILE_URL` environment variable will be used as the source of the database update if it is available, is valid JSON, and the total record count has not dropped by a significant amount, as described under `CHANGE_THRESHOLD` below.
The destination database is specified in the `MONGO_HOST`, `MONGO_PORT`, `MONGO_DB` and `MONGO_COLLECTION` variables.
If a unique identifier in the source data should be used as the ID of each document in the collection, the `KEY_FOR_MONGO_ID` variable may be set.
This will set the `_id` of each record inserted to the value of that key.
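
As a rough illustration of the `_id` mapping described above, the sketch below copies the value of a chosen key into `_id` for each record. The helper name `withMongoId`, the sample record, and the decision to throw when the key is missing are all assumptions made for this example, not the application's actual code.

```javascript
// Hypothetical helper: copy the value of `idKey` into `_id` for each record.
function withMongoId(records, idKey) {
  // When no key is configured, leave records untouched so MongoDB generates _id.
  if (!idKey) return records;
  return records.map((record) => {
    if (record[idKey] === undefined) {
      // Assumption: a record without the key is treated as an error.
      throw new Error(`Record is missing the '${idKey}' key`);
    }
    return { ...record, _id: record[idKey] };
  });
}

// Made-up sample data, standing in for the downloaded JSON.
const sample = [{ identifier: 'ABC01', name: 'Example Pharmacy' }];
const prepared = withMongoId(sample, 'identifier');
```

In the running container the second argument would come from `KEY_FOR_MONGO_ID` rather than a literal.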
To reduce the number of environment variables required, common database configurations are stored in the `config/databases` folder.
These can be used by setting the `DB_CONFIG` environment variable to the config file name, e.g. `pharmacy`.
The available settings are `db`, `collection`, `index` and `idKey`.
If the equivalent environment variables are set, they will override the `DB_CONFIG` settings.
Note: the `index` may only be set via a database config file, as it is a rich JSON object containing index keys and options.
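
The override behaviour can be sketched as follows. The shape of `exampleConfig` (in particular the layout of `index`) and the function name `resolveSettings` are hypothetical; the real files in `config/databases` may be structured differently.

```javascript
// Hypothetical contents of a database config file; the real layout may differ.
const exampleConfig = {
  db: 'pharmacies',
  collection: 'pharmacies',
  idKey: 'identifier',
  index: { keys: { name: 1 }, options: { unique: false } },
};

// Environment variables win over the config file when both are present.
function resolveSettings(config, env) {
  return {
    db: env.MONGO_DB || config.db,
    collection: env.MONGO_COLLECTION || config.collection,
    idKey: env.KEY_FOR_MONGO_ID || config.idKey,
    index: config.index, // no environment variable equivalent
  };
}
```

Calling `resolveSettings(exampleConfig, process.env)` with only `MONGO_DB` set would replace `db` and leave the other settings as defined in the file.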
The file download and database update will run on startup, then on a daily schedule while the container continues to run.
The time of day defaults to 7am, and can be changed via the `UPDATE_SCHEDULE` environment variable.
The schedule is run using node-schedule, which uses a cron-like syntax. Further details are available in the node-schedule documentation.
Note: the container time is GMT and does not take account of daylight saving, so you may need to subtract an hour from the time if it is currently BST.
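
To make the BST adjustment concrete, the sketch below shifts the hour field of a five-field cron expression. The function `shiftCronHour` is a hypothetical helper written for this example; it only handles a plain numeric hour, as in the default `0 7 * * *`, and does not adjust the day fields when the shift crosses midnight.

```javascript
// Shift the hour field of a five-field cron expression
// (minute hour day-of-month month day-of-week).
function shiftCronHour(expression, offsetHours) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5 || !/^\d+$/.test(fields[1])) {
    throw new Error(`Unsupported cron expression: ${expression}`);
  }
  const hour = Number(fields[1]);
  // Wrap around midnight so the result stays in the range 0-23.
  fields[1] = String((((hour + offsetHours) % 24) + 24) % 24);
  return fields.join(' ');
}

// To run at 7am BST, schedule for 6am GMT.
const bstSchedule = shiftCronHour('0 7 * * *', -1); // '0 6 * * *'
```

The resulting string is what would be supplied in `UPDATE_SCHEDULE` while BST is in effect.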
When updating the MongoDB database, the new data is inserted into a temporary collection and validated against the existing collection. Once validation passes, the existing collection is deleted and the temporary collection is renamed to take its place.
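
A minimal sketch of this load-validate-swap sequence is shown below. It assumes `db` is a connected `Db` instance from the official `mongodb` Node.js driver; the function name `replaceCollection`, the `_temp` suffix and the `isValid` callback are hypothetical. It also uses `rename` with `dropTarget: true` to remove the existing collection as part of the rename, which may not be how the application itself performs the swap.

```javascript
// `db` is assumed to be a connected Db from the official mongodb driver.
// `isValid(newCount, oldCount)` is a hypothetical validation callback.
async function replaceCollection(db, name, records, isValid) {
  const tempName = `${name}_temp`; // hypothetical naming scheme
  const temp = db.collection(tempName);

  // Clear leftovers from a previously failed run, then load the new data.
  await temp.deleteMany({});
  await temp.insertMany(records);

  const newCount = await temp.countDocuments();
  const oldCount = await db.collection(name).countDocuments();
  if (!isValid(newCount, oldCount)) {
    // Leave the existing collection untouched and discard the new data.
    await temp.drop();
    throw new Error(`Validation failed: ${newCount} new vs ${oldCount} existing records`);
  }

  // dropTarget removes the existing collection as part of the rename.
  await temp.rename(name, { dropTarget: true });
}
```

Because the live collection is only replaced after validation, a bad download leaves the previous data in place.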
Validation will fail if the count of records drops significantly. The allowable drop in record count is controlled by the `CHANGE_THRESHOLD` environment variable. By default this is set to `0.99`, which prevents the data being loaded if the new count is less than 99% of the previous count.
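
The threshold check amounts to a single comparison, sketched below. The function name `passesThreshold` is hypothetical, and the record counts in the comments are made-up figures.

```javascript
// Returns true when the new count is at least `threshold` times the previous count.
function passesThreshold(newCount, previousCount, threshold = 0.99) {
  return newCount >= previousCount * threshold;
}

// With 1000 existing records and the default threshold:
// 995 new records is at least 99% of 1000, so the load proceeds;
// 980 new records is below 99% of 1000, so the load is rejected.
const accepted = passesThreshold(995, 1000);
const rejected = passesThreshold(980, 1000);
```

Lowering `CHANGE_THRESHOLD` (for example to `0.9`) would tolerate a larger drop before the update is refused.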
Environment variables are expected to be managed by the environment in which the application is being run. This is best practice as described by twelve-factor.
Variable | Description | Default | Required |
---|---|---|---|
`NODE_ENV` | node environment | development | |
`LOG_LEVEL` | log level | Depends on `NODE_ENV` | |
`JSON_FILE_URL` | publicly available URL of JSON data | | yes |
`MONGO_HOST` | host name of MongoDB server | | yes |
`DB_CONFIG` | database configuration to read, currently only 'pharmacy' available | | * |
`MONGO_PORT` | port of MongoDB server | 27017 | |
`MONGO_DB` | MongoDB database to be updated | | * |
`MONGO_COLLECTION` | MongoDB collection to be updated | | * |
`KEY_FOR_MONGO_ID` | key in the source JSON data to use as the record's '_id' | | no |
`CHANGE_THRESHOLD` | factor the data count can change by before erroring | 0.99 | |
`UPDATE_SCHEDULE` | time of day to run the update | `0 7 * * *` (7 am) | |

* Either `DB_CONFIG` must be set, or both the `MONGO_DB` and `MONGO_COLLECTION` variables must be set.
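
The requirements in the table and footnote above could be checked at startup along these lines. This is a sketch only: the function name `missingSettings` and the wording of the returned messages are assumptions, not the application's actual behaviour.

```javascript
// Returns a list of required settings that are absent from `env`.
function missingSettings(env) {
  const missing = ['JSON_FILE_URL', 'MONGO_HOST'].filter((name) => !env[name]);
  // Either DB_CONFIG, or both MONGO_DB and MONGO_COLLECTION, must be present.
  if (!env.DB_CONFIG && !(env.MONGO_DB && env.MONGO_COLLECTION)) {
    missing.push('DB_CONFIG (or both MONGO_DB and MONGO_COLLECTION)');
  }
  return missing;
}
```

Passing `process.env` and refusing to start when the returned list is non-empty would surface configuration mistakes before the first download is attempted.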
This repo uses Architecture Decision Records to record architectural decisions for this project. They are stored in doc/adr.