
A Data Store application for 360Giving

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

DataStore for 360 Giving data


Postgres setup

Example:

In this example we create a user test and password test for dev usage.

$ sudo apt-get install postgresql-12 postgresql-server-dev-12
$ sudo -u postgres createuser -P -e test  --interactive
$ createdb -U test -W 360givingdatastore

(In development you can also set the DATABASE_HOST, DATABASE_NAME, DATABASE_USER and DATABASE_PASSWORD environment variables.)
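The README does not show how the settings consume these variables. As a minimal sketch, assuming a standard Django DATABASES setting with the PostgreSQL backend, the mapping might look like this. The fallback values are hypothetical and simply mirror the dev example above (user and password test, database 360givingdatastore), and database_settings is an illustrative name, not the project's own code.

```python
import os
from typing import Mapping


def database_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build a Django-style DATABASES dict from environment variables.

    Hypothetical sketch: the defaults only mirror the dev example above
    and are not taken from the project's settings modules.
    """
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "HOST": env.get("DATABASE_HOST", "localhost"),
            "NAME": env.get("DATABASE_NAME", "360givingdatastore"),
            "USER": env.get("DATABASE_USER", "test"),
            "PASSWORD": env.get("DATABASE_PASSWORD", "test"),
        }
    }
```

Passing an explicit mapping instead of os.environ makes the function easy to check without touching the real environment.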

Python setup

$ virtualenv --python=python3 ./.ve/
$ source ./.ve/bin/activate
$ pip install -r requirements.txt

Run the dev server

$ export DJANGO_SETTINGS_MODULE=settings.settings_dev
$ manage.py migrate
$ manage.py createsuperuser
$ manage.py runserver

Loading grant data

Note: you may wish to load the additional_data sources before loading grant data.

$ manage.py load_datagetter_data ../path/to/data/dir/from/datagetter/

Updating entities data

Create/update the Recipient/Funder model entries from grant data.

$ python manage.py manage_entities_data --update
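The internals of manage_entities_data are not described here. As a rough illustration only, assuming grants follow the 360Giving JSON shape in which fundingOrganization and recipientOrganization are lists of objects with id and name, deduplicating funders and recipients by organisation id could be sketched as below. collect_entities is a hypothetical helper, and the "last name seen wins" rule is an assumption.

```python
from typing import Dict, Iterable, Tuple


def collect_entities(grants: Iterable[dict]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (funders, recipients), each mapping organisation id -> name.

    Hypothetical sketch: if an id appears with different names, the
    last name seen is kept.
    """
    funders: Dict[str, str] = {}
    recipients: Dict[str, str] = {}
    for grant in grants:
        for org in grant.get("fundingOrganization", []):
            funders[org["id"]] = org.get("name", "")
        for org in grant.get("recipientOrganization", []):
            recipients[org["id"]] = org.get("name", "")
    return funders, recipients
```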

Updating additional data

A number of the sources for additional_data have their own local caches which need to be kept up-to-date.

To better understand additional data, refer to 360Giving Datastore - additional data.

For a script that combines all of these steps, see datastore/additional_data/sources/update_all_sources.sh.

Occasionally we also need to update the upstream URLs where data is fetched from, found in datastore/additional_data/sources/*.py.

Update 360G Grant Data Schema for OpenAPI Docs

Our API docs / schema are based on OpenAPI 3.0 (as generated by drf-spectacular). OpenAPI 3.0 is not fully compatible with the JSON Schema used by 360G, so we keep a copy of 360G's schema converted into OpenAPI 3.0 format. When 360G updates their standard/schema, we should update this copy too.

To do this, first install the CLI tool used to convert JSON Schema to OpenAPI 3.0:

npm install -g @openapi-contrib/json-schema-to-openapi-schema

When the schema changes, copy it from the standard repo into datastore/static/ and convert it from JSON Schema to OpenAPI 3.0, e.g.:

STANDARD_VERSION=1.3
cd datastore/static/
curl https://raw.githubusercontent.com/ThreeSixtyGiving/standard/${STANDARD_VERSION}/schema/360-giving-schema.json > 360-giving-schema-${STANDARD_VERSION}-jsonschema.json
json-schema-to-openapi-schema convert 360-giving-schema-${STANDARD_VERSION}-jsonschema.json > 360-giving-schema-${STANDARD_VERSION}-openapi.json

and update the TSG_SCHEMA_STATICFILE setting in settings.py.
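To show the kind of difference the converter handles, here is a deliberately partial sketch. It covers only two rewrites the CLI performs: dropping the $schema keyword, and turning a JSON Schema type list that includes "null" into a single type plus OpenAPI 3.0's nullable: true. It walks every nested object naively (so a property that happened to be named $schema would also be dropped) and is not a substitute for json-schema-to-openapi-schema; to_openapi_30 is a hypothetical name.

```python
from typing import Any


def to_openapi_30(schema: Any) -> Any:
    """Return a converted copy of a JSON Schema fragment; the input is not modified.

    Partial, hypothetical sketch: only handles $schema removal and
    ["<type>", "null"] -> type + nullable.
    """
    if isinstance(schema, list):
        return [to_openapi_30(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    # Drop $schema and convert every nested value first.
    out = {key: to_openapi_30(value) for key, value in schema.items() if key != "$schema"}
    types = out.get("type")
    if isinstance(types, list) and "null" in types:
        others = [t for t in types if t != "null"]
        # OpenAPI 3.0 allows a single type, so only the simple case is converted.
        if len(others) == 1:
            out["type"] = others[0]
            out["nullable"] = True
    return out
```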

360G CodeLists

To download the codelists from the ThreeSixtyGiving/standard GitHub repo, run:

./manage.py load_codelist_codes
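The README does not specify the codelist file format. Assuming each codelist is a CSV file with at least Code and Title columns, parsing one into a code-to-title mapping could be sketched as follows; parse_codelist and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict


def parse_codelist(csv_text: str) -> Dict[str, str]:
    """Map each code to its title.

    Hypothetical sketch: assumes "Code" and "Title" column headers;
    rows with an empty code are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["Code"].strip(): row["Title"].strip()
        for row in reader
        if row.get("Code")
    }


# Hypothetical sample data, not a real 360Giving codelist.
SAMPLE = "Code,Title,Description\nA1,Example A,First example\nB2,Example B,\n"
```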

Geo Data

Check the datastore_num_current_grants_with_beneficiary_location_geocode_without_lookup metric from the getter run before and after updating the geodata; it should go down.

./manage.py load_geocode_names # CHD Data
./manage.py load_geolookups    # from https://github.com/drkane/geo-lookups
./manage.py load_nspl
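To compare the metric before and after a geodata update, you need its value from each run. A minimal sketch, assuming the value is exposed as an unlabelled sample in the Prometheus text exposition format (for example by the datastore's prometheus endpoint); read_metric is a hypothetical helper and ignores labelled samples.

```python
from typing import Optional

METRIC = "datastore_num_current_grants_with_beneficiary_location_geocode_without_lookup"


def read_metric(exposition: str, name: str = METRIC) -> Optional[float]:
    """Return the value of an unlabelled sample called `name`, or None if absent.

    Hypothetical sketch: comment lines (# HELP / # TYPE) are skipped and
    labelled samples such as name{a="b"} are not matched.
    """
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None
```

Take one reading before running the load commands and one after; the second value should be lower.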

Organisation Data

# The old org data must be deleted before the new data is loaded
./manage.py delete_org_data --no-prompt

./additional_data/sources/load_all_org_data.sh
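The order matters: the old organisation data is deleted before the new data is loaded, and the load should not run if the delete failed. A minimal Python sketch of that sequencing, assuming it is run from the directory containing manage.py; run_in_order and ORG_DATA_STEPS are hypothetical names, and the real scripts may handle this differently.

```python
import subprocess
import sys
from typing import List, Sequence

# Hypothetical step list mirroring the two commands above; assumes the
# current directory is the one containing manage.py.
ORG_DATA_STEPS = [
    [sys.executable, "manage.py", "delete_org_data", "--no-prompt"],
    ["./additional_data/sources/load_all_org_data.sh"],
]


def run_in_order(steps: Sequence[List[str]]) -> None:
    """Run each command in turn, stopping at the first non-zero exit status.

    check=True makes subprocess.run raise CalledProcessError, so later
    steps (here, the load) never start if an earlier one fails.
    """
    for cmd in steps:
        subprocess.run(cmd, check=True)
```

Calling run_in_order(ORG_DATA_STEPS) performs both steps; a failing delete raises subprocess.CalledProcessError before the load script starts.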

Other useful commands

There are many other useful management commands; see:

$ manage.py --help

Dev with Docker Compose

Developers can also use Docker Compose to get a local development environment.

Running

docker-compose -f docker-compose.dev.yml up

The website should be available at http://localhost:8000

Use Ctrl-C to exit.

Loading grant data & additional data

Whilst leaving the up command running, use docker-compose run with the commands from the sections above.

For example, instead of running:

$ manage.py load_geocode_names

Run:

$ docker-compose -f docker-compose.dev.yml run datastore-web python datastore/manage.py load_geocode_names
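The translation from a local manage.py command to its Docker Compose equivalent is mechanical, so it can be scripted. A sketch, assuming the compose file and service name from the example above; compose_manage_command is a hypothetical helper.

```python
import shlex
from typing import List

# Values taken from the docker-compose example above.
COMPOSE_FILE = "docker-compose.dev.yml"
SERVICE = "datastore-web"


def compose_manage_command(manage_args: str) -> List[str]:
    """Turn the arguments you would pass to manage.py into a docker-compose run argv.

    Hypothetical helper: shlex.split handles quoted arguments the same
    way a POSIX shell would.
    """
    return [
        "docker-compose", "-f", COMPOSE_FILE, "run", SERVICE,
        "python", "datastore/manage.py", *shlex.split(manage_args),
    ]
```

The resulting list can be passed to subprocess.run(..., check=True) while docker-compose up is running in another terminal.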

Getting database CLI

Run:

$ docker-compose -f docker-compose.dev.yml run -e PGPASSWORD=postgres postgres psql -h postgres -U postgres 

Testing

Requirements

$ pip install -r ./requirements_dev.txt

You will also need the ChromeDriver that matches your machine's Chromium-based browser; see https://chromedriver.chromium.org/downloads

Alternatively, edit the Selenium test setup in test_browser to use your preferred Selenium configuration.

Run tests

$ ./manage.py test tests
$ flake8
$ black --check ./

Running specific tests

You can run any particular tests individually e.g.:

$ manage.py test tests.test_additional_data_tsgorgtype

See manage.py test --help for more information.
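The labels passed to manage.py test are dotted module paths relative to the directory containing manage.py (Django also accepts labels that go down to a TestCase class or a single test method). A small hypothetical helper that turns a relative test file path into such a label:

```python
from pathlib import PurePosixPath


def path_to_test_label(path: str) -> str:
    """Convert a relative path such as 'tests/test_foo.py' into 'tests.test_foo'.

    Hypothetical helper: assumes a relative POSIX-style path; pathlib
    collapses a leading './' automatically.
    """
    p = PurePosixPath(path)
    if p.suffix == ".py":
        p = p.with_suffix("")
    return ".".join(p.parts)
```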

Updating requirements

We target Python 3.8 for our requirements.

Use pip-compile, provided by the pip-tools package, to compile the requirements .in files into pinned requirements .txt files.

Key modules in the datastore

db

This module is the central datastore for 360 Giving data. It contains the models which define the database and the ORM for accessing, creating and updating the grant data.

A key function is managing the Latest data, which represents the datasets built from datagetter grant data. These datasets are used in GrantNav.

Management commands here allow for loading and managing datasets, and provide a mechanism for external scripts to update the current status of the system (the status is used in the UI and by the GrantNav API).

api

This contains the API endpoints used to control the system from the UI and to report the status and data download URL for GrantNav updates, as well as an experimental REST API built with django-rest-framework.

ui

Templates and static HTML/JS live here. There is a basic dashboard that shows the current status of the system, as well as a mechanism to trigger a full datarun (fetch and load).

additional_data

During the load of grant data (datagetter data) by the db module's load_datagetter_data command, each grant is passed to the create method of the AdditionalDataGenerator. There, various sources are used to build up an additional_data object that is available on the Grant model.

additional_data sources come in various forms: static files that are loaded, as well as caches of data in our local database (for example, postcode lookups).

The generator adds additional_data fields in a fixed order, which allows one source to depend on the output of another.
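The ordering guarantee can be illustrated with a toy generator in which each source sees the fields added by the sources before it. This is a hypothetical sketch, not the datastore's AdditionalDataGenerator: the source functions, the field names recipientPostcode and recipientDistrictName, and the lookup table are invented for illustration; only postalCode is taken from the 360Giving organisation fields.

```python
from typing import Callable, Dict, List

# A source receives the grant plus the additional_data built so far and
# returns new fields to merge in. All names here are hypothetical.
Source = Callable[[dict, dict], dict]

# Hypothetical stand-in for a local lookup cache (e.g. postcode lookups).
POSTCODE_TO_DISTRICT = {"AB1 2CD": "Example District"}


def postcode_source(grant: dict, additional_data: dict) -> dict:
    """Copy the first recipient organisation's postcode, if any."""
    orgs = grant.get("recipientOrganization") or [{}]
    postcode = orgs[0].get("postalCode")
    return {"recipientPostcode": postcode} if postcode else {}


def district_source(grant: dict, additional_data: dict) -> dict:
    """Depends on postcode_source having already added recipientPostcode."""
    district = POSTCODE_TO_DISTRICT.get(additional_data.get("recipientPostcode"))
    return {"recipientDistrictName": district} if district else {}


class OrderedAdditionalDataGenerator:
    """Apply sources in a fixed order so later sources see earlier output."""

    def __init__(self, sources: List[Source]):
        self.sources = sources

    def create(self, grant: dict) -> Dict[str, object]:
        additional_data: Dict[str, object] = {}
        for source in self.sources:
            additional_data.update(source(grant, additional_data))
        return additional_data
```

Reversing the source order drops the district field, because district_source then runs before the postcode is available.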

prometheus

Provides a Prometheus endpoint for monitoring vital metrics on the datastore.
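The README does not say which library backs this endpoint (prometheus_client is a common choice). To show what a scrape returns, here is a stdlib-only sketch that renders gauges in the Prometheus text exposition format; render_gauges and the metric name used with it are hypothetical.

```python
from typing import Dict


def _escape_help(text: str) -> str:
    # In the text format, HELP lines must escape backslashes and newlines.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def render_gauges(gauges: Dict[str, float], help_text: Dict[str, str]) -> str:
    """Render each gauge as HELP, TYPE and sample lines (hypothetical helper)."""
    lines = []
    for name, value in gauges.items():
        lines.append(f"# HELP {name} {_escape_help(help_text.get(name, name))}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```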

tools

An example datarun script. It orchestrates running the datagetter, updating the statuses and loading the data into the datastore.
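The orchestration pattern described here, running each stage in order and recording status as it goes, can be sketched as follows. The stage names, status strings and the set_status callback are hypothetical; the real script's fetch (via the datagetter) and load stages are represented here as plain callables.

```python
from typing import Callable, List, Tuple

# (stage name, zero-argument callable) pairs; names are hypothetical.
Step = Tuple[str, Callable[[], None]]


def run_datarun(steps: List[Step], set_status: Callable[[str], None]) -> bool:
    """Run each stage in order, reporting status before and after.

    Returns True if every stage succeeded, or False as soon as one raises,
    in which case the remaining stages are skipped.
    """
    for name, step in steps:
        set_status(f"{name}: running")
        try:
            step()
        except Exception as exc:
            set_status(f"{name}: failed ({exc})")
            return False
        set_status(f"{name}: done")
    set_status("idle")
    return True
```

Stopping at the first failure means a failed fetch never triggers a load of partial data.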

settings

Django settings for the datastore, including the locations of the data run logs and the data run script / PID.

tests

Various cross-module tests.