Elasticsearch-powered search engine for finding charities and other non-profit organisations. It provides:
- Import of data from nearly 20 sources in the UK, with duplicates matched to a single record.
- An Elasticsearch index that can be queried.
- Org-ids added to organisations.
- A reconciliation API for searching organisations, based on an optimised search query.
- A facility for uploading a CSV of charity names and adding a best guess at each charity number.
- HTML pages for searching for a charity.
- Clone repository
- Create virtual environment (`python -m venv env`)
- Activate virtual environment (`source env/bin/activate` or `env\Scripts\activate`)
- Install requirements (`pip install -r requirements.txt`)
- Install postgres
- Start postgres
- Create 2 postgres databases: one for admin (eg `ftc_admin`) and one for data (eg `ftc_data`)
- Install elasticsearch 7 - you may need to increase available memory (see below)
- Start elasticsearch
- Create a `.env` file in the root directory, with contents based on `.env.example`
- Create the database tables (`python ./manage.py migrate --database=data && python ./manage.py migrate --database=admin && python ./manage.py createcachetable --database=admin`)
- Import data on charities (`python ./manage.py import_charities`)
- Import data on nonprofit companies (`python ./manage.py import_ch`)
- Import data on other non-profit organisations (`python ./manage.py import_all`)
- Add organisations to elasticsearch index (`python ./manage.py es_index`)
- (Don't use the default `search_index` command as this won't set up aliases correctly)
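The management commands in the steps above can be chained in a small script so that a failed step stops the run. This is a minimal sketch, not part of the repository: the names `SETUP_COMMANDS`, `build_commands` and `run_setup` are hypothetical, and it assumes the virtual environment is active and `manage.py` is in the current directory.

```python
import subprocess
import sys

# Management commands in the order listed in the setup steps.
SETUP_COMMANDS = [
    ["migrate", "--database=data"],
    ["migrate", "--database=admin"],
    ["createcachetable", "--database=admin"],
    ["import_charities"],
    ["import_ch"],
    ["import_all"],
    ["es_index"],
]


def build_commands(python=sys.executable, manage_py="./manage.py"):
    """Return the full argument list for each setup step."""
    return [[python, manage_py, *args] for args in SETUP_COMMANDS]


def run_setup(dry_run=False):
    """Run each step in order; check=True stops at the first failure."""
    for argv in build_commands():
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Pass --dry-run to print the commands without executing them.
    run_setup(dry_run="--dry-run" in sys.argv)
```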
SSH into server and run:
# create app
dokku apps:create ftc
# postgres
sudo dokku plugin:install https://github.com/dokku/dokku-postgres.git postgres
dokku postgres:create ftc-db-data
dokku postgres:link ftc-db-data ftc --alias "DATABASE_URL"
dokku postgres:create ftc-db-admin
dokku postgres:link ftc-db-admin ftc --alias "DATABASE_ADMIN_URL"
# elasticsearch
sudo dokku plugin:install https://github.com/dokku/dokku-elasticsearch.git elasticsearch
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf; sudo sysctl -p
export ELASTICSEARCH_IMAGE="elasticsearch"
export ELASTICSEARCH_IMAGE_VERSION="7.7.1"
dokku elasticsearch:create ftc-es
dokku elasticsearch:link ftc-es ftc
# configure elasticsearch 7:
# https://github.com/dokku/dokku-elasticsearch/issues/72#issuecomment-510771763
# setup elasticsearch increased memory (might be needed)
nano /var/lib/dokku/services/elasticsearch/ftc-es/config/jvm.options
# replace `-Xms512m` with `-Xms2g`
# replace `-Xmx512m` with `-Xmx2g`
# restart elasticsearch
dokku elasticsearch:restart ftc-es
# Redirect
dokku plugin:install https://github.com/dokku/dokku-redirect.git
dokku redirect:set ftc www.findthatcharity.uk findthatcharity.uk
# SSL
sudo dokku plugin:install https://github.com/dokku/dokku-letsencrypt.git
dokku letsencrypt:set ftc email your@email.tld
dokku letsencrypt:enable ftc
dokku letsencrypt:cron-job --add
On local machine:
git remote add dokku dokku@SERVER_HOST:ftc
git push dokku main
On Dokku server run:
# setup
dokku run ftc python ./manage.py migrate --database=data
dokku run ftc python ./manage.py migrate --database=admin
dokku run ftc python ./manage.py createcachetable --database=admin
# run import
dokku run ftc python ./manage.py charity_setup
dokku run ftc python ./manage.py import_oscr
dokku run ftc python ./manage.py import_charities
dokku run ftc python ./manage.py import_ch
dokku run ftc python ./manage.py import_other_data
dokku run ftc python ./manage.py import_all
dokku run ftc python ./manage.py es_index
The server uses Django. Run it with the following command:
python ./manage.py runserver
The server offers the following API endpoints:
- `/reconcile`: a reconciliation service API conforming to the OpenRefine reconciliation API specification.
- `/charity/12345`: look up information about a particular charity
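As an illustration of how a client might call `/reconcile`, the sketch below builds a batch query in the shape the OpenRefine reconciliation specification describes: a `queries` parameter holding a JSON object keyed `q0`, `q1`, and so on. The base URL is an assumption (the Django development server's default address), the helper names are hypothetical, and the use of GET is also an assumption; the specification permits POST as well.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the server is running locally via `python ./manage.py runserver`.
BASE_URL = "http://localhost:8000"


def build_reconcile_url(names, base_url=BASE_URL):
    """Build a batch reconciliation URL with one query per name."""
    queries = {f"q{i}": {"query": name} for i, name in enumerate(names)}
    return f"{base_url}/reconcile?" + urlencode({"queries": json.dumps(queries)})


def reconcile(names, base_url=BASE_URL):
    """Send the batch query and return the decoded JSON response."""
    with urlopen(build_reconcile_url(names, base_url)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the URL only; call reconcile() when a server is available.
    print(build_reconcile_url(["Oxfam", "Shelter"]))
```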
Priorities:
- tests for ensuring data is correctly imported
- server tests
- use results of `server/recon_test.py` to produce the best reconciliation search query for use in the server (`recon_test_7` seems the best at the moment)
- threshold for when to use the result vs discard
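One possible shape for the threshold item is sketched below: take the highest-scoring candidate and keep it only if its score clears a cutoff. This is a hypothetical helper, not the project's implementation. It assumes each candidate is a dict with a numeric `score` field, as in OpenRefine reconciliation results, and the default cutoff of 80 is a placeholder that would need tuning against the `recon_test` results.

```python
def pick_match(candidates, threshold=80.0):
    """Return the top-scoring candidate if it clears the threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.get("score", 0))
    return best if best.get("score", 0) >= threshold else None


if __name__ == "__main__":
    # Made-up candidates for illustration only.
    results = [
        {"id": "example-1", "name": "Example Trust", "score": 42.0},
        {"id": "example-2", "name": "Example Charity", "score": 91.5},
    ]
    print(pick_match(results))
```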
Future development:
- upload a CSV file and reconcile each row with a charity
- allow updating a charity with additional possible names
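For the CSV item, one approach is to turn each row into a reconciliation query keyed by its row index, so that results can be written back to the matching row. The sketch below is illustrative only: the function name `csv_to_queries` and the `name` column are assumptions about the upload format, not something defined by the project.

```python
import csv
import io


def csv_to_queries(csv_text, name_column="name"):
    """Map each CSV row with a non-empty name to a reconciliation query.

    Keys are "q<row index>", so rows without a name leave gaps and every
    result can still be traced back to its original row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    queries = {}
    for i, row in enumerate(reader):
        name = (row.get(name_column) or "").strip()
        if name:
            queries[f"q{i}"] = {"query": name}
    return queries


if __name__ == "__main__":
    sample = "name,town\nOxfam,Oxford\n,Leeds\nShelter,London\n"
    print(csv_to_queries(sample))
```

The resulting dict can be serialised to JSON and sent as the `queries` parameter of a reconciliation request.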
Run the tests with coverage, then serve the HTML report locally:
coverage run -m pytest && coverage html
python -m http.server -d htmlcov --bind 127.0.0.1 8001