USAspending.gov uses this API to obtain all federal spending data, which is open source and provided to the public as part of the DATA Act.
Ensure the following dependencies are installed and working prior to continuing:
- Docker, which will handle the other application dependencies.
- Bash or another Unix shell equivalent
  - Bash is available on Windows as Windows Subsystem for Linux
- Git
Using Docker is recommended since it provides a clean environment. Setting up your own local environment requires some technical abilities and experience with modern software tools.
- Command line package manager
  - Windows' WSL bash uses apt-get
  - MacOS users will use Homebrew
  - Linux users already know their package manager (yum, apt, pacman, etc.)
- PostgreSQL version 10.x (with a dedicated data_store_api database)
- Elasticsearch version 7.1
- Python 3.7 environment
  - Highly recommended to use a virtual environment. There are various tools and associated instructions depending on preferences
Now, navigate to the base file directory where you will store the USAspending repositories:
$ mkdir -p usaspending && cd usaspending
$ git clone https://github.com/fedspendingtransparency/usaspending-api.git
$ cd usaspending-api
There are three documented options for setting up a local database in order to run the API:
- Local Empty DB. Use your own local postgres database for the API to use.
- Containerized Empty DB. Create an empty directory on your localhost where all the database files will persist and use the docker-compose file to bring up a containerized postgres database.
- Local Populated DB. Download either the whole database or a database subset from the USAspending website.
Create a local Postgres database called 'data_store_api' and either create a new username and password for the database or use all the defaults. For help, consult:
Make sure to grant superuser permissions to whatever user you created for the data_store_api database, or some scripts will not work:
postgres=# ALTER ROLE <<role/user you created>> WITH SUPERUSER;
See below for basic setup instructions. For help with Docker Compose:
- None of these commands will rebuild a Docker image! Use --build if you make changes to the code or want to rebuild the image before running the up steps.
- If you run a local database, set POSTGRES_HOST in .env to host.docker.internal. POSTGRES_PORT should be changed if it isn't 5432.
  - docker-compose up usaspending-db will create and run a Postgres database in the POSTGRES_CLUSTER_DIR specified in the .env configuration file. We recommend using a folder outside of the usaspending-api project directory so it does not get copied to other containers in subsequent steps.
  - docker-compose run usaspending-manage python3 -u manage.py migrate will run Django migrations: https://docs.djangoproject.com/en/2.2/topics/migrations/.
  - docker-compose run usaspending-manage python3 -u manage.py load_reference_data will load essential reference data (agencies, program activity codes, CFDA program data, country codes, and others).
  - docker-compose run usaspending-manage python3 -u manage.py matview_runner --dependencies will provision the materialized views which are required by certain API endpoints.
- docker-compose.yaml contains the shell commands necessary to set up the database manually, if you prefer to have a more custom environment.
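Before running the migration steps above, it can help to confirm that something is actually listening at the configured host and port. The following is a minimal sketch, not part of this repository: it reads POSTGRES_HOST and POSTGRES_PORT from the environment and attempts a plain TCP connection. The helper names and the localhost fallback are assumptions made for illustration, and a successful connection only shows that the port is open, not that the server is Postgres.

```python
import os
import socket
from typing import Mapping, Tuple


def postgres_target(env: Mapping[str, str]) -> Tuple[str, int]:
    """Return the (host, port) pair described by POSTGRES_HOST and POSTGRES_PORT."""
    # Assumption: fall back to localhost when POSTGRES_HOST is not set.
    host = env.get("POSTGRES_HOST", "localhost")
    # 5432 is the default port mentioned in the setup notes above.
    port = int(env.get("POSTGRES_PORT", "5432"))
    return host, port


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    host, port = postgres_target(os.environ)
    state = "reachable" if is_reachable(host, port) else "not reachable"
    print(f"Postgres at {host}:{port} is {state}")
```

Note that host.docker.internal is meant to be resolved from inside a container, so run the check from the same place the value will be used.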
For further instructions on how to download, use, and set up the database using a subset of our data, please go to:
Some of the API endpoints reach into Elasticsearch for data.
- docker-compose up usaspending-es will create and start a single-node Elasticsearch cluster, using the ES_CLUSTER_DIR specified in the .env configuration file. We recommend using a folder outside of the usaspending-api project directory so it does not get copied to other containers.
- The cluster should be reachable at http://localhost:9200 ("You Know, for Search").
- Optionally, to see log output, use docker-compose logs usaspending-es (these logs are stored by docker even if you don't use this).
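The reachability check can also be scripted. The sketch below is an illustration rather than project code: it requests the cluster root URL given above and looks for the "You Know, for Search" tagline in the JSON response. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

EXPECTED_TAGLINE = "You Know, for Search"


def looks_like_elasticsearch(payload: Any) -> bool:
    """Return True if the decoded root response carries the Elasticsearch tagline."""
    return isinstance(payload, dict) and payload.get("tagline") == EXPECTED_TAGLINE


def fetch_cluster_info(
    url: str = "http://localhost:9200", timeout: float = 5.0
) -> Optional[Dict[str, Any]]:
    """Fetch and decode the cluster root document, or return None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection errors (URLError); ValueError covers bad JSON.
        return None


if __name__ == "__main__":
    info = fetch_cluster_info()
    if looks_like_elasticsearch(info):
        print("Elasticsearch responded, version:", info.get("version", {}).get("number"))
    else:
        print("No Elasticsearch cluster answered at http://localhost:9200")
```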
docker-compose up usaspending-api
- You can update environment variables in settings.py (buckets, elasticsearch, local paths) and they will be mounted and used when you run this.
The application will now be available at http://localhost:8000.
In your local development environment, available API endpoints may be found at http://localhost:8000/docs/endpoints
Deployed production API endpoints and docs are found by following links here: https://api.usaspending.gov
Note: it is possible to run ad-hoc commands out of a Docker container once you get the hang of it; see the comments in the Dockerfile.
For details on loading reference data, DATA Act Broker submissions, and current USAspending data into the API, see loading_data.md.
For details on how our data loaders modify incoming data, see data_reformatting.md.
To run tests, you need:
- Postgres: A running PostgreSQL database server (See Database Setup above)
- Elasticsearch: A running Elasticsearch cluster (See Elasticsearch Setup above)
- Required Python Libraries: Python package dependencies downloaded and discoverable (See below)
- Environment Variables: Tell python where to connect to the various data stores (See below)
Once these are satisfied, simply run:
(usaspending-api) $ pytest
Create and activate the virtual environment using venv, and ensure the right version of Python 3.7.x is being used (the latest RHEL package available for python36u: as of this writing)
$ pyenv install 3.7.2
$ pyenv local 3.7.2
$ python -m venv .venv/usaspending-api
$ source .venv/usaspending-api/bin/activate
Your prompt should then look as below to show you are in the virtual environment named usaspending-api (to exit that virtual environment, simply type deactivate at the prompt).
(usaspending-api) $
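If the prompt is ambiguous, the interpreter itself can report whether it is running inside a virtual environment. This is a small sketch under the assumption that the environment was created with venv as shown above, in which case sys.prefix differs from sys.base_prefix. The helper names are made up for this example.

```python
import sys
from typing import Sequence


def in_virtualenv(prefix: str, base_prefix: str) -> bool:
    """Inside a venv-created environment, sys.prefix differs from sys.base_prefix."""
    return prefix != base_prefix


def is_python_37(version_info: Sequence[int]) -> bool:
    """Return True for any 3.7.x interpreter version."""
    return tuple(version_info[:2]) == (3, 7)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv(sys.prefix, sys.base_prefix))
    print("Python 3.7.x:", is_python_37(sys.version_info))
```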
Use pip to install the application dependencies:
(usaspending-api) $ pip install -r requirements/requirements.txt
Create a .envrc file in the repo root, which will be ignored by git. Change credentials and ports as needed for your local dev environment.
export DATABASE_URL=postgres://usaspending:usaspender@localhost:5432/data_store_api
export ES_HOSTNAME=http://localhost:9200
export DATA_BROKER_DATABASE_URL=postgres://admin:root@localhost:5435/data_broker
If direnv does not pick this up after saving the file, type:
$ direnv allow
Alternatively, you could skip using direnv and just export these variables in your shell environment.
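A quick way to confirm that the exported URLs point where you expect is to split them into their parts. The sketch below uses only the standard library and is not taken from the project; DatabaseTarget and parse_database_url are hypothetical names. It deliberately leaves the password out of the result so the output is safe to print.

```python
import os
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class DatabaseTarget(NamedTuple):
    user: Optional[str]
    host: Optional[str]
    port: Optional[int]
    database: str


def parse_database_url(url: str) -> DatabaseTarget:
    """Split a postgres:// URL into user, host, port, and database name."""
    parts = urlsplit(url)
    return DatabaseTarget(
        user=parts.username,
        host=parts.hostname,
        port=parts.port,
        # The URL path is "/<database>", so drop the leading slash.
        database=parts.path.lstrip("/"),
    )


if __name__ == "__main__":
    for name in ("DATABASE_URL", "DATA_BROKER_DATABASE_URL"):
        value = os.environ.get(name)
        print(name, "->", parse_database_url(value) if value else "not set")
```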
Some automated integration tests run against a Broker database. If the dependencies to run such integration tests are not satisfied, those tests will bail out and be marked as Skipped.
(You can see messages about those skipped tests by adding the -rs flag to pytest, like: pytest -rs)
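To illustrate the skip behavior in general terms, here is a hypothetical test that skips itself when DATA_BROKER_DATABASE_URL is not set. It is not how this repository implements its Broker checks; it is a sketch using the standard library's unittest, which pytest can also collect and run when the class lives in a test module, reporting the skip reason under -rs.

```python
import os
import unittest

BROKER_URL_VAR = "DATA_BROKER_DATABASE_URL"


def broker_configured(env=os.environ) -> bool:
    """Return True when the Broker database URL is present and non-empty."""
    return bool(env.get(BROKER_URL_VAR))


class BrokerIntegrationSketch(unittest.TestCase):
    # Hypothetical test case, not taken from the usaspending-api test suite.

    def setUp(self):
        # Skipping here marks the test as Skipped instead of failing it.
        if not broker_configured():
            self.skipTest(f"{BROKER_URL_VAR} is not set")

    def test_broker_url_is_postgres(self):
        self.assertTrue(os.environ[BROKER_URL_VAR].startswith("postgres"))
```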
To satisfy these dependencies and include execution of these tests, do the following:
- Ensure you have Docker installed and running on your machine
- Ensure the Broker source code is checked out alongside this repo at ../data-act-broker-backend
- Ensure you have the DATA_BROKER_DATABASE_URL environment variable set, and pointing to a live PostgreSQL server (no database required)
- Ensure you have built the Broker backend Docker image by running:
(usaspending-api) $ docker build -t dataact-broker-backend ../data-act-broker-backend
NOTE: Broker source code should be re-fetched and image rebuilt to ensure latest integration is tested
Re-running the test suite using pytest -rs with these dependencies satisfied should yield no more skips of the broker integration tests.
To submit fixes or enhancements, or to suggest changes, see CONTRIBUTING.md