Jarbas — a tool for Serenata de Amor

Jarbas is part of Serenata de Amor — we fight corruption with data science.

Jarbas is in charge of making data from CEAP more accessible. In the near future Jarbas will show what Rosie thinks of each reimbursement made for our congresspeople.

JSON API endpoints
Installing

JSON API endpoints

Reimbursement

Each Reimbursement object is a reimbursement claimed by a congressperson and identified publicly by its document_id.

Retrieving a specific reimbursement

`GET /api/reimbursement/<document_id>/`

Details from a specific reimbursement. If receipt_url wasn't fetched yet, the server won't try to fetch it automatically.

`GET /api/reimbursement/<document_id>/receipt/`

URL of the digitalized version of the receipt of this specific reimbursement.

If receipt_url wasn't fetched yet, the server will try to fetch it automatically.

If you append the parameter force (e.g. GET /api/reimbursement/<document_id>/receipt/?force=1), the server will re-fetch the receipt URL.

Not all receipts are available, so this URL can be null.
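
As a rough illustration, a client could build the two URLs above as follows. This is only a sketch: it assumes a local instance at http://localhost:8000, and the helper names reimbursement_url and receipt_url are hypothetical, not part of Jarbas.

```python
from urllib.parse import urlencode

# Assumption: a local development instance; change this for your deployment.
BASE_URL = "http://localhost:8000"


def reimbursement_url(document_id):
    """URL with the details of a single reimbursement."""
    return "{}/api/reimbursement/{}/".format(BASE_URL, document_id)


def receipt_url(document_id, force=False):
    """URL of the receipt endpoint; force=True asks the server to re-fetch."""
    url = reimbursement_url(document_id) + "receipt/"
    if force:
        url += "?" + urlencode({"force": 1})
    return url


print(reimbursement_url(123456))
print(receipt_url(123456, force=True))
```

Since the receipt URL can be null, client code should be ready to handle a missing value in the response.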

Listing reimbursements

`GET /api/reimbursement/`

Lists all reimbursements.

Filtering

All these endpoints accept any combination of the following parameters:

applicant_id
cnpj_cpf
document_id
issue_date_start (inclusive)
issue_date_end (exclusive)
month
subquota_id
suspicions (boolean, 1 parses to True, 0 to False)
year
order_by: issue_date (default) or probability (both descending)
in_latest_dataset (boolean, 1 parses to True, 0 to False)

For example:

GET /api/reimbursement/?year=2016&cnpj_cpf=11111111111111&subquota_id=42&order_by=probability

This request will list:

all 2016 reimbursements
made at the supplier with the CNPJ 11.111.111/1111-11
made according to the subquota with the ID 42
sorted by probability, highest first

You can also pass more than one value per field (e.g. document_id=111111,222222).
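
The filter rules above (booleans as 1 or 0, several values joined by commas) can be sketched as a small query-string builder. The function name reimbursement_list_url and the default base URL are assumptions made for illustration; Jarbas does not ship this helper.

```python
from urllib.parse import urlencode


def reimbursement_list_url(base_url="http://localhost:8000", **filters):
    """Build a listing URL from keyword filters.

    Lists and tuples become comma-separated values and booleans become
    1 or 0, following the parameter formats described above.
    """
    params = []
    for name, value in sorted(filters.items()):
        if isinstance(value, bool):
            value = int(value)
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        params.append((name, value))
    # safe="," keeps the comma separator literal instead of encoding it as %2C
    query = urlencode(params, safe=",")
    return "{}/api/reimbursement/?{}".format(base_url, query)


print(reimbursement_list_url(year=2016, subquota_id=42, order_by="probability"))
print(reimbursement_list_url(document_id=[111111, 222222], suspicions=True))
```

Parameters are sorted by name only to make the output predictable; the order of query parameters should not matter to the server.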

`GET /api/reimbursement/<document_id>/same_day/`

Lists all reimbursements of expenses from the same day as document_id.

Subquota

Subquotas are categories of expenses for which congresspeople can claim reimbursement.

Listing subquotas

`GET /api/subquota/`

Lists all subquota names and IDs.

Filtering

Accepts a case-insensitive LIKE filter as the q URL parameter (e.g. GET /api/subquota/?q=meal lists all subquotas that have meal in their names).
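
In practice the q filter behaves like a case-insensitive "contains" check. A rough Python equivalent is sketched below; the subquota names are made up for illustration, and the server-side SQL matching may differ from Python's lower() for some non-ASCII characters.

```python
def matches(name, q):
    """Case-insensitive 'contains' check, mirroring the q filter."""
    return q.lower() in name.lower()


# Hypothetical names, for illustration only
subquotas = ["Congressperson meal", "Fuels and lubricants", "Taxi, toll and parking"]

print([name for name in subquotas if matches(name, "MEAL")])
```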

Applicant

An applicant is the person (congressperson or the leadership of a party or government) who claimed the reimbursement.

Listing applicants

`GET /api/applicant/`

Lists all names of applicants together with their IDs.

Filtering

Accepts a case-insensitive LIKE filter as the q URL parameter (e.g. GET /api/applicant/?q=lideranca lists all applicants that have lideranca in their names).

Company

A company is a Brazilian company where congresspeople have made expenses and claimed reimbursement for them.

Retrieving a specific company

`GET /api/company/<cnpj>/`

This endpoint gets the info we have for a specific company. The endpoint expects a cnpj (i.e. the CNPJ of a Company object, digits only). It returns 404 if the company is not found.
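
Because the endpoint expects digits only, a formatted CNPJ such as 11.111.111/1111-11 has to be stripped of punctuation first. A minimal sketch, assuming a local instance at http://localhost:8000 and a hypothetical helper named company_url:

```python
import re


def company_url(cnpj, base_url="http://localhost:8000"):
    """Strip CNPJ punctuation and build the company endpoint URL."""
    digits = re.sub(r"\D", "", cnpj)
    if len(digits) != 14:
        raise ValueError("a CNPJ has 14 digits, got {!r}".format(cnpj))
    return "{}/api/company/{}/".format(base_url, digits)


print(company_url("11.111.111/1111-11"))
```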

Tapioca Jarbas

There is also a tapioca-wrapper for the API. The tapioca-jarbas can be installed with pip install tapioca-jarbas and can be used to access the API in any Python script.

Installing

Using Docker

With Docker and Docker Compose installed, just run:

$ make run.devel

$ docker-compose up -d
$ docker-compose run --rm jarbas python manage.py migrate
$ docker-compose run --rm jarbas python manage.py ceapdatasets
$ docker-compose run --rm jarbas python manage.py tweets
$ docker-compose run --rm jarbas python manage.py collectstatic --no-input

You can access it at localhost:8000. Your database starts empty, but you can load sample data for development with this command:

$ make seed.sample

$ docker-compose run --rm jarbas python manage.py reimbursements contrib/sample-data/reimbursements_sample.xz
$ docker-compose run --rm jarbas python manage.py companies contrib/sample-data/companies_sample.xz
$ docker-compose run --rm jarbas python manage.py suspicions contrib/sample-data/suspicions_sample.xz

You can get the datasets running Rosie or directly with the toolbox.

To add a fresh new reimbursements.xz brewed by Rosie or made with our toolbox, you just need to have this file inside the project folder and give its path at the end of the command, as below:

$ docker-compose run --rm jarbas python manage.py reimbursements path/to/my/fresh_new_reimbursements.xz

To change any of the default environment variables defined in docker-compose.yml, export it as a local environment variable so that Jarbas picks it up when it runs.

Finally, if you would like to access the Django Admin for an alternative view of the reimbursements, you can access it at localhost:8000/admin/ after creating a user with:

$ docker-compose run --rm jarbas python manage.py createsuperuser

Local install

Requirements

Jarbas requires Python 3.5, Node.js 6+, Yarn, and PostgreSQL 9.4+. Once you have pip and yarn available, install the dependencies:

$ yarn install
$ python -m pip install -r requirements-dev.txt

Python's `lzma` module

In some Linux distros lzma is not installed by default. You can check whether you have it with $ python -m lzma. On Debian-based systems you can fix that with $ apt-get install liblzma-dev, and on macOS with $ brew install xz, but you might have to recompile your Python afterwards.
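
The same check can be done from inside Python. The sketch below only uses the standard library; the function name has_lzma is made up for this example.

```python
import importlib


def has_lzma():
    """Return True if this Python build ships a working lzma module."""
    try:
        importlib.import_module("lzma")
    except ImportError:
        return False
    return True


if has_lzma():
    import lzma

    # Round-trip a few bytes to confirm compression really works
    payload = b"jarbas"
    assert lzma.decompress(lzma.compress(payload)) == payload
    print("lzma is available")
else:
    print("lzma is missing: install the xz libraries and rebuild Python")
```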

Set up your environment variables

Basically this means copying contrib/.env.sample as .env in the project's root folder; the Settings section covers this in detail.

Migrations

Once you're done with requirements, dependencies and settings, create the basic database structure:

$ python manage.py migrate

Load data

Now you can load the data from our datasets and get some other data as static files:

$ python manage.py reimbursements <path to reimbursements.xz>
$ python manage.py suspicions <path to suspicions.xz file>
$ python manage.py companies <path to companies.xz>
$ python manage.py tweets
$ python manage.py ceapdatasets

You can get the datasets running Rosie or directly with the toolbox.
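
Before loading a dataset it can help to confirm that the file is a readable xz archive. The sketch below assumes the file is an xz-compressed CSV with a header row, which this document does not state explicitly; peek_xz_csv is a hypothetical helper, not a Jarbas command.

```python
import csv
import lzma
from itertools import islice


def peek_xz_csv(path, rows=3):
    """Return the first rows of an xz-compressed CSV file as lists of strings."""
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(islice(csv.reader(handle), rows))
```

For example, peek_xz_csv("contrib/sample-data/reimbursements_sample.xz") would return the header and the first two records if that assumption holds.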

Generate static files

We generate assets through Node.js, so build them before Django collects the static files:

$ yarn assets
$ python manage.py collectstatic

Ready?

Not sure? Test it!

$ python manage.py check
$ python manage.py test
$ yarn test

Ready!

Run the server with $ python manage.py runserver and load localhost:8000 in your favorite browser.

Using Django Admin

If you would like to access the Django Admin for an alternative view of the reimbursements, you can access it at localhost:8000/admin/ after creating a user with:

$ python manage.py createsuperuser

Settings

If you are not using Docker, copy contrib/.env.sample as .env in the project's root folder and adjust your settings. These are the main variables:

Django settings

DEBUG (bool) enable or disable Django debug mode
SECRET_KEY (str) Django's secret key
ALLOWED_HOSTS (str) Django's allowed hosts
USE_X_FORWARDED_HOST (bool) Whether to use the X-Forwarded-Host header
CACHE_BACKEND (str) Cache backend (e.g. django.core.cache.backends.memcached.MemcachedCache)
CACHE_LOCATION (str) Cache location (e.g. localhost:11211)
SECURE_PROXY_SSL_HEADER (str) Django secure proxy SSL header (e.g. HTTP_X_FORWARDED_PROTO,https is converted to the tuple ('HTTP_X_FORWARDED_PROTO', 'https'))
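
Environment variables are always strings, so the bool and tuple settings above need a conversion step. The sketch below shows one way such a conversion could work; it is not necessarily how the Jarbas settings module does it, and both function names are hypothetical.

```python
def parse_bool(value):
    """Interpret common truthy strings found in a .env file."""
    return value.strip().lower() in ("1", "true", "yes", "on")


def parse_proxy_ssl_header(value):
    """Turn 'HEADER,value' into the 2-tuple Django expects."""
    header, _, expected = value.partition(",")
    return (header.strip(), expected.strip())


print(parse_bool("True"))
print(parse_proxy_ssl_header("HTTP_X_FORWARDED_PROTO,https"))
```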

Database

DATABASE_URL (str) Database URL, must be PostgreSQL since Jarbas uses JSONField.
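
A quick sanity check of a DATABASE_URL value can catch a non-PostgreSQL URL before Django starts. This is a sketch under stated assumptions: the accepted schemes (postgres and postgresql) and the helper name describe_database_url are choices made for this example, and the credentials shown are made up.

```python
from urllib.parse import urlsplit

# Assumption: these URL schemes denote a PostgreSQL database.
POSTGRES_SCHEMES = ("postgres", "postgresql")


def describe_database_url(url):
    """Split a database URL into its parts, rejecting non-PostgreSQL URLs."""
    parts = urlsplit(url)
    if parts.scheme not in POSTGRES_SCHEMES:
        raise ValueError(
            "expected a PostgreSQL URL, got scheme {!r}".format(parts.scheme)
        )
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "name": parts.path.lstrip("/"),
    }


# Made-up credentials, for illustration only
print(describe_database_url("postgres://jarbas:secret@localhost:5432/jarbas"))
```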

Amazon S3 settings

AMAZON_S3_BUCKET (str) Name of the Amazon S3 bucket to look for datasets (e.g. serenata-de-amor-data)
AMAZON_S3_REGION (str) Region of the Amazon S3 (e.g. s3-sa-east-1)
AMAZON_S3_CEAPTRANSLATION_DATE (str) File name prefix for dataset guide (e.g. 2016-08-08 for 2016-08-08-ceap-datasets.md)

Google settings

GOOGLE_ANALYTICS (str) Google Analytics tracking code (e.g. UA-123456-7)
GOOGLE_STREET_VIEW_API_KEY (str) Google Street View Image API key

Twitter settings

TWITTER_CONSUMER_KEY (str) Twitter API key
TWITTER_CONSUMER_SECRET (str) Twitter API secret
TWITTER_ACCESS_TOKEN (str) Twitter access token
TWITTER_ACCESS_SECRET (str) Twitter access token secret

To get these credentials, follow the python-twitter instructions.
