
Command line utility to flatten RapidPro API responses in order to export them.

Primary language: Python. License: MIT.


RapidPro Normalizer

RapidPro Normalizer is a command line utility that flattens the records of RapidPro API responses so they can be exported as files or database records.
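To make "flatten" concrete: some values in a RapidPro API record are nested (contact fields, for instance, sit under a `fields` object), whereas a file or table row needs one column per value. The sketch below is only an illustration, not the utility's own code: the hypothetical `flatten_record` function joins nested keys into dotted column names, and the sample record is made up.

```python
# Illustration of flattening a nested record into dotted column names.
# Hypothetical helper; not the utility's actual implementation.
from __future__ import annotations

from typing import Any


def flatten_record(record: dict[str, Any], parent_key: str = "", sep: str = ".") -> dict[str, Any]:
    """Return a single-level dict whose keys are the paths to each leaf value."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path prefix along.
            flat.update(flatten_record(value, path, sep))
        else:
            # Scalars and lists are kept as-is in a single column.
            flat[path] = value
    return flat


# Made-up record shaped like a RapidPro contact.
contact = {
    "uuid": "00000000-0000-0000-0000-000000000001",
    "name": "Awa",
    "fields": {"age": 32, "district": "Dakar"},
    "groups": [{"name": "Students"}],
}

print(flatten_record(contact))
```

Lists such as `groups` stay in a single column in this sketch; how the utility itself handles them is not covered here.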

Features

  • Interactive command line interface
  • Easy YAML configuration
  • Export datasets to files and databases
  • Works on Linux and Windows (and possibly macOS)

Installation

Prerequisites

The easy way

The easiest way to install this utility is to clone it from GitHub:

$ git clone https://github.com/supermalang/rapidpro-normalizer.git

Navigate to the directory and install the Python requirements:

$ cd rapidpro-normalizer
$ pip3 install -r requirements.txt

Configuration

Create the .env file

Create the .env file from the sample.env file:

$ cp sample.env .env

Now open the .env file and set the correct values for RAPIDPRO_TOKEN, DB_HOST, DB_NAME, DB_USER and DB_PASSWORD.

  • 🆘 If you do not have a RapidPro token, please contact your Technical Focal Point.
  • 🆗 If you do not export to a database, you can ignore the database credentials.
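Before running the utility you may want to check that the required values are present. The following is a hypothetical helper, not part of the utility, that assumes the .env file uses plain KEY=value lines; `parse_env` and `missing_keys` are illustrative names, and the database keys are only treated as required when the database export is enabled.

```python
# Hypothetical pre-flight check for the .env file; not part of the utility.
# Assumes simple KEY=value lines, with optional quotes and # comments.
from __future__ import annotations

REQUIRED = ["RAPIDPRO_TOKEN"]
DB_KEYS = ["DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD"]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], export_to_db: bool) -> list[str]:
    """Return required keys that are absent or empty."""
    needed = REQUIRED + (DB_KEYS if export_to_db else [])
    return [key for key in needed if not env.get(key)]


# Placeholder contents, for illustration only.
sample = """RAPIDPRO_TOKEN=abc123
DB_HOST=localhost
DB_NAME=
"""

print(missing_keys(parse_env(sample), export_to_db=True))  # → ['DB_NAME', 'DB_USER', 'DB_PASSWORD']
```

To check your real file, pass the contents of .env to `parse_env` instead of the sample string.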

Update the config file

Create the config.yml file and update the content.

  1. Create the config file:
$ cp sample.config.yml config.yml
  2. Define the file export settings. The export path must be inside the utility's directory.
  3. Enable or disable the export to database.
  4. Give the field group that lists the columns to export. Here covid_edu_poll is the field group, but you can choose your own name. Make sure it does not contain spaces, numbers or special characters. The field group is referred to on the command line as fieldgroup.
  5. List your fields. The RapidPro field hierarchy from the API response must be preserved.
    To see what the field hierarchy looks like, you can first send a request with an API client such as Postman and inspect the response. Only fields inside the results property need to be considered.

You don't need to put all fields. Just give fields you want to be exported in the dataset.

You can add multiple field groups to your config file, but only one field group can be used at a time on the command line.

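The requirement that the field hierarchy be preserved can be pictured as a path from each item of the results property down to a value. The sketch below assumes fields are written as dotted paths; that notation, the `covid_edu_poll` list, the helper names and the sample response are all hypothetical, and the real config.yml layout may differ. Records that lack a field simply get an empty (None) value.

```python
# Illustration of picking configured fields out of the "results" property.
# The dotted-path notation and field list are hypothetical.
from __future__ import annotations

from typing import Any

# Hypothetical field group: paths that mirror the API response hierarchy.
covid_edu_poll = ["uuid", "contact.name", "values.age.value"]


def get_path(record: dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if a key is missing."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def select_fields(response: dict[str, Any], fields: list[str]) -> list[dict[str, Any]]:
    """Build one flat row per item in response["results"], keeping only the given fields."""
    return [{path: get_path(item, path) for path in fields} for item in response.get("results", [])]


# Made-up response with the general shape of a RapidPro runs response.
response = {
    "next": None,
    "results": [
        {"uuid": "run-1", "contact": {"name": "Awa"}, "values": {"age": {"value": "14"}}},
        {"uuid": "run-2", "contact": {"name": "Moussa"}, "values": {}},
    ],
}

for row in select_fields(response, covid_edu_poll):
    print(row)
```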

Customize the config file request types

This part is optional

You can customize the values of the request types getcontacts, getruns and getmessages directly in the config.yml file to add request parameters as needed.

Example: if you are only interested in runs that belong to a given flow, you can customize the request type as follows:

# Types of api requests you can use
# You can customize the requests by adding parameters that comply with the RapidPro API
rapidpro_api_settings:
    - request_types:
        - getruns: "https://api.rapidpro.io/api/v2/runs.json?flow=f5901b62-ba76-4003-9c62-72fdacc1b7b7"
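For reference, a request type such as the customized getruns above boils down to a URL with query parameters plus the API token. The sketch below builds (without sending) such a request with the standard library; `build_runs_request` is a hypothetical name and the token is a placeholder. It assumes the token is sent in an `Authorization: Token <your token>` header, which is how the RapidPro API documents authentication.

```python
# Sketch: build a RapidPro API v2 runs request filtered by flow (not sent).
# Endpoint and flow UUID come from the example above; the token is a placeholder.
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.rapidpro.io/api/v2/runs.json"


def build_runs_request(token: str, **params: str) -> Request:
    """Return an unsent GET request with query parameters and a token header."""
    url = f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL
    # The token goes in the Authorization header as "Token <key>".
    return Request(url, headers={"Authorization": f"Token {token}"})


req = build_runs_request("YOUR_RAPIDPRO_TOKEN", flow="f5901b62-ba76-4003-9c62-72fdacc1b7b7")
print(req.full_url)  # → https://api.rapidpro.io/api/v2/runs.json?flow=f5901b62-ba76-4003-9c62-72fdacc1b7b7
# Sending it, e.g. with urllib.request.urlopen(req), returns one JSON page of results.
```

Further filters can be passed as extra keyword arguments, provided the RapidPro API documents them for that endpoint.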

Update the database

You can ignore this part if you do not export to a database.

⚠️ Make sure your database user has at least the ALTER privilege on the database.

Update the database to use utf8mb4 as the default character set.

ALTER SCHEMA `databasename`  DEFAULT CHARACTER SET utf8mb4 ;

Replace databasename with the name of your database.
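The utf8mb4 requirement matters because MySQL's legacy utf8 character set (utf8mb3) stores at most 3 bytes per character, while emoji and other characters above U+FFFF take 4 bytes in UTF-8 and are common in message text. The hypothetical helper below only illustrates that rule by flagging the string columns of a row that need utf8mb4; the row itself is made up.

```python
# Illustration: which values need utf8mb4 (4-byte UTF-8 characters)?
# Hypothetical helpers; the row below is made up.
from __future__ import annotations

from typing import Any


def needs_utf8mb4(text: str) -> bool:
    """True if any character needs 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(ord(char) > 0xFFFF for char in text)


def columns_needing_utf8mb4(row: dict[str, Any]) -> list[str]:
    """List the string columns of a flattened row that contain 4-byte characters."""
    return [name for name, value in row.items() if isinstance(value, str) and needs_utf8mb4(value)]


row = {"contact.name": "Awa", "text": "Merci 👍", "age": 14}
print(columns_needing_utf8mb4(row))  # → ['text']
```

Note that the default character set of a database applies to tables created afterwards; existing tables keep the character set they were created with.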

Usage

Command line

The syntax to use the RapidPro Normalizer is:

$ python3 src/data/make.py [OPTIONS]

⚠️ Depending on your environment, you might need to use python (if it runs Python 3) instead of python3.

You can use the following options:

  • --requesttype: type of the RapidPro request. The request type must be defined in the config file.
  • --fieldgroup: group of fields to export. The field group must be defined in the config file.
  • --datasetname: name of the dataset to export.
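The three options can be pictured with Python's argparse. This is a hypothetical sketch, not the code of make.py: `parse_options` and `validate` are illustrative names, prompting for omitted options is an assumption based on the interactive execution mode, and the sets of known request types and field groups stand in for what would be read from config.yml.

```python
# Hypothetical sketch of the make.py options; not the utility's actual code.
from __future__ import annotations

import argparse
from typing import Callable, Sequence

OPTIONS = ("requesttype", "fieldgroup", "datasetname")


def parse_options(argv: Sequence[str], ask: Callable[[str], str] = input) -> dict[str, str]:
    """Parse the three options and prompt for any that were left out (assumed behavior)."""
    parser = argparse.ArgumentParser(prog="make.py")
    for name in OPTIONS:
        parser.add_argument(f"--{name}")
    args = vars(parser.parse_args(list(argv)))
    # Interactive fallback: ask for each option missing from the command line.
    return {name: args[name] or ask(f"{name}: ") for name in OPTIONS}


def validate(opts: dict[str, str], request_types: set[str], field_groups: set[str]) -> None:
    """Reject values that are not defined in the (hypothetical) loaded config."""
    if opts["requesttype"] not in request_types:
        raise ValueError(f"requesttype not defined in config: {opts['requesttype']}")
    if opts["fieldgroup"] not in field_groups:
        raise ValueError(f"fieldgroup not defined in config: {opts['fieldgroup']}")


# Usage: datasetname is omitted, so the stand-in prompt supplies it.
opts = parse_options(
    ["--requesttype", "getcontacts", "--fieldgroup", "contact_fields"],
    ask=lambda prompt: "mycontacts",
)
validate(opts, {"getcontacts", "getruns", "getmessages"}, {"contact_fields"})
print(opts)  # → {'requesttype': 'getcontacts', 'fieldgroup': 'contact_fields', 'datasetname': 'mycontacts'}
```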

Interactive execution

Inline execution

$ python3 src/data/make.py --requesttype getcontacts --fieldgroup contact_fields --datasetname mycontacts

Schedule automatic execution

This part is optional

You can schedule the automatic execution of the utility with a cron task on Linux or with the Task Scheduler on Windows. If you are using Linux, follow these steps:

  1. Display and copy the command to be executed by the cron task
    ⚠️ Make sure you are still in the rapidpro-normalizer directory

Run the following to print the command to give to the cron task. Update the parameters as needed.

$ echo "python3 $(pwd)/src/data/make.py --requesttype getcontacts --fieldgroup contact_fields --datasetname mycontacts"

The output is the full command, including the absolute path to make.py, ready to be copied.

  2. Edit the crontab file

The crontab file contains instructions for the cron daemon in the following simplified manner: "run this command on this date at this time".

$ crontab -e

Add the command copied in the previous step at the end of the file, preceded by its schedule, then save and close the file:

0 1 * * * python3 /home/user/path/to/rapidpro-normalizer/src/data/make.py --requesttype getcontacts --fieldgroup contact_fields --datasetname mycontacts

This tells the cron daemon to run the command python3 /home/user/path/to/rapidpro-normalizer/src/data/make.py --requesttype getcontacts --fieldgroup contact_fields --datasetname mycontacts every day at 1:00 AM (minute 0, hour 1, any day of the month, any month, any day of the week).
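If you schedule several datasets, generating the crontab entries can avoid typos in paths. The sketch below is hypothetical (`cron_line` is an illustrative name and the repository path is a placeholder); it rebuilds the entry shown above and quotes paths that contain spaces with `shlex.quote`.

```python
# Sketch: build a daily crontab entry for the utility. Hypothetical helper.
import shlex


def cron_line(repo_dir: str, requesttype: str, fieldgroup: str, datasetname: str,
              hour: int = 1, minute: int = 0) -> str:
    """Return "M H * * * <command>", i.e. run every day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    script = f"{repo_dir.rstrip('/')}/src/data/make.py"
    # shlex.quote leaves safe arguments unchanged and quotes ones with spaces.
    # Caveat: cron treats an unescaped % as a newline; this sketch does not escape it.
    command = " ".join(
        shlex.quote(part)
        for part in ["python3", script, "--requesttype", requesttype,
                     "--fieldgroup", fieldgroup, "--datasetname", datasetname]
    )
    # Cron fields: minute, hour, day of month, month, day of week ("*" = every).
    return f"{minute} {hour} * * * {command}"


print(cron_line("/home/user/path/to/rapidpro-normalizer", "getcontacts", "contact_fields", "mycontacts"))
```

Cron jobs run with a minimal environment, usually from your home directory. If the utility locates config.yml and .env relative to the current directory (this README does not say), prefixing the command with cd /home/user/path/to/rapidpro-normalizer && may be needed.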