/politica_preventiva


Primary language: R
License: GNU General Public License v3.0 (GPL-3.0)

Plataforma Preventiva

About:

In order to improve the targeting of social programs, the System of Integral Social Information (Sistema de Información Social Integral - SISI) strives to create a platform to analyse multi-dimensional data not usually taken into account when developing social policy in Mexico.

This pipeline ingests, preprocesses, and cleans more than 30 sources of information from different private and public entities, and establishes a process for feature creation and the execution of statistical models.

Installation

The Ingest pipeline can be run after cloning this repository:

  • Check the main dependencies (see Dependencies)
  • make init to install the project's Python requirements
  • sh infraestructura/registrar.sh to build the base images
  • make setup to build the project images
  • make run to run the pipeline

Dependencies

  • Python 3.5.2
  • pip3
  • luigi
  • git
  • psql (PostgreSQL) 9.5.4
  • PostGIS 2.1.4
  • ...and other Python packages (see requirements.txt)
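
Before running make init, it can help to confirm that the command-line tools in this list are reachable. The following is a minimal sketch, not part of the repository: the function names are hypothetical, it assumes each tool is exposed as an executable on PATH, and it treats Python 3.5 as a minimum version rather than an exact pin. It does not check PostGIS, which is a database extension rather than an executable.

```python
import shutil
import sys

# Executables named in the Dependencies list; assumed to be on PATH.
REQUIRED_TOOLS = ["pip3", "luigi", "git", "psql"]
# The README lists Python 3.5.2; treating 3.5 as a minimum is an assumption.
MIN_PYTHON = (3, 5)


def missing_dependencies(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of required executables that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Check that the (major, minor) interpreter version is at least `minimum`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python %d.%d or newer is expected" % MIN_PYTHON)
    missing = missing_dependencies()
    if missing:
        print("Missing executables: " + ", ".join(missing))
    else:
        print("All listed executables were found")
```

Versions of psql and PostGIS are not verified here; the remaining Python packages are covered by requirements.txt.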

Data Pipeline

After you create the environment, set up the pipeline_tasks in luigi.cfg. The general process of the pipeline is:

  • StartPipeline:
      • RunPipelines [politica_preventiva/pipelines/politica_preventiva.py]
          • Ingest: [politica_preventiva/pipelines/ingest/ingest_orchestra.py]
              • LocalIngest: Ingest data from multiple sources
              • LocalToS3: Upload to S3 and save historical by date
              • UpdateDB: Update Postgres tables and create indexes (see commons/pg_raw_schemas)
          • ETL: [politica_preventiva/pipelines/etl/etl_orchestra.py]
          • Features: [politica_preventiva/pipelines/features/features_orchestra.py]
          • Models: [politica_preventiva/pipelines/models/models_orchestra.py]
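
Luigi runs a task only after the tasks returned by its requires() method are complete, so the stages above execute from the innermost dependency outward. The sketch below illustrates that ordering in plain Python, without Luigi. The dependency map is an assumption inferred from the order of the list (Ingest, then ETL, then Features, then Models); the actual requires() definitions live in the orchestra modules and may differ.

```python
# Hypothetical dependency map: each stage lists the stages it waits for.
# The chain is inferred from the README listing, not read from the code.
REQUIRES = {
    "LocalIngest": [],
    "LocalToS3": ["LocalIngest"],
    "UpdateDB": ["LocalToS3"],
    "Ingest": ["UpdateDB"],
    "ETL": ["Ingest"],
    "Features": ["ETL"],
    "Models": ["Features"],
    "RunPipelines": ["Models"],
    "StartPipeline": ["RunPipelines"],
}


def run_order(target, requires=REQUIRES):
    """Return the stages needed for `target`, each one after its requirements."""
    order = []
    seen = set()

    def visit(stage):
        if stage in seen:
            return
        seen.add(stage)
        # Resolve dependencies first (depth-first), then schedule the stage.
        for dependency in requires[stage]:
            visit(dependency)
        order.append(stage)

    visit(target)
    return order


if __name__ == "__main__":
    for stage in run_order("StartPipeline"):
        print(stage)
```

Requesting an intermediate stage, for example run_order("ETL"), returns only the stages that stage depends on, which mirrors how a single Luigi task can be triggered without running the whole pipeline.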

Contributors

javurena7 rsanchezavalos andreanr andreuboada
monzalo14 abrownrb ollin18