Labrador is a project that aims to do one thing, and one thing only: retrieve data from a source.
The project is composed of two packages:
- labrador - a library encapsulating the retrieve-sink logic;
- server - a REST server that uses the library.
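The retrieve-sink idea behind the library can be sketched roughly as follows. This is not labrador's actual API: `retrieve_and_sink`, `fake_source`, and the in-memory `bucket` are hypothetical names used only for illustration, with a plain list standing in for the source and a dict standing in for the destination.

```python
import json
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def retrieve_and_sink(
    retrieve: Callable[[], Iterable[Row]],
    sink: Callable[[str, bytes], None],
    key: str,
) -> int:
    """Pull rows from a source, serialize them as JSON, and pass them to a sink.

    Returns the number of rows written.
    """
    rows: List[Row] = list(retrieve())
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    sink(key, payload)
    return len(rows)


# Hypothetical stand-ins: a fixed list plays the source, a dict plays the bucket.
def fake_source() -> Iterable[Row]:
    return [{"name": "rex", "age": 3}, {"name": "luna", "age": 5}]


bucket: Dict[str, bytes] = {}
written = retrieve_and_sink(fake_source, bucket.__setitem__, "dogs.json")
```

Keeping the source and the sink as plain callables means either side can be swapped without touching the other, which is the general shape a retrieve-sink library tends to take.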
Before installing, make sure you are in the correct environment and go to the project root directory (the one that contains this README file). Ideally, create a virtual environment first:
$ mkvirtualenv --python=$(which python3) <YOUR_ENVNAME>
After that, activate your environment and install the package.
$ workon <YOUR_ENVNAME>
$ python setup.py install
Import the modules, classes, functions, etc. as you would with any normal Python package. For example:
$ workon <YOUR_ENVNAME>
$ python
>>> import labrador
To start the server, export the private key and run the start script:
$ workon <YOUR_ENVNAME>
$ export PRIVATE_KEY_PEM=<THE_PEM_PRIVATE_KEY_TO_DECRYPT_THE_CREDENTIALS>
$ python server/start.sh
The server is started with 10 workers, each with a timeout of 600 seconds.
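For readers wondering what such a setup might look like, here is a minimal sketch that assumes the server runs under Gunicorn. That is an assumption: this README does not say which server the start script uses. Gunicorn configuration files are plain Python, and `workers` and `timeout` are its standard setting names.

```python
# gunicorn.conf.py (hypothetical file; assumes Gunicorn, which this README does not confirm)

# Number of worker processes that handle requests.
workers = 10

# Seconds a worker may stay silent before it is killed and restarted.
timeout = 600
```

A long timeout like this suits requests that wait on a slow retrieval, at the cost of keeping a stuck worker around for longer.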
Both tests retrieve data from a BigQuery public dataset, convert it to JSON, and save it to S3.
For both the labrador and server tests, note that the credentials sent are encrypted.
$ workon <YOUR_ENVNAME>
$ export PRIVATE_KEY_PEM=<THE_PEM_PRIVATE_KEY_TO_DECRYPT_THE_CREDENTIALS>
$ python test/test_labrador.py <CREDENTIALS_DIRPATH> <BUCKET_NAME>
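Because the commands above depend on `PRIVATE_KEY_PEM` being exported, a script can fail early with a clear message when it is missing. The helper below is a hypothetical sketch, not part of labrador; it only checks that the variable is set and makes no assumption about the format of its value.

```python
import os
from typing import Mapping, Optional


def read_private_key_pem(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of PRIVATE_KEY_PEM, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    value = env.get("PRIVATE_KEY_PEM", "").strip()
    if not value:
        raise RuntimeError("PRIVATE_KEY_PEM is not set; export it before running")
    return value
```

Passing the environment as an argument keeps the helper easy to exercise without touching the real process environment.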
Start the server as shown in the usage section above, then run:
$ workon <YOUR_ENVNAME>
$ python test/test_server.py <CREDENTIALS_DIRPATH> <BUCKET_NAME> <TEST_ID>