/datamart

Data augment


datamart

cd datamart
conda env create -f environment.yml
source activate datamart_env
git update-index --assume-unchanged datamart/resources/index_info.json

python -W ignore -m unittest discover

Validate your schema

Dataset providers should validate their dataset schema against our JSON schema by running the following command:

python scripts/validate_schema.py --validate_json {path_to_json}

For example:

$ python scripts/validate_schema.py --validate_json test/tmp/tmp.json
Valid json
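
As a quick pre-check before running the script, the idea can be sketched in a few lines of Python. This is only an illustration and not the logic of scripts/validate_schema.py: it checks a single field, materialization.python_path, and nothing else from the datamart index schema. The function name check_python_path and the sample value are hypothetical.

```python
import json


def check_python_path(schema_text):
    """Return a list of problems found in a dataset schema given as JSON text.

    Hypothetical helper: it checks only materialization.python_path,
    not the full datamart index schema.
    """
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as err:
        return ["not valid JSON: {}".format(err.msg)]
    if not isinstance(schema, dict):
        return ["top-level value must be a JSON object"]
    materialization = schema.get("materialization")
    if not isinstance(materialization, dict):
        return ["missing 'materialization' object"]
    python_path = materialization.get("python_path")
    if not isinstance(python_path, str) or not python_path:
        return ["'materialization.python_path' must be a non-empty string"]
    return []


# Hypothetical sample value, for illustration only.
sample = '{"materialization": {"python_path": "my_materializer"}}'
problems = check_python_path(sample)
print("ok" if not problems else problems)
```

A check like this only catches the most basic mistakes; the script remains the authoritative validation.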

How to provide index for one data source

  1. Prepare your dataset schema following the datamart index schema and validate it as described in the previous section.

  2. Create your materialization method by subclassing the base class in materializer_base.py, and put it in datamart/materializers. See the README.

  3. Point materialization.python_path in your dataset schema JSON to the materialization method. See tmp.json for an example.

  4. Try it out with the examples in the next section.
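
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual interface: the MaterializerBase class here is a local stand-in for whatever materializer_base.py defines, the get method and its metadata argument are assumed, and InlineCsvMaterializer, the "arguments" and "csv_text" keys, and the module name are all hypothetical.

```python
import csv
import io


class MaterializerBase:
    """Stand-in for the base class in materializer_base.py (interface assumed)."""

    def get(self, metadata=None):
        raise NotImplementedError("subclasses must implement get()")


class InlineCsvMaterializer(MaterializerBase):
    """Hypothetical materializer that parses CSV text stored in the metadata."""

    def get(self, metadata=None):
        metadata = metadata or {}
        # "arguments" and "csv_text" are illustrative keys, not confirmed datamart fields.
        arguments = metadata.get("materialization", {}).get("arguments", {})
        csv_text = arguments.get("csv_text", "")
        # One dict per CSV row, keyed by the header line.
        return list(csv.DictReader(io.StringIO(csv_text)))


# Step 3: the schema's materialization.python_path names the materializer.
schema = {
    "materialization": {
        "python_path": "inline_csv_materializer",  # hypothetical module name
        "arguments": {"csv_text": "city,trips\nNYC,3\nLA,5\n"},
    }
}

rows = InlineCsvMaterializer().get(schema)
print(rows)
```

A real materializer would typically fetch the data from its source rather than read it from the metadata; check materializer_base.py for the method names and return type the base class actually expects.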

Example of using current system

Create metadata and index it in Elasticsearch: see the Indexing demo.

Query datamart: see the Query demo.

Work through the TAXI example: see taxi_example.

Note: to launch the notebook, run:

jupyter notebook test/index.ipynb