cd datamart
conda env create -f environment.yml
source activate datamart_env
git update-index --assume-unchanged datamart/resources/index_info.json
python -W ignore -m unittest discover
Dataset providers should validate their dataset schema against our JSON schema by running:
python scripts/validate_schema.py --validate_json {path_to_json}
For example:
$ python scripts/validate_schema.py --validate_json test/tmp/tmp.json
Valid json
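Before running the validator, it can help to confirm that the file parses as JSON and carries a `materialization.python_path` entry. The following is a minimal sketch of such a pre-check. The helper name `precheck_description` is hypothetical and not part of datamart, and the check does not replace scripts/validate_schema.py.

```python
import json


def precheck_description(path):
    """Return a list of problems found in a dataset schema file (empty if none).

    Hypothetical helper: it only checks that the file is valid JSON and that
    materialization.python_path is a non-empty string. The full check is
    still scripts/validate_schema.py.
    """
    try:
        with open(path, encoding="utf-8") as handle:
            description = json.load(handle)
    except (OSError, json.JSONDecodeError) as error:
        return [f"cannot read {path}: {error}"]

    if not isinstance(description, dict):
        return ["top-level JSON value must be an object"]

    materialization = description.get("materialization")
    if not isinstance(materialization, dict):
        return ["missing 'materialization' object"]

    python_path = materialization.get("python_path")
    if not isinstance(python_path, str) or not python_path:
        return ["'materialization.python_path' must be a non-empty string"]

    return []
```

An empty list means the file passed this quick check and is ready for the real validator.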
- Prepare your dataset schema following the datamart index schema, and validate it as described in the previous step.
- Create your materialization method by creating a subclass of the base class in materializer_base.py, and put it in datamart/materializers. See README.
- Point materialization.python_path in your dataset schema JSON to the materialization method. Take a look at tmp.json.
Play with the following:
- Create metadata and index it on Elasticsearch: Indexing demo
- Query datamart: Query demo
- Work through the TAXI example: taxi_example
Note: to launch the notebook, run:
jupyter notebook test/index.ipynb
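To illustrate how a materialization method and the materialization.python_path entry fit together, here is a rough, self-contained sketch. Everything in it is an assumption made for illustration: the `MaterializerBase` class is a stand-in for whatever materializer_base.py actually defines, the `get` method signature, the `arguments` key, the `InlineCsvMaterializer` class, and the `inline_csv_materializer` module name are hypothetical. Check materializer_base.py and tmp.json for the real interface.

```python
import csv
import io


class MaterializerBase:
    """Stand-in for the class in materializer_base.py (assumed interface)."""

    def get(self, metadata=None):
        raise NotImplementedError


class InlineCsvMaterializer(MaterializerBase):
    """Hypothetical materializer that parses CSV text carried in the metadata."""

    def get(self, metadata=None):
        # Assumed layout: arguments for the materializer live under
        # metadata["materialization"]["arguments"].
        materialization = (metadata or {}).get("materialization", {})
        arguments = materialization.get("arguments", {})
        text = arguments.get("csv_text", "")
        # Return one dict per CSV row, keyed by the header line.
        return list(csv.DictReader(io.StringIO(text)))


# Dataset schema fragment whose python_path names the (hypothetical) module
# that would hold the materializer above inside datamart/materializers.
description = {
    "materialization": {
        "python_path": "inline_csv_materializer",
        "arguments": {"csv_text": "city,trips\nNYC,3\nLA,5\n"},
    }
}

rows = InlineCsvMaterializer().get(description)
```

Returning plain dictionaries keeps the sketch free of third-party dependencies. A real materializer would return whatever type the base class in materializer_base.py specifies.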