This repository is intended for project tracking. Here you can also find raw data and utilities for validation and indexing.
Pipeline:
- start: JSON files (an array of objects per source) in the `json` folder
- then: JSONL files (same data, but one object per line) in the `jsonl` folder
- finally: indexing in the `elasticsearch` folder
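
The step from JSON to JSONL can be sketched as follows. This is a minimal illustration only, not the script shipped in this repository: the function name `json_array_to_jsonl` and the file paths are hypothetical, and it assumes each source file holds a single top-level array.

```python
import json
from pathlib import Path


def json_array_to_jsonl(src: Path, dst: Path) -> int:
    """Convert a JSON file holding an array of objects into a JSONL file.

    Hypothetical helper for illustration; returns the number of objects written.
    """
    with src.open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(f"{src} does not contain a JSON array")
    with dst.open("w", encoding="utf-8") as out:
        for record in records:
            # Without `indent`, json.dumps escapes newlines inside strings,
            # so every object is written on exactly one line.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)


if __name__ == "__main__":
    # Example paths are assumptions; adjust them to the actual file names.
    source = Path("json/example.json")
    if source.exists():
        count = json_array_to_jsonl(source, Path("jsonl/example.jsonl"))
        print(f"wrote {count} objects")
```

One object per line lets later steps, such as indexing, read the data as a stream instead of loading a whole array into memory.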
You can validate all files using the JSON Schema in the `schema` folder. Refer to the README file in each folder for further information. You need Python 3 and virtual environments managed by pipenv.
General usage:
- `cd [folder]`
- `pipenv shell`
- `pipenv install` (only the first time)
- `python [script] [...args]` (inside the virtual env) or `pipenv run python [script] [...args]`