A simple example of a pipeline that reads a TSV file, does some basic cleaning, and indexes the results into Elasticsearch. The actual code is in scripts/import_tsv.coffee, but CoffeeScript isn't required to run the compiled JS. A rough sketch of the pipeline's overall shape appears after the usage example below.
Installation:

```
npm i -g ewr/es-import-tsv
```
Usage:

```
es-import-tsv --index my_data --type expenses < expenses.tsv
```
See `es-import-tsv --help` for the full option list.
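As a rough idea of what the pipeline does internally, here is a minimal sketch, not the actual scripts/import_tsv.coffee: it parses TSV rows from stdin and bulk-indexes them into Elasticsearch. The module choices (`csv-parse`, the legacy `elasticsearch` client), the hard-coded host, and the index/type names are assumptions for illustration only.

```coffee
# Hypothetical sketch of the pipeline shape; not the project's implementation.
elasticsearch = require 'elasticsearch'
parse         = require 'csv-parse'

client = new elasticsearch.Client host: 'localhost:9200'

parser = parse delimiter: '\t', columns: true
rows   = []

parser.on 'readable', ->
  while row = parser.read()
    # The real script cleans each row here (dates, numbers, name fields)
    rows.push row

parser.on 'end', ->
  # Bulk body alternates an action line and a document line for each row
  body = []
  for row in rows
    body.push index: { _index: 'my_data', _type: 'expenses' }
    body.push row
  client.bulk { body }, (err) ->
    console.error err if err

process.stdin.pipe parser
```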
The specific cleaning here is an example built around data from the calaccess-raw-data project:
- Parse timestamps into a format Elasticsearch will understand for any field named `*_DATE`
- Treat fields as numbers if they are named `AMOUNT` or `*_YTD`
- For field prefixes passed in via `--names`, concatenate `*_NAMF` and `*_NAML` into `*_NAME`
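The sketch below shows roughly what those rules might look like in code; the actual logic lives in scripts/import_tsv.coffee, and the function name and details here are illustrative assumptions, not the project's implementation.

```coffee
# Hypothetical cleaning function; field handling mirrors the rules listed above.
cleanRow = (row, namePrefixes = []) ->
  out = {}
  for key, value of row
    if /_DATE$/.test(key)
      # Parse timestamps into ISO strings that Elasticsearch's date type accepts
      d = new Date(value)
      out[key] = if isNaN(d.getTime()) then null else d.toISOString()
    else if key is 'AMOUNT' or /_YTD$/.test(key)
      # Treat amount-like fields as numbers
      out[key] = parseFloat(value)
    else
      out[key] = value
  # For each prefix passed via --names, join PREFIX_NAMF and PREFIX_NAML into PREFIX_NAME
  for prefix in namePrefixes
    first = row["#{prefix}_NAMF"] or ''
    last  = row["#{prefix}_NAML"] or ''
    out["#{prefix}_NAME"] = "#{first} #{last}".trim() if first or last
  out
```

For example, with a hypothetical `--names CAND`, fields `CAND_NAMF` and `CAND_NAML` would be joined into a single `CAND_NAME` field.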