git clone git@github.com:mjhm/csv-translate.git
cd csv-translate
yarn
Then you can run
yarn sample
This runs the sample
command line in the package.json
file.
Tests can be run with yarn test
.
cli.js
-- Command Line wrapper aroundindex.js
index.js
-- Main entry point to code and wrapssrc/translator.js
src/
translator.js
-- Organizes the translation process with the parser, reading the config, running the translation, and output.parser.js
-- wraps Papa Parse CSV reader.template-compiler.js
-- wraps Handlebars which is used for defining and executing the translationsdata-validators.js
-- validation functions for typed data columnsoutput/csv.js
-- wraps Papa Parse CSV writer.
The program takes a JSON config file that defines how the incoming columns are translated into outgoing columns. It leverages the 'handlebars.js' library for injecting the incoming values into template strings. Handlebars is well documented, extensible, and mature (though perhaps overkill) for this application.
The parsing of the incoming JSON is via PapaParse. This has lots of flexibility of configuration for a variety of CSV conventions. The program allows for setting these options in the config file. PapaParse is used in a streaming mode, which should work well for large files or streaming data. As a consequence of streaming mode the lines are processed one at a time through the processLine
callback. This ultimately does the data translation, validation, and output of each line.
See test/orders.config.json
for example and details. See Handlbars documentation for details on the template syntax and how to create helper functions.
- The only types that are validated are those mentioned in the sample: Integer, DateTime, BigDecimal, String. These are all assumed to be represented in both the input and output as strings, but are parsable into their respective types. In particular DateTime is represented as an ISO formatted DateTime string.
- I tested this for up to 10000 input lines, which takes about 1 second to run, I assume that this is acceeptable for now.
- Instead of Handlebars I did my initial implementation with
lodash
templates. I switched to Handlebars becauselodash
templates expose the program to script injection via the config file. I assume the scenario in mind would ideally want a client to be able to define the config file, so exposing that vulnerablity is probably bad. The downside to this is that Handlebars helpers need to be built into the code base rather than allowing code level customization in the config file.
Most of the next steps depend on product priorities and specific applications of the code. My best guesses for next steps:
- It would be that it would be handy to have a SQL output option.
- Add additonal type validations as needed.
- Review details of the PapaParse input and output config to see if that matches the real world expectations.
- Convert to TypeScript.
- More testing for real world situations.