Add line number limit to CLI to quickly test sources
In dipper we had a command line option to limit the number of lines processed. This is useful for quickly evaluating an ingest while developing as an alternative to using a debugger.
Commit dfd8de3 adds a row_limit parameter to the transform() method, which is passed down to the CSV, JSON, and JSONL readers. It limits processing to the specified number of rows per source file (i.e., if you pull from 2 source files and set row_limit to 3, the final edges file will have 6 entries).
Still need to implement the JSON and JSONL tests.
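For anyone skimming the thread, here is a minimal sketch of the per-file semantics described above. It is not the code from dfd8de3: read_rows and this stand-in transform are hypothetical names written only for illustration, and the two in-memory "files" are made-up sample data. The only thing it is meant to show is that the limit applies to each source separately, so 2 sources with row_limit=3 yield 6 rows.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TextIO


def read_rows(source: TextIO, row_limit: Optional[int] = None) -> Iterator[dict]:
    """Hypothetical CSV reader: yield at most row_limit rows from one source."""
    reader = csv.DictReader(source)
    # islice with a stop of None yields every row, so None means "no limit".
    return islice(reader, row_limit)


def transform(sources: Iterable[TextIO], row_limit: Optional[int] = None) -> List[dict]:
    """Hypothetical stand-in for transform(): the limit is applied per source."""
    edges: List[dict] = []
    for source in sources:
        edges.extend(read_rows(source, row_limit))
    return edges


# Made-up sample data: 4 rows in the first source, 5 in the second.
file_a = io.StringIO("subject,object\na1,b1\na2,b2\na3,b3\na4,b4\n")
file_b = io.StringIO("subject,object\nc1,d1\nc2,d2\nc3,d3\nc4,d4\nc5,d5\n")

edges = transform([file_a, file_b], row_limit=3)
print(len(edges))  # 6: three rows from each of the two sources
```

Applying the limit inside the reader (rather than truncating the combined output) keeps each source represented in a quick test run, which seems to be the point of the per-file behaviour.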
Commit 9436f1c adds unit tests for the JSON and JSONL readers with the row_limit param included. Ideally, an integration test would be written to verify the entire chain, maybe by parameterizing the existing test_row_limit, but at least for now we can confirm that the row_limit parameter works properly in the readers.
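As a rough illustration of what a reader-level check can look like, here is a sketch using only the standard library. It is not the test code from 9436f1c: read_jsonl and check_row_limit are hypothetical helpers, the inline JSONL data is invented, and the choice to skip blank lines without counting them toward the limit is an assumption of this sketch.

```python
import io
import json
from typing import Iterator, List, Optional, TextIO


def read_jsonl(source: TextIO, row_limit: Optional[int] = None) -> Iterator[dict]:
    """Hypothetical JSONL reader: stop after row_limit parsed records."""
    emitted = 0
    for line in source:
        if row_limit is not None and emitted >= row_limit:
            break
        line = line.strip()
        if not line:
            continue  # assumption: blank lines do not count toward the limit
        yield json.loads(line)
        emitted += 1


def check_row_limit(row_limit: Optional[int], expected_count: int) -> List[dict]:
    """Read made-up sample data and assert the number of rows returned."""
    data = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n{"id": 4}\n')
    rows = list(read_jsonl(data, row_limit=row_limit))
    assert len(rows) == expected_count
    return rows


# Cases: no limit, limit below the row count, limit above the row count.
for limit, expected in [(None, 4), (2, 2), (3, 3), (10, 4)]:
    check_row_limit(limit, expected)
```

The same table of (limit, expected) cases could feed a parameterized integration test later, with the whole transform chain in place of the reader.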
MR #73 should be ready for inspection.