monarch-initiative/koza

Add line number limit to CLI to quickly test sources

In dipper we had a command line option to limit the number of lines processed. This is useful for quickly evaluating an ingest while developing as an alternative to using a debugger.

Commit dfd8de3 adds a row_limit parameter to the transform() method, which is passed down to the CSV, JSON, and JSONL readers. It limits processing to the specified number of rows per source file (i.e., if you pull from 2 source files and set row_limit to 3, the final edges file will have 6 entries).
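For anyone following along, here is a rough sketch of the per-file behaviour described above. It is not the code from the commit: read_rows, make_source, and this simplified transform() signature are hypothetical stand-ins, and it assumes the limit counts data rows rather than the header line.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, Optional, TextIO


def read_rows(handle: TextIO, row_limit: Optional[int] = None) -> Iterator[dict]:
    """Yield CSV rows as dicts, stopping after row_limit data rows if one is set."""
    reader = csv.DictReader(handle)
    # islice with a stop of None yields every row, so None means "no limit".
    yield from islice(reader, row_limit)


def transform(sources: List[TextIO], row_limit: Optional[int] = None) -> List[dict]:
    """Hypothetical stand-in for transform(): the limit applies to each source file."""
    edges = []
    for handle in sources:
        edges.extend(read_rows(handle, row_limit))
    return edges


def make_source(n: int) -> TextIO:
    """Build an in-memory CSV with a header line and n data rows."""
    lines = ["subject,object"] + [f"s{i},o{i}" for i in range(n)]
    return io.StringIO("\n".join(lines))


# Two source files with row_limit=3 give 3 rows from each file.
edges = transform([make_source(10), make_source(10)], row_limit=3)
print(len(edges))  # 6
```

Applying the limit inside the reader, rather than counting rows in transform(), keeps each source file independent, which is why the totals add up per file.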

Still need to implement the JSON and JSONL tests.

Commit 9436f1c adds unit tests for JSON and JSONL Readers with row_limit params included. Ideally, an integration test would be written to verify the entire chain, maybe by parameterizing the existing test_row_limit, but at least for now we can confirm that the row_limit parameter works properly in the readers.
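As an illustration of the kind of reader-level check meant here, the sketch below runs a small table of cases against a JSONL reader. It is only a sketch under assumptions: read_jsonl and check_row_limit are hypothetical names, not the actual koza readers or tests, and it uses only the standard library rather than the project's test framework.

```python
import io
import json
from itertools import islice
from typing import Iterator, Optional, TextIO


def read_jsonl(handle: TextIO, row_limit: Optional[int] = None) -> Iterator[dict]:
    """Yield one dict per non-blank line, stopping after row_limit rows if set."""
    lines = (line for line in handle if line.strip())
    for line in islice(lines, row_limit):
        yield json.loads(line)


def check_row_limit(total_rows: int, row_limit: Optional[int], expected: int) -> bool:
    """Return True if reading total_rows with row_limit yields expected rows."""
    data = "\n".join(json.dumps({"id": i}) for i in range(total_rows))
    rows = list(read_jsonl(io.StringIO(data), row_limit))
    return len(rows) == expected


# (total rows in file, row_limit, expected rows read)
CASES = [
    (5, 3, 3),     # limit smaller than the file
    (5, None, 5),  # no limit reads everything
    (2, 3, 2),     # limit larger than the file
    (5, 0, 0),     # zero limit reads nothing
]
results = [check_row_limit(*case) for case in CASES]
print(all(results))
```

The same table of cases could be reused for an end-to-end test later by swapping the reader call for a full transform run.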

MR #73 should be ready for inspection.