[new feature] handle large dataset chunks internally
Closed this issue · 1 comment
mehmetaergun commented
Summary
It would be nice if Dataproofer shipped with basic built-in SQL capabilities (e.g. SQLite) or hooks into other tools (e.g. `split` in bash) so that it could process large datasets in chunks of 10,000 rows or less (as per the warning displayed when a large dataset is loaded -- which was a great idea, by the way, thanks :) ) without requiring the user to pre-process the dataset manually.
Relevant logs and/or screenshots
- Debian 8, 4 GB RAM (virtual machine)
- dataset: US population estimates at http://www.census.gov/popest/data/cities/totals/2015/SUB-EST2015.html ("All States", http://www.census.gov/popest/data/cities/totals/2015/files/SUB-EST2015_ALL.csv)
Possible fixes?
- a built-in SQLite database in which the app can stage the dataset (optionally letting the user query the database directly)
- the ability to use existing tools, e.g. `split` in bash, to automatically break a file into chunks (see the sketch below)
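For illustration, here is a rough sketch of the second suggestion, assuming GNU coreutils `split` and the SUB-EST2015_ALL.csv file above (chunk size and file naming are arbitrary, and this simple line-based split would not handle quoted fields containing embedded newlines):

```bash
# Hypothetical pre-processing step: break the CSV into chunks of at most
# 10,000 data rows, repeating the header line so each chunk stands alone.
header=$(head -n 1 SUB-EST2015_ALL.csv)
tail -n +2 SUB-EST2015_ALL.csv | split -l 10000 -d - chunk_
for f in chunk_*; do
  { echo "$header"; cat "$f"; } > "$f.csv" && rm "$f"
done
```

Each resulting chunk_NN.csv could then be loaded into Dataproofer separately; under the first suggestion, the same rows could instead be imported into a SQLite table and read back in batches with LIMIT/OFFSET queries.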
Thanks :)
newsroomdev commented
This has been implemented in the latest release: https://github.com/dataproofer/Dataproofer/releases/tag/v1.5.0. Please let us know if there are any additional issues in this area, and I'd be happy to keep improving our internal chunking system.