[new feature] handle large dataset chunks internally
Closed this issue · 1 comment
mehmetaergun commented
Summary
It would be nice if Dataproofer shipped with basic built-in SQL capabilities (e.g. SQLite) or hooks into other tools (e.g. `split` in bash) so that it could process large datasets in chunks of 10,000 rows or less (as per the warning displayed when a large dataset is loaded -- which was a great idea, by the way, thanks :) ) without requiring the user to pre-process the dataset manually.
Relevant logs and/or screenshots
- Debian 8, 4 GB RAM (virtual machine)
- dataset: US population estimates at http://www.census.gov/popest/data/cities/totals/2015/SUB-EST2015.html ("All States", http://www.census.gov/popest/data/cities/totals/2015/files/SUB-EST2015_ALL.csv)
Possible fixes?
- a built-in SQLite database in which the app can stage the dataset (optionally letting the user query the database directly)
- the ability to use existing tools, e.g. `split` in bash, to automatically break a file into chunks (see the sketch below)
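For illustration, here is a rough sketch of the second suggestion, assuming GNU coreutils `split` and the SUB-EST2015_ALL.csv file above (chunk size and file naming are arbitrary, and this simple line-based split would not handle quoted fields containing embedded newlines):

```bash
# Hypothetical pre-processing step: break the CSV into chunks of at most
# 10,000 data rows, repeating the header line so each chunk stands alone.
header=$(head -n 1 SUB-EST2015_ALL.csv)
tail -n +2 SUB-EST2015_ALL.csv | split -l 10000 -d - chunk_
for f in chunk_*; do
  { echo "$header"; cat "$f"; } > "$f.csv" && rm "$f"
done
```

Each resulting chunk_NN.csv could then be loaded into Dataproofer separately; under the first suggestion, the same rows could instead be imported into a SQLite table and read back in batches with LIMIT/OFFSET queries.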
Thanks :)
newsroomdev commented
This has been implemented in the latest release: https://github.com/dataproofer/Dataproofer/releases/tag/v1.5.0. Please let us know if there are any additional issues in this area, and I'd be happy to keep improving our internal chunking system.