Demonstrate storing files larger than GitHub's 50 MB limit using PyArrow
usage: divider.py [-h] -i INPUT [-s SIZE] -o OUTPUT [-v | --verbose | --no-verbose]
Take a file and divide it into partitions of specific sizes
options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        The large file to be partitioned
  -s SIZE, --size SIZE  Maximum size of the partitioned file in MB
  -o OUTPUT, --output OUTPUT
  -v, --verbose, --no-verbose
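For readers who want to see how a command-line interface like the one above could be declared, here is a minimal sketch using Python's standard `argparse` module. It is reconstructed from the help text only and is not taken from `divider.py`. The function name `build_parser`, the `int` type for `--size`, and its default of 50 are assumptions; the real default is not documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose interface mirrors the usage text above (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="divider.py",
        description="Take a file and divide it into partitions of specific sizes",
    )
    parser.add_argument("-i", "--input", required=True,
                        help="The large file to be partitioned")
    # ASSUMPTION: the type and the default of 50 are guesses; the help text
    # does not state the actual default.
    parser.add_argument("-s", "--size", type=int, default=50,
                        help="Maximum size of the partitioned file in MB")
    parser.add_argument("-o", "--output", required=True)
    # BooleanOptionalAction (Python 3.9+) generates the paired
    # --verbose / --no-verbose flags seen in the usage line.
    parser.add_argument("-v", "--verbose",
                        action=argparse.BooleanOptionalAction, default=False)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch can be
# run on its own.
args = build_parser().parse_args(
    ["-i", "output_2019_q1.parquet", "-o", "ookla-dataset"]
)
print(args.input, args.size, args.output, args.verbose)
```

Passing `-v` or `--verbose` sets `args.verbose` to `True`, and `--no-verbose` sets it back to `False`.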
To generate the files in this repository, I ran:
python divider.py -i output_2019_q1.parquet -o ookla-dataset
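The following is a rough sketch of one way a Parquet file could be split into size-limited parts with PyArrow. It is not necessarily how `divider.py` works. The function names `rows_per_partition` and `divide`, the `part-0000.parquet` naming scheme, the 50 MB default, and the choice of 1 MB = 1024 × 1024 bytes are all assumptions made for illustration. The row count per part is estimated from the input file's size on disk, so parts can come out somewhat larger or smaller than the target when rows compress unevenly.

```python
import math
import os


def rows_per_partition(total_rows: int, file_bytes: int, max_mb: float) -> int:
    """Estimate how many rows each part should hold to stay near max_mb."""
    max_bytes = max_mb * 1024 * 1024  # ASSUMPTION: 1 MB = 1024 * 1024 bytes
    if file_bytes <= max_bytes:
        return max(total_rows, 1)  # the file already fits in one part
    parts = math.ceil(file_bytes / max_bytes)
    return max(1, math.ceil(total_rows / parts))


def divide(input_path: str, output_dir: str, max_mb: float = 50) -> list[str]:
    """Write row slices of a Parquet file into output_dir; return the paths."""
    # Imported here so the helper above stays usable without PyArrow installed.
    import pyarrow.parquet as pq

    table = pq.read_table(input_path)
    step = rows_per_partition(
        table.num_rows, os.path.getsize(input_path), max_mb
    )
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for part, start in enumerate(range(0, table.num_rows, step)):
        # Hypothetical naming scheme: part-0000.parquet, part-0001.parquet, ...
        out_path = os.path.join(output_dir, f"part-{part:04d}.parquet")
        # Table.slice truncates at the end of the table, so the last part
        # may hold fewer rows than the others.
        pq.write_table(table.slice(start, step), out_path)
        written.append(out_path)
    return written
```

Under these assumptions, `divide("output_2019_q1.parquet", "ookla-dataset")` would correspond to the command above. This sketch loads the whole table into memory, which keeps the code short; reading one row group at a time would be a better fit for files that do not fit in RAM.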
This project was built as part of the 2022 Data Science for the Public Good (DSPG) internship program.