This project provides a C++ program that reads a large newline-delimited JSON file (1GB in this example), in which each line holds one record.
From that input, the program generates JSON files of predefined sizes ranging from 32MB to 32GB.
- If the target size is smaller than the input file, the program copies records from the start of the input until the target size is reached.
- If the target size is larger than the input file, the program repeats the input from the beginning until the target size is reached.
- Generates multiple output files of varying sizes, both smaller and larger than the input, from a single large JSON file.
- Handles target sizes of:
  - 32MB
  - 64MB
  - 128MB
  - 256MB
  - 512MB
  - 1GB
  - 2GB
  - 4GB
  - 8GB
  - 16GB
  - 32GB
- Loops through the input file if it is smaller than the target file sizes, ensuring that each output file reaches the desired size.
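The eleven target sizes double from 32MB up to 32GB. As a minimal sketch, assuming sizes are stored in bytes and 1MB means 2^20 bytes (consistent with the `output_1024MB.json` name used for the 1GB file), a hypothetical `makeFileSizes` helper could build that list; the project's own `main.cpp` may simply list the values literally:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the target sizes in bytes, doubling from
// 32MB (32 * 2^20 bytes) up to and including 32GB (32 * 2^30 bytes).
std::vector<std::uint64_t> makeFileSizes() {
    const std::uint64_t MB = 1024ULL * 1024ULL;
    const std::uint64_t first = 32 * MB;        // 32MB
    const std::uint64_t last = 32 * 1024 * MB;  // 32GB
    std::vector<std::uint64_t> sizes;
    for (std::uint64_t s = first; s <= last; s *= 2) {
        sizes.push_back(s);
    }
    assert(sizes.size() == 11);  // 32MB, 64MB, ..., 16GB, 32GB
    return sizes;
}
```

Using 64-bit integers matters here: 32GB does not fit in a 32-bit `int`.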
- A C++11-compliant compiler (e.g., GCC, Clang, MSVC)
- CMake (for the CMake build below, or when using an IDE such as CLion)
Sample datasets can be downloaded from https://drive.google.com/drive/folders/1KQ1DjvIWpHikOg1JgmjlSWM3aAlvq-h7?usp=sharing. For this project, use the datasets whose names end in `_small_records.json`.
- Clone the repository:
  `git clone https://github.com/ashkanvg/scalability_json_dataset`
  `cd scalability_json_dataset`
- Change `inputFile` in `main.cpp` to point to your JSON file.
- Create a `build` directory and navigate into it:
  `mkdir build`
  `cd build`
- Run CMake to generate the build files:
  `cmake ..`
- Build the project:
  `make`
- Run the program, which reads the JSON file set in `inputFile`:
  `./json_splitter`
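Setting `inputFile` is the only configuration step. Its exact declaration in `main.cpp` is not reproduced here; assuming it holds a file path, a hypothetical `inputFileSize` helper (C++11, `<fstream>` only) can confirm that the path opens and report the input size before any output is written:

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: returns the size of the file at `path` in bytes,
// or -1 if the file cannot be opened (e.g. `inputFile` points to a wrong path).
std::int64_t inputFileSize(const std::string& path) {
    // Open in binary mode, positioned at the end of the file.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return -1;
    }
    // The read position at the end equals the file size.
    return static_cast<std::int64_t>(in.tellg());
}
```

A return value of -1 usually means `inputFile` still points at the wrong path.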
If you'd rather compile the program directly with `g++`, first set `inputFile` in `main.cpp` to your JSON file, then run:
`g++ -std=c++11 -o json_splitter main.cpp`
`./json_splitter`
The program generates output files with the following names based on the target sizes:
- `output_32MB.json`
- `output_64MB.json`
- `output_128MB.json`
- `output_256MB.json`
- `output_512MB.json`
- `output_1024MB.json`
- `output_2048MB.json`
- `output_4096MB.json`
- `output_8192MB.json`
- `output_16384MB.json`
- `output_32768MB.json`
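The names encode the target size in megabytes, so 1GB becomes `output_1024MB.json`. Assuming sizes are stored in bytes and 1MB = 2^20 bytes, a hypothetical `outputName` helper reproduces this pattern:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: maps a target size in bytes to an output file name,
// e.g. 32 * 2^20 bytes -> "output_32MB.json".
std::string outputName(std::uint64_t sizeBytes) {
    const std::uint64_t MB = 1024ULL * 1024ULL;
    assert(sizeBytes % MB == 0);  // all targets are whole megabytes
    return "output_" + std::to_string(sizeBytes / MB) + "MB.json";
}
```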
Each output file will contain JSON records until the specified size is reached. If the input file's content is smaller than the target size, the program will repeat from the beginning of the input file.
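To see how often the input is reused: with the 1GB example input, the 32GB target needs 32 full passes, while each target up to 1GB needs at most one, possibly partial, pass. The arithmetic is a ceiling division, sketched here with a hypothetical `passesNeeded` helper that ignores any overshoot from the last record:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of (possibly partial) passes over an input of
// `inputBytes` needed to produce `targetBytes` of output, i.e. ceil(target / input).
std::uint64_t passesNeeded(std::uint64_t targetBytes, std::uint64_t inputBytes) {
    assert(inputBytes > 0);  // an empty input could never reach the target
    return (targetBytes + inputBytes - 1) / inputBytes;
}
```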
- `fileSizes`: A vector of file sizes (in bytes) that the program uses to create the output files.
- `createFiles`: Reads the input file line by line, writes the content into the output files, and loops back to the beginning of the input file when needed.
- `main`: The entry point of the program, which opens the input JSON file set in `inputFile` and calls `createFiles`.
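The behavior described for `createFiles` (read line by line, write, loop back at end of file) can be sketched for a single target size as below. This is not the project's implementation: the name `fillToSize`, the stream-based signature, and the stopping rule (keep writing whole records until at least `targetBytes` have been written, so a file may overshoot by less than one record) are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream also works as input/output, e.g. for testing
#include <string>

// Hypothetical sketch of the createFiles loop for one target size.
// Copies whole newline-delimited records from `in` to `out` until at least
// `targetBytes` bytes have been written, rewinding `in` at end of file.
// Returns the number of bytes written.
std::uint64_t fillToSize(std::istream& in, std::ostream& out, std::uint64_t targetBytes) {
    std::uint64_t written = 0;
    std::string line;
    in.clear();
    in.seekg(0);  // start every output file from the first record
    while (written < targetBytes) {
        if (!std::getline(in, line)) {
            // End of input: clear the EOF state and rewind (the "loop back" step).
            in.clear();
            in.seekg(0);
            if (!std::getline(in, line)) {
                break;  // input is empty; avoid an infinite loop
            }
        }
        out << line << '\n';         // getline strips '\n', so add it back
        written += line.size() + 1;  // count the record plus its newline
    }
    return written;
}
```

With the project's setup, `in` would be an `std::ifstream` opened on `inputFile`, and `out` an `std::ofstream` for each output name derived from `fileSizes`.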
This project is part of the evaluation of the cuJSON project (https://github.com/ashkanvg/cuJSON).
This project is licensed under the MIT License. See the `LICENSE` file for details.
Feel free to reach out if you have any issues or suggestions! 😊