Generates a histogram from data in Tab-Separated Values (TSV) format. Data processing is parallelized using MPI.
Use this when you need to build a histogram from the data in one column of a large TSV file.
OpenMPI
- Navigate to the directory
- Compile using
make
- Run (-np sets the number of MPI processes)
mpirun -np 1 ./tsv_hist.out <column> <min-value> <max-value> <num-bins> <TSVfile>
- <column> specifies the column in the TSV file that contains the data for the histogram
- <min-value> and <max-value> specify the range of the data
- <num-bins> is the number of bins required
- <TSVfile> is the path to the data file in TSV format
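
To make the role of each argument concrete, here is a minimal single-process sketch in Python of how the four numeric arguments could turn one column into bin counts. It is not the tool's actual implementation: it assumes equal-width bins, a 1-based column index, that values outside the range are ignored, and that max-value is greater than min-value. The names bin_index and histogram are hypothetical.

```python
import csv
import io


def bin_index(value, min_value, max_value, num_bins):
    """Return the equal-width bin for value, or None if it is out of range."""
    if value < min_value or value > max_value:
        return None
    width = (max_value - min_value) / num_bins
    # Assumption: the top edge belongs to the last bin, not a bin of its own.
    return min(int((value - min_value) / width), num_bins - 1)


def histogram(lines, column, min_value, max_value, num_bins):
    """Count values from one column of TSV text (column is 1-based here)."""
    counts = [0] * num_bins
    for row in csv.reader(lines, delimiter="\t"):
        try:
            value = float(row[column - 1])
        except (IndexError, ValueError):
            continue  # skip header rows, short rows, and non-numeric cells
        index = bin_index(value, min_value, max_value, num_bins)
        if index is not None:
            counts[index] += 1
    return counts


# Tiny in-memory TSV: a header line followed by five data rows.
sample = io.StringIO("id\tscore\n1\t0.5\n2\t2.5\n3\t9.9\n4\t10.0\n5\t42.0\n")
print(histogram(sample, column=2, min_value=0.0, max_value=10.0, num_bins=5))
# prints [1, 1, 0, 0, 2]
```

In a parallel run, each process would plausibly build a counts list like this over its own share of the file, and the per-process lists would then be summed element by element.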
When tested with data files ranging in size from 300 MB to 32 GB, the time taken to generate the histogram was: