PointCloudConverter batch mode generates too much data
nvoelzow opened this issue · 8 comments
When running in batch mode, the converter generates much larger output than expected. The test shown below used 5 files of 324MB each, so I expected roughly 5 x 324 = 1620MB of output, but the output files add up to 3437MB.
As mentioned in #100, when running the full set of all 83 files (27GB) the output size is larger than 350GB, while running the converter on a single file produces only 18GB of output.
$ ./PointCloudConverter.exe -input=/D/Hotfolder/inside_split/ -importformat=las -exportformat=pcroot -gridsize=1 -output=/D/Hotfolder/test/test.pcroot -maxfiles=5
::: PointCloud Converter v1.73 :::
input = d:/hotfolder/inside_split/
Batch mode enabled (import whole folder)
Added file: d:/hotfolder/inside_split/inside_830M_20190218_0000000.las
Added file: d:/hotfolder/inside_split/inside_830M_20190218_0000001.las
Added file: d:/hotfolder/inside_split/inside_830M_20190218_0000002.las
Added file: d:/hotfolder/inside_split/inside_830M_20190218_0000003.las
Added file: d:/hotfolder/inside_split/inside_830M_20190218_0000004.las
[...]
importformat = las
exportformat = pcroot
gridsize = 1
output = d:/hotfolder/test/test.pcroot
maxfiles = 5
Found 83 files..
Reading file (0/4) : d:/hotfolder/inside_split/inside_830M_20190218_0000000.las (324.2MB)
Points: 10000000
Saving 64 tiles to folder: d:\hotfolder\test
Reading file (1/4) : d:/hotfolder/inside_split/inside_830M_20190218_0000001.las (324.2MB)
Points: 10000000
Saving 119 tiles to folder: d:\hotfolder\test
Reading file (2/4) : d:/hotfolder/inside_split/inside_830M_20190218_0000002.las (324.2MB)
Points: 10000000
Saving 141 tiles to folder: d:\hotfolder\test
Reading file (3/4) : d:/hotfolder/inside_split/inside_830M_20190218_0000003.las (324.2MB)
Points: 10000000
Saving 202 tiles to folder: d:\hotfolder\test
Reading file (4/4) : d:/hotfolder/inside_split/inside_830M_20190218_0000004.las (324.2MB)
Points: 10000000
Saving 220 tiles to folder: d:\hotfolder\test
Saving rootfile: d:\hotfolder\\test\test.pcroot
*Total points= 149.99M
Done saving v3 : d:\hotfolder\\test\test.pcroot
*Skipped 7 nodes with less than 1000 points)
Finished!
Elapsed: 00h 01m 36s 136ms
Exit
then checking the output size:
$ du -chs /D/hotfolder/test
3.4G /D/hotfolder/test
3.4G total
Was the test folder empty before the conversion? (The converter doesn't remove old files, but it overwrites existing ones that have the same name.)
The math details are:
Each point is 6 float values (X, Y, Z, R, G, B).
One float is 4 bytes, so 4 x 6 = 24 bytes per point.
So for 1.000.000 points: 1.000.000 points x 24 bytes = 24.000.000 bytes (24MB).
*With packed colors, the separate RGB values are eliminated, so it would be half the size.
(adding one "+1" for compression TODO or non-float format, even Half-type should work with v3)
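The size estimate above can be written as a small helper. This is only a sketch of the arithmetic in this comment; the function name `expected_bytes` and its parameters are made up for illustration and are not part of the converter.

```python
BYTES_PER_FLOAT = 4  # one 32-bit float

def expected_bytes(point_count, floats_per_point=6):
    """Estimate the uncompressed output size in bytes.

    floats_per_point=6 means X, Y, Z, R, G, B stored as separate floats.
    floats_per_point=3 models the "half size" packed-color case mentioned above.
    """
    return point_count * floats_per_point * BYTES_PER_FLOAT

print(expected_bytes(1_000_000))      # 24000000 bytes (24MB)
print(expected_bytes(1_000_000, 3))   # 12000000 bytes with packed colors
print(expected_bytes(50_000_000))     # 1200000000 bytes for 5 files x 10M points
```

For the 5-file test in this issue (5 x 10M points) this gives about 1200MB, which is the number to compare against the size reported by `du`.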
Yes, the folder was empty before starting the conversion.
Indeed, according to your example, the 50M points I used in the test described above should have resulted in ~1200MB for xyz+rgb and not 3437MB. That was "only" off by a factor of about 2.9 (which I could have written off as necessary overhead, double precision, etc.), but >350GB instead of 18GB when running the full dataset with 83 x 10M points can't be right...
Can you open the .pcroot file in Notepad and check how many points it shows there?
The point count is the 3rd value on the first row:
https://github.com/unitycoder/UnityPointCloudViewer/wiki/Binary-File-Format-Structure#custom-v3-tiles-pcroot-and-pct-rgb
total point count in pcroot file is 149993455
So I'm thinking maybe the batch mode itself doesn't clear its arrays properly between files, or something like that?
Can you check in the folder whether the individual files appear to grow too big?
In the .pcroot there is one row per file, and the point count is the first value after the filename, so can you check whether point count * 24 matches the existing file sizes in the folder?
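The check suggested above could be scripted roughly like this. It is a sketch under assumptions: tile rows are taken to be pipe-delimited with the filename first and the point count second, and each tile's 24 bytes per point are assumed to be spread over the `.pct` file plus a matching `.pct.rgb` file. The function name `check_tiles` is hypothetical.

```python
import os

BYTES_PER_POINT = 24  # X, Y, Z, R, G, B as 4-byte floats

def check_tiles(rows, folder, get_size=os.path.getsize):
    """Return (filename, expected_bytes, actual_bytes) for every tile row
    whose files on disk do not add up to point_count * 24 bytes."""
    mismatches = []
    for row in rows:
        fields = row.strip().split("|")
        # Assumption: only rows whose first field is a .pct filename are tiles;
        # the first row of the .pcroot (totals) is skipped by this filter.
        if len(fields) < 2 or not fields[0].endswith(".pct"):
            continue
        name, point_count = fields[0], int(fields[1])
        expected = point_count * BYTES_PER_POINT
        # Assumption: positions live in name, colors in name + ".rgb".
        actual = sum(get_size(os.path.join(folder, name + ext))
                     for ext in ("", ".rgb"))
        if actual != expected:
            mismatches.append((name, expected, actual))
    return mismatches
```

To use it, read the lines of the .pcroot file and pass them in together with the output folder, for example `check_tiles(open(pcroot_path).read().splitlines(), output_folder)`. An empty result means every tile's size agrees with its listed point count.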
Just looking at a couple of individual pct and pct.rgb files, these match the point counts listed in the pcroot file. For example:
test_2_14_4_1.pct|1464872|14|4.040509|1|15|4.598314|1.999992|0|0|0
The corresponding test_2_14_4_1.pct and test_2_14_4_1.pct.rgb are 17578464 bytes each (1464872 points x 12 bytes).
The overall size of the folder also matches the total count listed in the pcroot: 149993455 points and 3599842920 bytes (149993455 x 24) in all pct and pct.rgb files together.
In other words, the output itself is functional and the pcroot matches the pct/pct.rgb files, but there are more points in the output than in the input (about 150M instead of 50M). So not clearing data structures between two input files, and writing the same points multiple times, sounds plausible.
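A toy model of that hypothesis, for what it's worth. This is not the converter's code, just an illustration of what happens to the total if a point buffer is never cleared between input files and everything in it is written out after each file. The function name `points_written` is made up.

```python
def points_written(points_per_file, clear_between_files):
    """Total points written when the buffer is saved after every input file."""
    buffered = 0
    total = 0
    for count in points_per_file:
        if clear_between_files:
            buffered = 0      # expected behavior: start fresh for each file
        buffered += count     # load this file's points into the buffer
        total += buffered     # everything in the buffer is written out
    return total

files = [10_000_000] * 5  # the 5 input files from the test log

print(points_written(files, clear_between_files=True))   # 50000000
print(points_written(files, clear_between_files=False))  # 150000000
```

The uncleared case gives 10M + 20M + 30M + 40M + 50M = 150M points, which is close to the 149993455 points listed in the pcroot. That doesn't prove this is the cause, but the numbers are consistent with it.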
I think this is solved now, try version 1.74:
https://www.dropbox.com/s/7i6ezo1yelwwl70/PointCloudConverter174.zip?dl=1