flatironinstitute/mountainlab-js

Error in phase 2 of clustering

kathefter opened this issue · 3 comments

When I run some of my datasets, I get a repeated warning that splitting the data is generating empty parcels and that this could be due to duplicate points. Eventually, the program crashes. I've checked, and there aren't any duplicates. Also, I can run this on the same data multiple times and it only crashes sometimes, without my changing anything. How would you recommend troubleshooting this? I didn't attach the entire error log, but it's a repetition of
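The post doesn't say how the duplicate check was done. Below is a minimal sketch of one way to do it. It assumes the points fed to clustering are available as a NumPy array with one row per point; isosplit5 itself may expect the transposed layout, so that is an assumption here. The sketch counts exact repeats and, separately, rows that coincide after rounding, because points that differ only by tiny floating-point noise may behave much like duplicates. The function names and the `decimals` tolerance are hypothetical.

```python
import numpy as np

def count_duplicate_points(X):
    """Count rows of X (n_points x n_dims) that exactly repeat an earlier row."""
    X = np.asarray(X)
    unique_rows = np.unique(X, axis=0)
    return X.shape[0] - unique_rows.shape[0]

def count_near_duplicates(X, decimals=6):
    """Count rows that become identical after rounding to `decimals` places.

    The tolerance is a hypothetical choice for this sketch."""
    X_rounded = np.round(np.asarray(X, dtype=float), decimals)
    return count_duplicate_points(X_rounded)
```

For example, `count_duplicate_points(np.array([[0.0, 0.0], [1.0, 2.0], [0.0, 0.0]]))` returns 1.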

Warning in isosplit5: new parcel has no points -- perhaps dataset contains duplicate points? -- original size = 19.
Warning: Size did not change after splitting parcel.
Warning in isosplit5: new parcel has no points -- perhaps dataset contains duplicate points? -- original size = 19.
Warning in isosplit5: new parcel has no points -- perhaps dataset contains duplicate points? -- original size = 19.

finally followed by

RangeError: Invalid string length
    at Socket.<anonymous> (/home/anaconda3/envs/ml-env/lib/node_modules/mountainlab-js/mlproc/systemprocess.js:52:24)
    at Socket.emit (events.js:180:13)
    at addChunk (_stream_readable.js:274:12)
    at readableAddChunk (_stream_readable.js:261:11)
    at Socket.Readable.push (_stream_readable.js:218:10)
    at Pipe.onread (net.js:581:20)

This can sometimes happen if the data contains a lot of zeros (for some reason, the data might get zeroed at some point during processing).
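A quick way to test the "lots of zeros" explanation is to count exact zeros in the recording before sorting. Long zeroed stretches plausibly produce many identical points in feature space, which would match the "perhaps dataset contains duplicate points" warning. This is a minimal sketch. It assumes the recording is loaded as a NumPy array of shape (n_channels, n_timepoints), and the function name is hypothetical.

```python
import numpy as np

def zero_fraction_report(data):
    """Summarize exact zeros in a (n_channels, n_timepoints) recording array."""
    data = np.asarray(data)
    zero_mask = data == 0
    # Fraction of samples that are exactly zero, per channel
    per_channel = zero_mask.mean(axis=1)
    # Timepoints where every channel is zero at once (e.g. a zeroed-out segment)
    all_zero_timepoints = int(np.all(zero_mask, axis=0).sum())
    return {
        "overall_zero_fraction": float(zero_mask.mean()),
        "per_channel_zero_fraction": per_channel,
        "all_zero_timepoints": all_zero_timepoints,
    }
```

A large `all_zero_timepoints` count, or a channel whose zero fraction is close to 1, would point toward the data being zeroed somewhere upstream.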

The best way to proceed is to inspect the input to isosplit5. Are you familiar enough with Python to intercept this input and write it to a file? That way we can isolate the issue and see exactly what isosplit5 is receiving.
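One generic way to intercept the input is to wrap whatever function performs the clustering call, so that each input array is saved to a `.npy` file before the call runs. This is a minimal sketch, not part of MountainSort. The wrapper name, output directory, and file naming are hypothetical. Where the isosplit5 call lives in the processor source, and how its arguments are shaped, are not shown in this thread, so you would need to find that call yourself.

```python
import functools
import os

import numpy as np

def dump_inputs(func, out_dir="isosplit5_inputs"):
    """Wrap `func` so its first argument is saved to disk on every call.

    `out_dir` and the "input_NNNN.npy" naming are hypothetical choices."""
    os.makedirs(out_dir, exist_ok=True)
    counter = {"n": 0}

    @functools.wraps(func)
    def wrapper(X, *args, **kwargs):
        path = os.path.join(out_dir, "input_%04d.npy" % counter["n"])
        counter["n"] += 1
        np.save(path, np.asarray(X))  # snapshot the exact array being clustered
        return func(X, *args, **kwargs)

    return wrapper
```

Usage would look like `clustering_fn = dump_inputs(clustering_fn)` at the point where the call is made, where `clustering_fn` stands in for the real function. Afterwards, `np.load(...)` on the saved files lets you inspect or share the exact inputs that led to the empty-parcel warnings.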

That would make sense: we were z-scoring the data before running it through MountainSort. Skipping that step seems to have solved the problem. Thank you!
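The thread doesn't establish why z-scoring triggered the failure. One possible mechanism, which is an assumption and not confirmed here, is that a constant channel z-scores to 0/0 = NaN. If the division were guarded instead, the same channel would collapse to all zeros. Either outcome would hand degenerate points to clustering. The sketch below shows that effect and a sanity check you could run on the preprocessed array. It assumes a (n_channels, n_timepoints) layout, and both function names are hypothetical.

```python
import numpy as np

def naive_zscore(data):
    """Per-channel z-score with no guard: a constant channel gives 0/0 = NaN."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=1, keepdims=True)
    std = data.std(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (data - mean) / std

def preprocessing_report(data, eps=1e-12):
    """Flag NaN/inf values and flat channels in a (n_channels, n_timepoints) array."""
    data = np.asarray(data, dtype=float)
    return {
        "n_nan": int(np.isnan(data).sum()),
        "n_inf": int(np.isinf(data).sum()),
        # Channels with (near-)zero spread, e.g. constant or all-zero rows;
        # rows that are already NaN are counted by n_nan instead
        "flat_channels": np.where(data.std(axis=1) < eps)[0].tolist(),
    }
```

Running `preprocessing_report` on both the raw and the z-scored data shows whether the scaling step introduced NaNs. It also shows whether the raw data already contained flat channels.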

Glad it's working!