alessandrolulli/reforest

Question: How to use the most specialized data structure and data type

Dear All,
in your paper "Crack Random Forest for Arbitrary Large Datasets" in Section IV.A you write that "The binning operation is needed both for improving the computational performance [9], [14] and for storing the dataset in-memory in an efficient manner using the most specialized data structure and data type."
I need to exploit exactly this trick. Could you please provide an example?
Thank you in advance for your help.
Best,
Federica

Dear Federica,
we have updated the repository to address the question you raised.
To use the most specialised data structure, build the predictor as follows:
val rfRunner = ReForeStTrainerBuilder.apply(new TypeInfoDouble(), new TypeInfoByte(), property).build(sc)
I hope it helps!
--Luca
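
For readers unfamiliar with the idea behind the quoted sentence, the sketch below illustrates why binning permits a narrower data type. It is not ReForeSt code, and it does not reproduce the library's binning strategy: the object and all method names (BinningSketch, binEdges, toBin, fromBin) are hypothetical, and equal-width bins are assumed purely for simplicity. Presumably the two arguments in the call above, TypeInfoDouble and TypeInfoByte, select the raw value type and the binned value type respectively, but that reading is an assumption based on their names.

```scala
// Illustrative sketch only: NOT the ReForeSt implementation.
// All names here are hypothetical and chosen for this example.
object BinningSketch {

  // Compute the interior split points of `numBins` equal-width bins
  // for a single feature column (equal width is an assumption).
  def binEdges(values: Array[Double], numBins: Int): Array[Double] = {
    require(numBins >= 1 && numBins <= 256, "a Byte can index at most 256 bins")
    val min = values.min
    val max = values.max
    val width = (max - min) / numBins
    Array.tabulate(numBins - 1)(i => min + width * (i + 1))
  }

  // Map a raw Double to its bin index and store it in a single Byte.
  // Indices 128..255 wrap to negative Bytes; fromBin undoes that.
  def toBin(value: Double, edges: Array[Double]): Byte = {
    val idx = edges.indexWhere(value < _)
    val bin = if (idx == -1) edges.length else idx
    bin.toByte
  }

  // Recover the unsigned bin index (0..255) from the stored Byte.
  def fromBin(b: Byte): Int = b & 0xFF

  def main(args: Array[String]): Unit = {
    val column = Array(0.0, 2.5, 5.0, 7.5, 10.0)
    val edges = binEdges(column, 4)
    // One byte per value instead of the eight bytes a Double needs.
    val binned: Array[Byte] = column.map(toBin(_, edges))
    println(binned.map(fromBin).mkString(", "))
  }
}
```

Once every feature value is a bin index below 256, a column can be held in an Array[Byte], which on the JVM takes one byte per element rather than the eight bytes per element of an Array[Double].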

Dear Luca,
thank you very much for your prompt reply.
The solution is exactly what I was searching for.
Thank you again for your great library and work!
Federica