henckr/distRforest

Memory problem?

Closed this issue · 2 comments

Hi!

Let me say that I think this package is incredible and it helps me a lot! However, when I try to run rforest on my dataset (which is not much bigger than the dataset in your book example - 300k records), the R session is aborted and terminated. When I used random forests from other packages before, I had a similar problem. I thought this was caused by the dissimilarity matrix that is computed between every pair of records in the dataset, but somehow in your example it does not cause a problem. Do you have any idea how it could be fixed?

Hi, thank you for your interest in distRforest! Have you tried setting red_mem = TRUE in the call to the rforest() function? This will reduce the memory footprint of the individual trees in the forest.
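
A minimal sketch of that suggestion, for readers who want to try it on a small sample before fitting the full dataset. The data frame `dat` and its columns are hypothetical stand-ins, and the argument names (`method`, `control`, `ncand`, `ntrees`, `subsample`, `red_mem`) are assumed to match the package README, so check `?rforest` for your installed version. The call is guarded so the snippet does nothing if distRforest is not installed.

```r
# Hypothetical toy data standing in for the real dataset
set.seed(1)
n <- 1000
dat <- data.frame(
  y  = rnorm(n),
  x1 = runif(n),
  x2 = runif(n)
)

if (requireNamespace("distRforest", quietly = TRUE)) {
  library(distRforest)

  # Assumed signature, following the package README
  fit <- rforest(
    formula   = y ~ x1 + x2,
    data      = dat,
    method    = "anova",
    control   = rpart.control(minsplit = 20, cp = 0, xval = 0, maxdepth = 5),
    ncand     = 1,      # candidate variables tried at each split
    ntrees    = 10,     # keep the forest small for a first memory check
    subsample = 0.5,    # fraction of records used per tree
    red_mem   = TRUE    # reduce the memory footprint of each tree
  )

  # Rough size of the fitted object, to compare runs with red_mem on and off
  print(format(object.size(fit), units = "MB"))
}
```

Fitting on a small sample first and increasing `n` and `ntrees` step by step makes it easier to see whether memory use grows with the data, the number of trees, or both.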

Hi! Thanks for the quick answer. Actually, I was already using red_mem = TRUE. Surprisingly, it has now started to work by itself. The only change I made was to upgrade my notebook to 32 GB of RAM, which is strange, because I had tried to fit the model on as few as 10 thousand records and it hadn't worked either. Anyway, it works now, which is good.