Memory efficiency of RECOM_RANDOM (and, possibly, RECOM_POPULAR)
Opened this issue · 7 comments
The current implementation of RECOM_RANDOM will use excessive amounts of memory for predicting top-N lists if many items exist. This is because a dense matrix is created for all (new) users and items and is passed to returnRatings() as a whole. If the resulting object is sparse (as, e.g., a top-N list is), this is inefficient in terms of memory usage, because most of the non-NA entries are thrown away anyway, but it is ex ante unclear how many.
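For a sense of scale (back-of-the-envelope numbers on my part; only the ~250k item count appears in the measurements below), the dense intermediate grows linearly with the number of users, while a top-N list does not:

```r
# Rough memory footprint of the dense intermediate matrix (illustrative):
n_users <- 1000
n_items <- 250000
n_users * n_items * 8 / 2^30   # ~1.86 GiB for a dense double matrix
n_users * 10                   # entries a top-10 list actually needs
```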
I have created a variant in my fork (gregreich/recommenderlab@85f3e62) which loops over the (new) users, passes each row to returnRatings() individually (to stay consistent with the other RECOMs' return values), and stacks the results on top of each other (via a dgCMatrix). This approach costs runtime (a factor of 4 to 5, see below), but seems to have constant memory usage (n is the number of new users here; the number of items is about 250k).
| | Function_Call | Elapsed_Time_sec | Total_RAM_Used_MiB | Peak_RAM_Used_MiB |
|---|---|---|---|---|
| 1 | predict [old, n=20] | 3.495 | 971.4 | |
| 2 | predict [new, n=20] | 17.502 | 750.8 | |
| 3 | predict [old, n=100] | 18.185 | 1839.6 | |
| 4 | predict [new, n=100] | 95.057 | 751.4 | |
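For illustration, the per-user idea looks roughly like this (a minimal sketch, not the code in the fork; `ratings_for_user()` is a hypothetical stand-in for whatever produces the predicted ratings of a single user):

```r
library(Matrix)

# Sketch: predict one user at a time, turn each result into a sparse row,
# and stack the rows instead of materialising the full dense rating matrix.
predict_by_user <- function(newdata, ratings_for_user) {
  rows <- lapply(seq_len(nrow(newdata)), function(u) {
    r <- ratings_for_user(newdata[u, ])       # dense vector of length n_items
    Matrix(r, nrow = 1, sparse = TRUE)        # keep only the non-zero entries
  })
  Reduce(rbind2, rows)                        # dgCMatrix with one row per user
}
```

Only one user's dense vector is alive at any given time, which is why the memory footprint stays roughly constant in the number of new users.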
Maybe you can have a look and comment on it. Runtime efficiency should definitely be improved; also, the conversion from and to dgCMatrix could perhaps be avoided, but I do not know how to concatenate two objects of type realRatingMatrix directly.
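If I read the class definition correctly (an assumption on my part, not something stated in this thread), a realRatingMatrix keeps its ratings in a dgCMatrix in the data slot, so the back-and-forth conversion could perhaps be reduced to stacking those slots and wrapping the result once:

```r
library(Matrix)

# Assumption: realRatingMatrix stores its ratings as a dgCMatrix in @data.
# Stack two objects row-wise without converting each one back and forth;
# note that this ignores any normalization information they might carry.
rbind_rrm <- function(a, b) {
  new("realRatingMatrix", data = rbind2(a@data, b@data))
}
```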
A similar problem seems to exist for RECOM_POPULAR, but I haven't attempted that one yet.
Update: I have implemented the same procedure for RECOM_POPULAR.
Moreover, I made some improvements, and the runtime penalty has halved since: the new version is now between 2 and 3 times slower than the original one, but still has constant memory usage. See gregreich/recommenderlab@e208fb5.
This is an issue with all recommenders that create a full or close-to-full rating matrix before returnRatings() translates it into a top-N list. For the random recommender, we could implement more efficient sampling code that does not create the full matrix (I was too lazy), but in general I think it would be better to create a wrapper in setMethod("predict", signature(object = "Recommender")) in predict.R that applies a blocking strategy for all recommenders where this could be an issue. We could define a maximum block size (rows x items) and then call predict for the actual recommender for each block.
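Something like the following could serve as a starting point for such a wrapper (a hypothetical sketch, not the package's actual predict method; the block size and the final concatenation are placeholders):

```r
# Sketch of a blocking wrapper: split newdata into row blocks so that
# block_rows * n_items stays below max_block, run the underlying
# recommender on each block, and combine the (top-N or sparse) results.
predict_blocked <- function(object, newdata, n = 10, max_block = 1e7, ...) {
  n_items  <- ncol(newdata)
  rows_per <- max(1, floor(max_block / n_items))
  starts   <- seq(1, nrow(newdata), by = rows_per)
  parts <- lapply(starts, function(s) {
    idx <- s:min(s + rows_per - 1, nrow(newdata))
    predict(object, newdata[idx, ], n = n, ...)
  })
  parts  # combining depends on the return type (topNList vs. ratings)
}
```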
The main motivation behind my approach of looping over users and still using returnRatings() was to be as minimally intrusive as possible, and to ensure consistency of the output arguments without having to think (or test) too much; so in that sense, I have to admit, I was lazy, too.
The blocking would definitely be a nice feature; but until somebody has time to implement it, the loop over users is probably quite close already, as it constitutes a special case of the blocker. (And we also have to keep in mind that the maximum speedup compared to the other special case, i.e., one single block, is bounded quite tightly - if my measurements are reliable.)
Hi Michael, thank you for all the updates!
I also have two more comments:
- The solution using c() on a user level is indeed not optimal, for two reasons: First, the "user" in this context can well be recommenderlab itself (e.g., in an evaluationScheme), so it would still require intrusive changes; as you suggested, a wrapper could be a solution. Secondly, this procedure potentially creates substantial overhead for those RECOMs that can benefit from knowing all data at once, like AR due to the call to is.subset(); I have seen that in realistic cases. So a solution that can be "appended" to the specific steps of each RECOM would be desirable, I think.
- removeKnownRatings() has a similar problem (the code contains a FIXME already). For example, when it is called from RECOM_AR, where everything is nice and sparse (provided the problem itself is), it will potentially use enormous amounts of memory just for cleaning out the known items. I have applied my loop quick fix here, too (a rough sketch of the idea follows below), and it appears that the runtime overhead is moderate (a factor below 2, at least for sufficiently large cases - I haven't analysed smaller ones yet), while the memory requirement is now more or less constant instead of linear in the number of new users.
| | Function_Call | Elapsed_Time_sec | Total_RAM_Used_MiB | Peak_RAM_Used_MiB |
|---|---|---|---|---|
| 1 | predict [old, n=100] | 90.865 | 1077.3 | |
| 2 | predict [new, n=100] | 152.973 | 638.3 | |
| 3 | predict [old, n=1000] | 882.417 | 7595.8 | |
| 4 | predict [new, n=1000] | 1318.017 | 856.6 | |
In case you are interested, you can find the version in my memory-efficiency branch; of course, this version is not fully sparse but is still dense in terms of items.
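For illustration, the per-user removal step could look roughly like this (a sketch of the idea only, not the code in the branch; it assumes the ratings and the known items are both available as dgCMatrix objects of the same dimensions, with a dropped entry meaning "no rating"):

```r
library(Matrix)

# Sketch: blank out known items one user at a time instead of densifying
# the whole rating matrix. `ratings` and `known` are dgCMatrix objects.
remove_known_by_user <- function(ratings, known) {
  for (u in seq_len(nrow(ratings))) {
    known_items <- which(known[u, ] != 0)     # items this user already rated
    if (length(known_items) > 0)
      ratings[u, known_items] <- 0
  }
  drop0(ratings)                              # drop explicit zeros, stay sparse
}
```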
removeKnownRatings is now sparse.
I am still thinking about the best way to create top-N lists in a more memory-efficient way for RANDOM and POPULAR. I want to keep the code simple, since this is one of the main features of this package. To make this really efficient, we would probably need to implement these parts in C/C++.
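One possible direction for RANDOM that avoids the dense matrix entirely, without going to C/C++ (just an idea, not existing package code): sample the N item indices per user directly.

```r
# Idea sketch: build top-N item indices for a random recommender by sampling
# per user, so an n_users x n_items rating matrix is never created.
random_topN_items <- function(n_users, n_items, N = 10) {
  lapply(seq_len(n_users), function(u) sample.int(n_items, N))
}
```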
Hi, thanks for the new version of removeKnownRatings(); it works very nicely.
Meanwhile, I have also adapted HybridRecommender so that it does not overuse memory when creating top-N lists. Here, the problem is twofold: First, the predictions from the individual RECOMs were dense (because predict() was called with type="ratings", regardless of the final return type of the hybrid), and second, again a complete, dense rating matrix is created to work on; in this case, I think this does not make sense because there is a loop over the rows of newdata anyway. I resolved both issues in a quick and dirty manner (see here: cb2361d), but this time it turns out to be even quicker than the original version and, at the same time, less memory-consuming:
| | Function_Call | Elapsed_Time_sec | Total_RAM_Used_MiB | Peak_RAM_Used_MiB |
|---|---|---|---|---|
| 1 | predict [old, n=100] | 241.613 | 3345.9 | |
| 2 | predict [new, n=100] | 202.106 | 1053.5 | |
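A rough sketch of the row-wise combination idea discussed above (hypothetical code, not the commit itself; `recommenders` and `weights` stand for whatever the hybrid object actually carries, and NA handling is kept naive):

```r
library(Matrix)

# Sketch: combine the member predictions one user at a time and extract the
# top-n items per row, instead of first building the full dense hybrid
# rating matrix over all users.
hybrid_topN_rows <- function(recommenders, weights, newdata, n = 10) {
  lapply(seq_len(nrow(newdata)), function(u) {
    row <- newdata[u, ]
    scores <- Reduce(`+`, Map(function(r, w) {
      w * as(predict(r, row, type = "ratings"), "matrix")   # 1 x n_items
    }, recommenders, weights))
    head(order(scores, decreasing = TRUE, na.last = NA), n)  # top-n indices
  })
}
```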
I think there is only so much one can do if the return type of a prediction is dense by construction of the desired output. But what I take from this is that working with top-N lists wherever possible, also in the intermediate steps, would be highly desirable.