Memory efficiency of RECOM_RANDOM (and, possibly, RECOM_POPULAR)
Opened this issue · 7 comments
The current implementation of RECOM_RANDOM will use excessive amounts of memory for predicting top-N lists if many items exist. This is because a dense matrix is created for all (new) users and items and is passed to returnRatings() as a whole. If the resulting object is sparse (as, e.g., a top-N list is), this is inefficient in terms of memory usage, because most of the non-NA entries are thrown away anyway, but it is ex ante unclear how many.
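For a sense of scale (back-of-the-envelope numbers on my part; only the ~250k item count appears in the measurements below), the dense intermediate grows linearly with the number of users, while a top-N list does not:

```r
# Rough memory footprint of the dense intermediate matrix (illustrative):
n_users <- 1000
n_items <- 250000
n_users * n_items * 8 / 2^30   # ~1.86 GiB for a dense double matrix
n_users * 10                   # entries a top-10 list actually needs
```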
I have created a variant in my fork (gregreich/recommenderlab@85f3e62) which loops over the (new) users, passes each row to returnRatings() individually (to stay consistent with the other RECOMs' return values), and stacks the results on top of each other (via a dgCMatrix). This approach costs runtime (a factor of 4 to 5, see below), but seems to have constant memory usage (n is the number of new users here; the number of items is about 250k).
| | Function_Call | Elapsed_Time_sec | Total_RAM_Used_MiB | Peak_RAM_Used_MiB |
|---|---|---|---|---|
| 1 | predict [old, n=20] | 3.495 | 971.4 | |
| 2 | predict [new, n=20] | 17.502 | 750.8 | |
| 3 | predict [old, n=100] | 18.185 | 1839.6 | |
| 4 | predict [new, n=100] | 95.057 | 751.4 | |
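For illustration, the per-user idea looks roughly like this (a minimal sketch, not the code in the fork; `ratings_for_user()` is a hypothetical stand-in for whatever produces the predicted ratings of a single user):

```r
library(Matrix)

# Sketch: predict one user at a time, turn each result into a sparse row,
# and stack the rows instead of materialising the full dense rating matrix.
predict_by_user <- function(newdata, ratings_for_user) {
  rows <- lapply(seq_len(nrow(newdata)), function(u) {
    r <- ratings_for_user(newdata[u, ])       # dense vector of length n_items
    Matrix(r, nrow = 1, sparse = TRUE)        # keep only the non-zero entries
  })
  Reduce(rbind2, rows)                        # dgCMatrix with one row per user
}
```

Only one user's dense vector is alive at any given time, which is why the memory footprint stays roughly constant in the number of new users.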
Maybe you can have a look and comment on it. Runtime efficiency should definitely be improved; also, the conversion from and to dgCMatrix could perhaps be avoided, but I do not know how to concatenate two objects of type realRatingMatrix directly.
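If I read the class definition correctly (an assumption on my part, not something stated in this thread), a realRatingMatrix keeps its ratings in a dgCMatrix in the data slot, so the back-and-forth conversion could perhaps be reduced to stacking those slots and wrapping the result once:

```r
library(Matrix)

# Assumption: realRatingMatrix stores its ratings as a dgCMatrix in @data.
# Stack two objects row-wise without converting each one back and forth;
# note that this ignores any normalization information they might carry.
rbind_rrm <- function(a, b) {
  new("realRatingMatrix", data = rbind2(a@data, b@data))
}
```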
A similar problem seems to exist for RECOM_POPULAR, but I haven't attempted that one yet.
Update: I have implemented the same procedure for RECOM_POPULAR.
Moreover, I made some improvements, and the runtime penalty has halved since: the new version is now between 2 and 3 times slower than the original one, but still has constant memory usage. See gregreich/recommenderlab@e208fb5.
This is an issue with all recommenders that create a full or close-to-full rating matrix before returnRatings() translates it into a top-N list. For the random recommender, we could implement more efficient sampling code that does not create the full matrix (I was too lazy), but in general I think it would be better to create a wrapper in setMethod("predict", signature(object = "Recommender")) in predict.R that applies a blocking strategy for all recommenders where this could be an issue. We could define a maximum block size (rows x items) and then call predict for the actual recommender for each block.
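Something like the following could serve as a starting point for such a wrapper (a hypothetical sketch, not the package's actual predict method; the block size and the final concatenation are placeholders):

```r
# Sketch of a blocking wrapper: split newdata into row blocks so that
# block_rows * n_items stays below max_block, run the underlying
# recommender on each block, and combine the (top-N or sparse) results.
predict_blocked <- function(object, newdata, n = 10, max_block = 1e7, ...) {
  n_items  <- ncol(newdata)
  rows_per <- max(1, floor(max_block / n_items))
  starts   <- seq(1, nrow(newdata), by = rows_per)
  parts <- lapply(starts, function(s) {
    idx <- s:min(s + rows_per - 1, nrow(newdata))
    predict(object, newdata[idx, ], n = n, ...)
  })
  parts  # combining depends on the return type (topNList vs. ratings)
}
```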
The main motivation behind my approach of looping over users and still using returnRatings() was to be as minimally intrusive as possible, and to ensure consistency of the output arguments without having to think (or test) too much; so in that sense, I have to admit, I was lazy, too.
The blocking would definitely be a nice feature; but until somebody has time to implement it, the loop over users is probably quite close already, as it constitutes a special case of the blocker. (And we also have to keep in mind that the maximum speedup compared to the other special case, i.e., one single block, is bounded quite tightly - if my measurements are reliable.)
Hi Michael, thank you for all the updates!
I also have two more comments:
- The solution using c() on a user level is indeed not optimal, for two reasons: First, the "user" in this context can well be recommenderlab itself (e.g., in an evaluationScheme), so it would still require intrusive changes; as you suggested, a wrapper could be a solution. Secondly, this procedure potentially creates substantial overhead for those RECOMs that can benefit from knowing all data at once, like AR due to the call to is.subset(); I have seen that in realistic cases. So a solution that can be "appended" to the specific steps of each RECOM would be desirable, I think.
- removeKnownRatings() has a similar problem (the code contains a FIXME already). For example, when it is called from RECOM_AR, where everything is nice and sparse (provided the problem itself is), it will potentially use enormous amounts of memory just for cleaning out the known items. I have applied my loop quick fix here, too (a rough sketch of the idea follows below), and it appears that the runtime overhead is moderate (a factor below 2, at least for sufficiently large cases - I haven't analysed smaller ones yet), while the memory requirement is now more or less constant instead of linear in the number of new users.
| | Function_Call | Elapsed_Time_sec | Total_RAM_Used_MiB | Peak_RAM_Used_MiB |
|---|---|---|---|---|
| 1 | predict [old, n=100] | 90.865 | 1077.3 | |
| 2 | predict [new, n=100] | 152.973 | 638.3 | |
| 3 | predict [old, n=1000] | 882.417 | 7595.8 | |
| 4 | predict [new, n=1000] | 1318.017 | 856.6 | |
In case you are interested, you can find the version in my memory-efficiency branch; of course, this version is not fully sparse but is still dense in terms of items.
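For illustration, the per-user removal step could look roughly like this (a sketch of the idea only, not the code in the branch; it assumes the ratings and the known items are both available as dgCMatrix objects of the same dimensions, with a dropped entry meaning "no rating"):

```r
library(Matrix)

# Sketch: blank out known items one user at a time instead of densifying
# the whole rating matrix. `ratings` and `known` are dgCMatrix objects.
remove_known_by_user <- function(ratings, known) {
  for (u in seq_len(nrow(ratings))) {
    known_items <- which(known[u, ] != 0)     # items this user already rated
    if (length(known_items) > 0)
      ratings[u, known_items] <- 0
  }
  drop0(ratings)                              # drop explicit zeros, stay sparse
}
```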
removeKnownRatings is now sparse.
I am still thinking about the best way to create top-N lists in a more memory-efficient way for RANDOM and POPULAR. I want to keep the code simple, since this is one of the main features of this package. To make this really efficient, we would probably need to implement these parts in C/C++.
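One possible direction for RANDOM that avoids the dense matrix entirely, without going to C/C++ (just an idea, not existing package code): sample the N item indices per user directly.

```r
# Idea sketch: build top-N item indices for a random recommender by sampling
# per user, so an n_users x n_items rating matrix is never created.
random_topN_items <- function(n_users, n_items, N = 10) {
  lapply(seq_len(n_users), function(u) sample.int(n_items, N))
}
```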
Hi, thanks for the new version of removeKnownRatings(); it works very nicely.
Meanwhile, I have also adapted HybridRecommender so that it does not overuse memory when creating top-N lists. Here, the problem is twofold: First, the predictions from the individual RECOMs were dense (because predict() was called with type="ratings", regardless of the final return type of the hybrid), and second, again a complete, dense rating matrix is created to work on; in this case, I think this does not make sense because there is a loop over the rows of newdata anyway. I resolved both issues in a quick and dirty manner (see here: cb2361d), but this time it turns out to be even quicker than the original version and, at the same time, less memory-consuming:
| | Function_Call | Elapsed_Time_sec | Total_RAM_Used_MiB | Peak_RAM_Used_MiB |
|---|---|---|---|---|
| 1 | predict [old, n=100] | 241.613 | 3345.9 | |
| 2 | predict [new, n=100] | 202.106 | 1053.5 | |
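A rough sketch of the row-wise combination idea discussed above (hypothetical code, not the commit itself; `recommenders` and `weights` stand for whatever the hybrid object actually carries, and NA handling is kept naive):

```r
library(Matrix)

# Sketch: combine the member predictions one user at a time and extract the
# top-n items per row, instead of first building the full dense hybrid
# rating matrix over all users.
hybrid_topN_rows <- function(recommenders, weights, newdata, n = 10) {
  lapply(seq_len(nrow(newdata)), function(u) {
    row <- newdata[u, ]
    scores <- Reduce(`+`, Map(function(r, w) {
      w * as(predict(r, row, type = "ratings"), "matrix")   # 1 x n_items
    }, recommenders, weights))
    head(order(scores, decreasing = TRUE, na.last = NA), n)  # top-n indices
  })
}
```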
I think there is only so much one can do if the return type of a prediction is dense by construction of the desired output. But what I take from this is that working with top-N lists wherever possible, also in the intermediate steps, would be highly desirable.