geek-ai/irgan

generator sample of item recommendation

tosky001 opened this issue · 2 comments

In cf_gan.py, the code around lines 194~198 is as follows:

```python
pn = (1 - sample_lambda) * prob                  # generator's distribution over all items
pn[pos] += sample_lambda * 1.0 / len(pos)        # mix in uniform mass on this user's positive items
sample = np.random.choice(np.arange(ITEM_NUM), 2 * len(pos), p=pn)
```

This up-weights the positive samples of each user, which is not described in the paper. Is there some other sampling scheme that could achieve the same or even better results?
In addition, in some scenarios there might be millions of users and around 10k items. Is there any rough estimate of the training time?
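For context, here is a minimal self-contained sketch of what the quoted lines compute; the names (`prob`, `pos`, `sample_lambda`, `ITEM_NUM`) follow the snippet, and the concrete values are illustrative rather than taken from the repository:

```python
import numpy as np

ITEM_NUM = 10                      # illustrative catalogue size
pos = [1, 4, 7]                    # this user's observed (positive) items
sample_lambda = 0.2                # mixing weight for the positives

# stand-in for the generator's softmax distribution over all items
prob = np.random.rand(ITEM_NUM)
prob = prob / prob.sum()

# proposal: (1 - lambda) * generator distribution + lambda * uniform over positives
pn = (1 - sample_lambda) * prob
pn[pos] += sample_lambda * 1.0 / len(pos)

# draw 2 * |pos| item ids from the proposal pn
sample = np.random.choice(np.arange(ITEM_NUM), 2 * len(pos), p=pn)
print(sample)
```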

These lines use importance sampling to put extra emphasis on the positive items, and there are certainly other unbiased sampling schemes worth trying. In a big-data scenario, the training time depends on many factors, including the computing resources, the implementation, and the hyper-parameters. It usually takes several days to get satisfying results on a large dataset such as Netflix.
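To make the importance-sampling argument concrete: when items are drawn from the proposal `pn` instead of the generator distribution `prob`, re-weighting each sample by `prob/pn` keeps the expectation unbiased, since E_pn[(prob/pn) * f] = E_prob[f]. A rough sketch (function name and arguments are placeholders, not the repository's exact code):

```python
import numpy as np

def importance_weighted_reward(prob, pn, sample, raw_reward):
    """Correct rewards for sampling from the proposal pn instead of prob.

    prob       : generator's distribution over items (sums to 1)
    pn         : proposal distribution actually sampled from (sums to 1)
    sample     : array of sampled item ids
    raw_reward : rewards for the sampled items
    """
    weights = prob[sample] / pn[sample]   # importance weights prob(i) / pn(i)
    return raw_reward * weights
```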

Regarding the sampling distribution, I got an error saying that the probabilities in "pn" do not sum to one. Do you have any idea about this issue? @LantaoYu @wnzhang
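In case it helps while waiting for the authors: `np.random.choice` raises an error along the lines of "probabilities do not sum to 1" when the `p` vector drifts from 1 by more than its tolerance, for example when `prob` comes from a float32 softmax. A common workaround, assuming the mismatch is only floating-point rounding rather than a construction bug, is to renormalize `pn` just before sampling:

```python
pn = np.asarray(pn, dtype=np.float64)
pn = pn / pn.sum()   # renormalize so the probabilities sum to exactly 1
sample = np.random.choice(np.arange(ITEM_NUM), 2 * len(pos), p=pn)
```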