Problem with the ALS get_rmse calculation

Question

Problem with the ALS get_rmse calculation

Closed this issue 4 years ago · 2 comments

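n_elements = sum(map(len, ratings.values()))
The element count n_elements is too large. During the matrix computation, missing entries are given a default value of 0, so ratings grows to size m*n and the resulting RMSE comes out too small.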
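
A minimal sketch of the effect described above, assuming ratings is a nested defaultdict(int); that structure, the toy data, and the helper names are illustrative assumptions, not code taken from the repository. Reading a missing cell from such a dict inserts a 0, so the count returned by sum(map(len, ...)) grows from the number of observed ratings to m*n, and dividing the same squared error by the larger count yields a smaller RMSE.

```python
from collections import defaultdict
from math import sqrt


def build_ratings(triples):
    """Build a nested dict: ratings[user][item] = rating (hypothetical helper)."""
    ratings = defaultdict(lambda: defaultdict(int))
    for user, item, rating in triples:
        ratings[user][item] = rating
    return ratings


def count_elements(ratings):
    """Same counting expression as quoted in the question."""
    return sum(map(len, ratings.values()))


# Toy data: 2 users, 3 items, only 3 observed ratings.
triples = [("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4)]
items = ["a", "b", "c"]

ratings = build_ratings(triples)
n_before = count_elements(ratings)

# Looping over every (user, item) cell reads missing keys,
# and defaultdict(int) silently inserts a 0 for each of them.
for user in list(ratings):
    for item in items:
        _ = ratings[user][item]
n_after = count_elements(ratings)

# Hypothetical total squared error over the observed ratings only.
squared_error = 3.0
rmse_observed = sqrt(squared_error / n_before)
rmse_deflated = sqrt(squared_error / n_after)

print(n_before, n_after)
print(rmse_observed, rmse_deflated)
```

Counting the observed ratings before any lookup that can insert defaults, or counting only entries with rating > 0, avoids the inflated denominator.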

Answer 1 · 2019-05-25T13:37:01.000Z

Hello KuanSun1024, sorry for the late reply; I have been busy with work. I will find time to fix this. Thanks for pointing it out :)

Answer 2 · 2020-11-17T02:55:21.000Z

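I split off part of the data as test data and rewrote the _get_rmse() function.
The test data is stored in __init__:
    self.test_data = test_data
    # Unique (user_id, item_id) pairs that appear in the test data
    self.test_dict = []
    for data in self.test_data:
        if (data[0], data[1]) not in self.test_dict:
            self.test_dict.append((data[0], data[1]))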
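Then the test data is passed in from the training entry point main():
def main():
    print("Testing the performance of ALS...")
    # Load data
    X, n_elements = load_movie_ratings()
    # Randomly sample 20% of the ratings as test data (requires `import random`)
    test_data = random.sample(X, int(0.2 * len(X)))
    # Train model
    model = ALS(test_data)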

Next is the function that computes the RMSE:
def _get_rmse(self):
    """Calculate RMSE on the sampled test data.
    Returns:
        float
    """
    labels = []
    predictions = []
    for data in self.test_data:
        user_id, item_id, rating = data[0], data[1], data[2]
        labels.append(rating)
        # Map the raw ids to their column indices in the factor matrices
        i = self.user_ids_dict[user_id]
        j = self.item_ids_dict[item_id]
        user_row = self.user_matrix.col(i).transpose
        item_col = self.item_matrix.col(j)
        # Predicted rating is the dot product of the user and item factors
        rating_hat = user_row.mat_mul(item_col).data[0][0]
        predictions.append(rating_hat)
    return self.get_rmse(labels, predictions)

def get_rmse(self, y, pred):
    # Requires `import numpy as np` and
    # `from sklearn.metrics import mean_squared_error`
    return np.sqrt(mean_squared_error(y, pred))
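
The same idea can be tried in isolation. The following is a rough sketch using plain Python lists, not the repository's Matrix class: the factor matrices, index dictionaries, and function names are all hypothetical, and it assumes each factor matrix stores one column per user or item, as the col(i) calls above suggest. The squared error is averaged over the sampled (user_id, item_id, rating) triples only, so the denominator is exactly the number of test samples.

```python
from math import sqrt


def predict(user_factors, item_factors, i, j):
    """Dot product of column i of user_factors and column j of item_factors."""
    return sum(u_row[i] * v_row[j] for u_row, v_row in zip(user_factors, item_factors))


def rmse_on_samples(samples, user_index, item_index, user_factors, item_factors):
    """RMSE over explicit (user_id, item_id, rating) triples only."""
    squared_error = 0.0
    for user_id, item_id, rating in samples:
        i = user_index[user_id]
        j = item_index[item_id]
        rating_hat = predict(user_factors, item_factors, i, j)
        squared_error += (rating - rating_hat) ** 2
    # Denominator is the number of samples, never m * n.
    return sqrt(squared_error / len(samples))


# Toy model with a single latent factor (k = 1), 2 users and 2 items.
user_factors = [[1.0, 2.0]]  # shape k x m
item_factors = [[2.0, 3.0]]  # shape k x n
user_index = {"u1": 0, "u2": 1}
item_index = {"a": 0, "b": 1}

# In the answer above these triples would come from random.sample(X, ...).
samples = [("u1", "a", 2.0), ("u2", "b", 4.0)]

result = rmse_on_samples(samples, user_index, item_index, user_factors, item_factors)
print(result)
```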

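In effect, training still uses the full data set, and a randomly selected subset of it is used as test data for the RMSE calculation. This fixes the error in the original code, where the number of test samples counted was far larger than the actual number of ratings.
This is for reference only; corrections are welcome.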