tushushu/imylu

Problem with the RMSE calculation in the ALS get_rmse function


n_elements = sum(map(len, ratings.values()))
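The element count comes out too large. Entries that were given the default value 0 during your matrix computation are counted as well,
so ratings effectively has m*n entries, which makes the RMSE smaller than it should be.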
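
To illustrate the effect described above, here is a minimal sketch with made-up numbers (the 2 x 3 matrix and all values are hypothetical, not taken from imylu). It assumes the squared error is summed over the observed ratings only and then compares dividing by the observed count against dividing by m*n:

```python
from math import sqrt

# Hypothetical example: 2 users x 3 items, only 3 ratings actually observed.
n_users, n_items = 2, 3
observed = {(0, 0): 5.0, (0, 2): 3.0, (1, 1): 4.0}
predicted = {(0, 0): 4.0, (0, 2): 4.0, (1, 1): 3.0}

# Sum of squared errors over the observed ratings only.
sse = sum((predicted[key] - rating) ** 2 for key, rating in observed.items())

# Dividing by the number of observed ratings.
rmse_observed = sqrt(sse / len(observed))

# Dividing by m*n, i.e. also counting the zero-filled entries.
rmse_inflated_count = sqrt(sse / (n_users * n_items))

print(rmse_observed, rmse_inflated_count)
```

With the same squared error, the larger denominator always gives the smaller RMSE, which is the bias reported here.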

I set aside part of the data as test data and then rewrote the _get_rmse() function.
The test data is stored in __init__:
self.test_data = test_data
# Unique (user_id, item_id) pairs that appear in the test data
self.test_dict = []
for data in self.test_data:
    if (data[0], data[1]) not in self.test_dict:
        self.test_dict.append((data[0], data[1]))
Then pass the test data to the model in the training function main():
def main():
    print("Testing the performance of ALS...")
    # Load data
    X, n_elements = load_movie_ratings()
    # Hold out a random 20% of the ratings as test data (needs "import random")
    test_data = random.sample(X, int(0.2 * len(X)))
    # Train model
    model = ALS(test_data)

Then the functions that compute the RMSE:
def _get_rmse(self):
    """Calculate RMSE on the held-out test data.
    Returns:
        float
    """
    labels = []
    predictions = []
    for data in self.test_data:
        user_id, item_id, rating = data[0], data[1], data[2]
        labels.append(rating)
        # Map the raw IDs to matrix indices
        i = self.user_ids_dict[user_id]
        j = self.item_ids_dict[item_id]
        # Predicted rating: product of user i's and item j's factor vectors
        user_row = self.user_matrix.col(i).transpose
        item_col = self.item_matrix.col(j)
        rating_hat = user_row.mat_mul(item_col).data[0][0]
        predictions.append(rating_hat)
    return self.get_rmse(labels, predictions)

def get_rmse(self, y, pred):
    # Needs: import numpy as np; from sklearn.metrics import mean_squared_error
    return np.sqrt(mean_squared_error(y, pred))

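In effect, training still uses the full data set, and a randomly selected subset serves as the test data for the RMSE. This fixes the error in the original code, where the number of test samples counted was far larger than the actual number.
For reference only; corrections are welcome.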
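
As a rough, library-independent sketch of the same idea: compute the RMSE over held-out (user_id, item_id, rating) triples only. The helper names split_ratings and rmse_on_triples, the synthetic data, and the global-mean predictor are all hypothetical stand-ins, not part of imylu. Unlike the code above, this variant also keeps the test triples out of the training portion, which is an assumption on my side rather than what the patch does:

```python
import random
from math import sqrt


def split_ratings(triples, test_ratio=0.2, seed=0):
    """Shuffle the triples and split them into disjoint (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    n_test = int(test_ratio * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def rmse_on_triples(triples, predict):
    """RMSE over the given triples only; the denominator is len(triples)."""
    if not triples:
        raise ValueError("need at least one rating to compute RMSE")
    sse = sum((predict(user, item) - rating) ** 2
              for user, item, rating in triples)
    return sqrt(sse / len(triples))


# Synthetic ratings: 10 users x 10 items, values in 1.0..5.0.
data = [(u, i, float((u + i) % 5 + 1)) for u in range(10) for i in range(10)]
train, test = split_ratings(data)

# Stand-in for a trained model: always predict the training mean.
global_mean = sum(rating for _, _, rating in train) / len(train)
test_rmse = rmse_on_triples(test, lambda user, item: global_mean)
print(len(train), len(test), test_rmse)
```

Because the denominator is the number of test triples, zero-filled matrix entries never enter the count.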