practical-recommender-systems/moviegeek

Explicit zeros get ignored when calculating the overlap matrix.

In the following line, an overlap matrix is created by converting the coo matrix to boolean, then to integer.

overlap_matrix = coo.astype(bool).astype(int).dot(coo.transpose().astype(bool).astype(int))

However, the boolean conversion also turns every rating that was normalized to exactly zero into False. Those entries are real ratings, but they then contribute nothing to the overlap count.
My proposed solution: build a matrix that has a one at every stored entry of the coo matrix, regardless of its value:

Example:

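from scipy.sparse import coo_matrix

# coo is the existing matrix of normalized ratings
print("Coo matrix:\n", coo)
print("coo as bool:\n", coo.astype(bool).astype(int))
# One entry per stored rating, including ratings normalized to 0.0
ones_data = [1] * len(coo.data)
ones_matrix = coo_matrix((ones_data, (coo.row, coo.col)), shape=coo.shape)
print("ones matrix:\n", ones_matrix)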

Output:

Coo matrix:
(0, 0) -0.6666666666666667
(1, 0) 0.33333333333333326
(2, 0) 0.33333333333333326
(1, 1) 0.5
(2, 1) 0.0
(3, 1) -0.5
(1, 2) 0.0
(2, 2) 0.5
(3, 2) -0.5
coo as bool:
(0, 0) 1
(1, 0) 1
(1, 1) 1
(1, 2) 0
(2, 0) 1
(2, 1) 0
(2, 2) 1
(3, 1) 1
(3, 2) 1
ones matrix:
(0, 0) 1
(1, 0) 1
(2, 0) 1
(1, 1) 1
(2, 1) 1
(3, 1) 1
(1, 2) 1
(2, 2) 1
(3, 2) 1
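
To show the effect on the overlap counts themselves, here is a minimal self-contained sketch. It rebuilds the example matrix from the values printed above (the 4x3 shape and the variable names `as_bool`, `overlap_bool`, `overlap_ones` are my own assumptions for illustration, not code from the repository) and compares the two approaches for rows 1 and 2, which share all three columns.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Example matrix rebuilt from the output above (shape 4x3 is assumed).
rows = np.array([0, 1, 2, 1, 2, 3, 1, 2, 3])
cols = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
vals = np.array([-2 / 3, 1 / 3, 1 / 3, 0.5, 0.0, -0.5, 0.0, 0.5, -0.5])
coo = coo_matrix((vals, (rows, cols)), shape=(4, 3))

# Current approach: ratings equal to 0.0 become False and count as 0.
as_bool = coo.astype(bool).astype(int)
overlap_bool = as_bool.dot(as_bool.transpose()).toarray()

# Proposed approach: a 1 for every stored entry, whatever its value.
ones_data = np.ones(len(coo.data), dtype=int)
ones_matrix = coo_matrix((ones_data, (coo.row, coo.col)), shape=coo.shape)
overlap_ones = ones_matrix.dot(ones_matrix.transpose()).toarray()

# Rows 1 and 2 both have entries in columns 0, 1 and 2.
print(overlap_bool[1, 2])  # 1: two of the three shared columns are lost
print(overlap_ones[1, 2])  # 3: all shared columns are counted
```

With the boolean conversion the overlap between rows 1 and 2 is reported as 1, while the ones matrix gives the expected 3.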