We use this repository to experiment various methods of matrix factorization, which could be utilized in recommender system.
In this trial, we will use web data crawled from PTT(refer to jwlin/ptt-web-crawler).
Randomly crawled 5000 articles, trying to recommend the most related articles given an article.
Different methods of contructing matrix
- item-based - ex. build features of articles
- collaborative filtering - ex. ratings of users toward different movies (in our case, there are 'push' and 'downvote' in PTT)
Different methods of Matrix Factorization
- PCA
- SVD
- ALS
We use data from PTT Gossiping Board. There are 2000 articles represented by JSON(You can download here https://drive.google.com/file/d/1IoYUJsH0KzJ3wbiyxO1AG-CVbY9O8yPB/view?usp=sharing)
To begin, let's first take a look at test/test_case_1
We simply use tf-idf to convert each article into a vector of tf-idf. Then, we can use cosine similarity(or others ex. Jaccard Similarity) to get recommendation.