Feature-aware Matrix Factorization Model
This is an implementation of the models described in the following paper:
Tao Chen, Xiangnan He and Min-Yen Kan (2016). Context-aware Image Tweet Modelling and Recommendation. In Proceedings of the 24th ACM International Conference on Multimedia (MM'16), Amsterdam, The Netherlands.
We have additionally released two datasets used in our paper:
Please cite our MM'16 paper if you use our code or datasets. Thanks!
Author: Tao Chen (http://www.cs.jhu.edu/~taochen)
Usage
We implemented three feature-aware matrix factorization (FAMF) models that use different features.
Model | Configuration File | Features |
---|---|---|
Text | text.conf | Post's contextual words |
Visual | visual.conf | Image's visual tags |
TextVisual | text_visual.conf | The combination of contextual words and visual tags |
Dataset Preparation
If you are using our dataset, please:
- Crawl the tweets and images from Twitter
- Extract necessary features
- Generate training and test set
The required input files vary from one model to another. Please refer to the model's configuration file for details. The formats of the rating and feature files are listed below:
- Rating file format
Each line contains one positive tweet and its N paired negative tweets for a particular user, separated by commas. Each rating consists of four elements: user, post ID, publisher (the author of the post), and the rating (1 denotes that the user has retweeted the tweet, 0 that the user has not). The negative tweets can be sampled with our time-aware negative sampling algorithm (detailed in the paper).
user_1 486447896191959040 pub_65893 1,user_1 486619477933838336 pub_18 0,user_1 486596611431477248 pub_22 0,user_1 486602569419333632 pub_21 0,user_1 486532028570275840 pub_45 0...
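As an illustration of the rating format above, here is a minimal sketch of a parser for one rating line. It is not part of the released code: the class `RatingLineParser`, its nested `Rating` type, and the method `parseLine` are hypothetical names, and the separators (commas between ratings, whitespace between fields) are inferred from the example line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the released code) that parses one rating-file line. */
class RatingLineParser {

    /** One rating: user, post ID, publisher, and a 0/1 retweet label. */
    static final class Rating {
        final String user;
        final String postId;     // kept as a string; tweet IDs are large numbers
        final String publisher;
        final int label;         // 1 = retweeted, 0 = not retweeted

        Rating(String user, String postId, String publisher, int label) {
            this.user = user;
            this.postId = postId;
            this.publisher = publisher;
            this.label = label;
        }
    }

    /** Splits a line on commas, then each rating on whitespace into its four fields. */
    static List<Rating> parseLine(String line) {
        List<Rating> ratings = new ArrayList<>();
        for (String entry : line.trim().split(",")) {
            String[] fields = entry.trim().split("\\s+");
            if (fields.length != 4) {
                throw new IllegalArgumentException("Expected 4 fields, got: " + entry);
            }
            int label = Integer.parseInt(fields[3]);
            if (label != 0 && label != 1) {
                throw new IllegalArgumentException("Rating must be 0 or 1: " + entry);
            }
            ratings.add(new Rating(fields[0], fields[1], fields[2], label));
        }
        return ratings;
    }

    public static void main(String[] args) {
        // Shortened version of the example line: one positive and one negative tweet.
        String line = "user_1 486447896191959040 pub_65893 1,"
                + "user_1 486619477933838336 pub_18 0";
        for (Rating r : parseLine(line)) {
            System.out.println(r.user + " " + r.postId + " " + r.publisher + " " + r.label);
        }
    }
}
```

A check like this can be useful for validating a generated training or test file before running a model, since a malformed rating is reported together with the offending entry.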
- Feature file format
Each line contains the post ID followed by its feature IDs, which are the indices of contextual words or visual tags. Therefore, feature IDs should be consecutive integers in [0, W), where W is the number of features of a particular type.
544555137272791040,2275 3474 36361 9123 23694 57 714 3112 1212 19505 7409 8011 18770 5878 256 3314 2039
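The feature format and the [0, W) constraint can be checked with a short sketch like the one below. This is an assumption-laden illustration, not the released code: `FeatureLineParser`, `PostFeatures`, `parseLine`, and the `vocabSize` parameter (standing in for W) are hypothetical names, and the layout (a comma after the post ID, then space-separated feature IDs) is inferred from the example line.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of the released code) that parses one feature-file line. */
class FeatureLineParser {

    /** A post ID together with the feature IDs listed for it. */
    static final class PostFeatures {
        final String postId;
        final int[] featureIds;

        PostFeatures(String postId, int[] featureIds) {
            this.postId = postId;
            this.featureIds = featureIds;
        }
    }

    /**
     * Parses "postId,id id id ..." and checks that every feature ID lies in [0, vocabSize).
     * vocabSize plays the role of W, the number of features of this type.
     */
    static PostFeatures parseLine(String line, int vocabSize) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Missing ',' after post ID: " + line);
        }
        String postId = line.substring(0, comma).trim();
        String rest = line.substring(comma + 1).trim();
        String[] tokens = rest.isEmpty() ? new String[0] : rest.split("\\s+");

        int[] ids = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            int id = Integer.parseInt(tokens[i]);
            if (id < 0 || id >= vocabSize) {
                throw new IllegalArgumentException(
                        "Feature ID " + id + " is outside [0, " + vocabSize + ")");
            }
            ids[i] = id;
        }
        return new PostFeatures(postId, ids);
    }

    public static void main(String[] args) {
        // The vocabulary size here is an arbitrary value chosen for the demo.
        PostFeatures pf = parseLine("544555137272791040,2275 3474 36361 9123", 40000);
        System.out.println(pf.postId + " -> " + Arrays.toString(pf.featureIds));
    }
}
```

Passing the same vocabulary size that is set in the model's configuration file would catch out-of-range feature IDs before training starts.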
How to run
- Set the configuration file properly
Please see the comments (lines starting with #) in the configuration file for guidance. In general, you should set the dataset paths and the visual/textual vocabulary size according to your dataset, and tune the model parameters (e.g., number of factors, regularizer) to obtain optimal results. For the remaining parameters, you may just use the default values.
- Compile the source code and run the model
- If you are using Eclipse:
Please add the jar files in the "lib" folder to the project build path. See this post on how to do this. Then run the respective model with its configuration file as the program argument.
- If you are using command line:
mkdir bin
javac -cp "lib/*" -d bin src/data/* src/main/* src/matrix/* src/model/* src/util/*
java -cp "lib/*":bin main.<model> conf/<model_config>
Please replace <model> and <model_config> with the respective model and configuration file. E.g.,
java -cp "lib/*":bin main.TextVisualMain conf/text_visual.conf
Output
The above code invokes the pipeline of training, testing and evaluation, and generates the following files:
File/Folder | Description |
---|---|
result.csv | This file contains the overall experimental results on the test set. |
result_user.csv | This file contains the user-level experimental results on the test set. |
model | This folder contains the user factors and feature factors learned from the training set. |
prediction | This folder contains the predicted score for each user-tweet pair. |
config.txt | The experimental configuration settings. |
log.txt | Log information. |