
Feature-aware Matrix Factorization Model


This is the implementation based on the following paper:

Tao Chen, Xiangnan He and Min-Yen Kan (2016). Context-aware Image Tweet Modelling and Recommendation. In Proceedings of the 24th ACM International Conference on Multimedia (MM'16), Amsterdam, The Netherlands.

We have additionally released two datasets used in our paper.

Please cite our MM'16 paper if you use our code or datasets. Thanks!

Author: Tao Chen (http://www.cs.jhu.edu/~taochen)

Usage

We implemented three feature-aware matrix factorization (FAMF) models that use different features.

Model        Configuration File    Features
Text         text.conf             Post's contextual words
Visual       visual.conf           Image's visual tags
TextVisual   text_visual.conf      Combination of contextual words and visual tags
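
All three variants share the same underlying idea: a tweet is represented through the latent factors of its features (contextual words and/or visual tags) and matched against the user's latent factor. The sketch below illustrates one common feature-aware scoring rule of this kind; it is only a rough illustration with made-up names, not the exact FAMF formulation from the paper.

// Rough illustration only (not the paper's exact FAMF formulation): score a
// user-tweet pair as the dot product between the user's latent factor and the
// average of the latent factors of the tweet's features.
public class FamfScoreSketch {

    static double score(double[] userFactor, double[][] featureFactors, int[] tweetFeatureIds) {
        int k = userFactor.length;
        double[] tweetVector = new double[k];
        for (int featureId : tweetFeatureIds)        // aggregate the tweet's feature factors
            for (int d = 0; d < k; d++)
                tweetVector[d] += featureFactors[featureId][d];
        double score = 0.0;
        int n = Math.max(1, tweetFeatureIds.length);
        for (int d = 0; d < k; d++)                  // average, then dot with the user factor
            score += userFactor[d] * (tweetVector[d] / n);
        return score;
    }
}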

Dataset Preparation

If you are using our dataset, please:

  • Crawl the tweets and images from Twitter
  • Extract necessary features
  • Generate training and test set

The required input files vary from one model to another; please refer to the model's configuration file for details. The formats of the rating and feature files are listed below, followed by a small parsing sketch:

  • Rating file format

Each line contains one positive tweet and its N paired negative tweets for a particular user. Each rating consists of four elements: user, post ID, publisher (the author of the post), and the rating (1 denotes that the user retweeted the tweet, 0 that they did not). The negative tweets can be sampled with our time-aware negative sampling algorithm (detailed in the paper).

user_1 486447896191959040 pub_65893 1,user_1 486619477933838336 pub_18 0,user_1 486596611431477248 pub_22 0,user_1 486602569419333632 pub_21 0,user_1 486532028570275840 pub_45 0...

  • Feature file format

Each line contains the post ID followed by its feature IDs, which are the indices of contextual words or visual tags. Therefore, feature IDs should be consecutive integers in [0, W), where W is the number of features of a particular type.

544555137272791040,2275 3474 36361 9123 23694 57 714 3112 1212 19505 7409 8011 18770 5878 256 3314 2039
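
If you build these files yourself, a quick format check can save debugging time. The following is a minimal, self-contained Java sketch (not part of the released code; class and method names are our own) that parses both formats exactly as shown in the examples above.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InputFormatCheck {

    // Rating file: comma-separated ratings per line, each "user postId publisher rating";
    // the first rating on a line is the positive (retweeted) tweet.
    static void checkRatingFile(String path) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                for (String rating : line.split(",")) {
                    String[] fields = rating.trim().split("\\s+");
                    if (fields.length != 4 || !fields[3].matches("[01]"))
                        throw new IOException("Malformed rating: " + rating);
                }
            }
        }
    }

    // Feature file: "postId,featureId featureId ..." with feature IDs in [0, W).
    static Map<String, List<Integer>> readFeatureFile(String path, int vocabSize) throws IOException {
        Map<String, List<Integer>> features = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(",", 2);
                List<Integer> ids = new ArrayList<>();
                for (String token : parts[1].trim().split("\\s+")) {
                    int id = Integer.parseInt(token);
                    if (id < 0 || id >= vocabSize)
                        throw new IOException("Feature ID out of range: " + id);
                    ids.add(id);
                }
                features.put(parts[0], ids);
            }
        }
        return features;
    }
}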

How to run

  • Set the configuration file properly

    Please see the comments (starting with #) in the configuration file for guidance. In general, you should set the dataset paths and the visual/textual vocabulary size according to your dataset, and tune the model parameters (e.g., number of factors, regularizer) to obtain optimal results. For the remaining parameters, you may simply use the default values. (A small sketch for inspecting a configuration file follows at the end of this section.)

  • Compile the source code and run the model

    • If you are using Eclipse:

    Please add the jar files in the "lib" folder to the project build path (see this post on how to do this), then run the respective model with its configuration file as the program argument.

    • If you are using command line:
     mkdir bin
     javac -cp "lib/*" -d bin src/data/* src/main/* src/matrix/* src/model/* src/util/*
     java -cp "lib/*":bin main.<model> conf/<model_config>
    

    Please replace <model> and <model_config> with the respective model class and configuration file, e.g.:

    java -cp "lib/*":bin main.TextVisualMain conf/text_visual.conf
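
    If you want to double-check which settings a configuration file actually provides before a run, a few lines of Java suffice. This sketch assumes the .conf files use a simple key=value layout with # comments (an assumption on our part; please verify against the shipped .conf files).

    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Properties;

    // Sketch: print the settings found in a configuration file, assuming a plain
    // key=value layout with '#' comment lines (verify against the shipped .conf files).
    public class PrintConfig {
        public static void main(String[] args) throws IOException {
            Properties conf = new Properties();
            try (FileReader reader = new FileReader(args[0])) {
                conf.load(reader);                    // Properties skips '#' comment lines
            }
            conf.forEach((key, value) -> System.out.println(key + " = " + value));
        }
    }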
    

Output

The above command runs the pipeline of training, testing, and evaluation, and generates the following files:

File/Folder       Description
result.csv        Overall experimental results on the test set.
result_user.csv   User-level experimental results on the test set.
model             Folder containing the user factors and feature factors learned on the training set.
prediction        Folder containing the predicted score for each user-tweet pair.
config.txt        The experimental configuration settings.
log.txt           Log information.