LibRecommender is an easy-to-use recommender system focused on end-to-end recommendation. The main features are:
- Implements a number of popular recommendation algorithms such as FM, DIN, LightGCN, etc. See the full algorithm list.
- A hybrid recommender system, which allows users to combine collaborative-filtering and content-based features. New features can be added on the fly.
- Low memory usage: categorical and multi-value categorical features are automatically converted to sparse representations.
- Supports training on both explicit and implicit datasets, and negative sampling can be used for implicit datasets.
- Provides an end-to-end workflow, i.e. data handling / preprocessing -> model training -> evaluation -> serving.
- Supports cold-start prediction and recommendation.
- Provides a unified and friendly API for all algorithms. Easy to retrain a model with new users/items.
```python
import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import SVDpp  # pure data, algorithm SVD++
from libreco.evaluation import evaluate

data = pd.read_csv("examples/sample_data/sample_movielens_rating.dat", sep="::",
                   names=["user", "item", "label", "time"])

# split whole data into three folds for training, evaluating and testing
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])

train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_evalset(eval_data)
test_data = DatasetPure.build_testset(test_data)
print(data_info)  # n_users: 5894, n_items: 3253, data sparsity: 0.4172 %

svdpp = SVDpp(task="rating", data_info=data_info, embed_size=16, n_epochs=3, lr=0.001,
              reg=None, batch_size=256)
# monitor metrics on eval_data during training
svdpp.fit(train_data, verbose=2, eval_data=eval_data, metrics=["rmse", "mae", "r2"])

# do final evaluation on test data
print("evaluate_result: ", evaluate(model=svdpp, data=test_data,
                                    metrics=["rmse", "mae"]))
# predict preference of user 2211 to item 110
print("prediction: ", svdpp.predict(user=2211, item=110))
# recommend 7 items for user 2211
print("recommendation: ", svdpp.recommend_user(user=2211, n_rec=7))
# cold-start prediction
print("cold prediction: ", svdpp.predict(user="ccc", item="not item",
                                         cold_start="average"))
# cold-start recommendation
print("cold recommendation: ", svdpp.recommend_user(user="are we good?",
                                                    n_rec=7,
                                                    cold_start="popular"))
```
```python
import numpy as np
import pandas as pd
from libreco.data import split_by_ratio_chrono, DatasetFeat
from libreco.algorithms import YouTubeRanking  # feat data, algorithm YouTubeRanking

data = pd.read_csv("examples/sample_data/sample_movielens_merged.csv", sep=",", header=0)
data["label"] = 1  # convert to implicit data and do negative sampling afterwards

# split into train and test data based on time
train_data, test_data = split_by_ratio_chrono(data, test_size=0.2)

# specify complete columns information
sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]

train_data, data_info = DatasetFeat.build_trainset(
    train_data, user_col, item_col, sparse_col, dense_col
)
test_data = DatasetFeat.build_testset(test_data)
train_data.build_negative_samples(data_info)  # sample negative items for each record
test_data.build_negative_samples(data_info)
print(data_info)  # n_users: 5962, n_items: 3226, data sparsity: 0.4185 %

ytb_ranking = YouTubeRanking(task="ranking", data_info=data_info, embed_size=16,
                             n_epochs=3, lr=1e-4, batch_size=512, use_bn=True,
                             hidden_units="128,64,32")
ytb_ranking.fit(train_data, verbose=2, shuffle=True, eval_data=test_data,
                metrics=["loss", "roc_auc", "precision", "recall", "map", "ndcg"])

# predict preference of user 2211 to item 110
print("prediction: ", ytb_ranking.predict(user=2211, item=110))
# recommend 7 items for user 2211
print("recommendation(id, probability): ", ytb_ranking.recommend_user(user=2211, n_rec=7))
# cold-start prediction
print("cold prediction: ", ytb_ranking.predict(user="ccc", item="not item",
                                               cold_start="average"))
# cold-start recommendation
print("cold recommendation: ", ytb_ranking.recommend_user(user="are we good?",
                                                          n_rec=7,
                                                          cold_start="popular"))
```
For more examples and usages, see the User Guide.
JUST normal data format; each line represents a sample. One important thing: the model assumes that the `user`, `item`, and `label` columns are at index 0, 1, and 2, respectively. You may wish to change the column order if that's not the case. Take the movielens-1m dataset for example:

```
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
```
Besides, if you want to use some other meta features (e.g., age, sex, category, etc.), you need to tell the model which columns belong to [`sparse_col`, `dense_col`, `user_col`, `item_col`], which means all features must be in the same table. See the `YouTubeRanking` example above.

Also note that your data should not contain missing values.
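The two requirements above (user/item/label at column index 0/1/2, no missing values) can be checked with a few lines of pandas before building the dataset. A minimal sketch; the column names and toy values here are illustrative, not from the sample data:

```python
import pandas as pd

# toy data whose columns are NOT in the expected user/item/label order
raw = pd.DataFrame({
    "time": [978300760, 978302109],
    "label": [5, 3],
    "user": [1, 1],
    "item": [1193, 661],
})

# reorder so that user, item and label sit at column index 0, 1 and 2
data = raw[["user", "item", "label", "time"]]

# the model also expects no missing values, so verify (or drop) them up front
data = data.dropna()
assert not data.isnull().values.any()
print(list(data.columns))  # ['user', 'item', 'label', 'time']
```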
For how to serve a trained model in LibRecommender, see the Serving Guide.
From pypi:

```shell
$ pip install LibRecommender
```

To build from source, you'll first need Cython and Numpy:

```shell
$ pip install numpy cython
$ git clone https://github.com/massquantity/LibRecommender.git
$ cd LibRecommender
$ python setup.py install
```
Basic dependencies for `libreco`:
- Python >= 3.6
- TensorFlow >= 1.15
- PyTorch >= 1.10
- Numpy >= 1.19.5
- Cython >= 0.29.0
- Pandas >= 1.0.0
- Scipy >= 1.2.1
- scikit-learn >= 0.20.0
- gensim >= 4.0.0
- tqdm
- nmslib (optional)
LibRecommender is tested under TensorFlow 1.15, 2.5, 2.8 and 2.10. If you encounter any problems while running it, feel free to open an issue.
Known issue: sometimes one may encounter errors like `ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject`. In this case, try upgrading numpy; version 1.22.0 or higher is probably a safe option.
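As a quick sanity check for this issue, one can compare the installed NumPy version against the 1.22 threshold suggested above. A minimal sketch (the threshold is only a heuristic, not a hard requirement):

```python
import numpy as np

# parse the major/minor parts of the installed NumPy version; if it is below
# 1.22, upgrading (pip install -U numpy) is a likely fix for the
# binary-incompatibility error mentioned above
major, minor = (int(x) for x in np.__version__.split(".")[:2])
if (major, minor) < (1, 22):
    print("NumPy", np.__version__, "detected; consider upgrading to >= 1.22.0")
else:
    print("NumPy", np.__version__, "should be fine for this issue")
```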
The table below shows some compatible version combinations:
| Python | Numpy | TensorFlow | OS |
| ------ | ----- | ---------- | -- |
| 3.6 | 1.19.5 | 1.15, 2.5 | linux, windows, macos |
| 3.7 | 1.20.3, 1.21.6 | 1.15, 2.5, 2.8, 2.10 | linux, windows, macos |
| 3.8 | 1.22.4, 1.23.2 | 2.5, 2.8, 2.10 | linux, windows, macos |
| 3.9 | 1.22.4, 1.23.2 | 2.5, 2.8, 2.10 | linux, windows, macos |
| 3.10 | 1.22.4, 1.23.2 | 2.8, 2.10 | linux, windows, macos |
Optional dependencies for `libserving`:
- Python >= 3.7 (many of the libraries below require at least 3.7)
- sanic >= 22.3
- requests
- aiohttp
- pydantic
- ujson
- redis
- redis-py >= 4.2.0
- faiss >= 1.5.2
- TensorFlow Serving == 2.8.2
One can also use the library in a Docker container without installing dependencies; see Docker.
[1] Category: `pure` means collaborative-filtering algorithms which only use behavior data; `feat` means other side features can be included.
[2] Sequence: algorithms that leverage user behavior sequences.
[3] Graph: algorithms that leverage graph information, including Graph Embedding (GE) and Graph Neural Network (GNN).
[4] Embedding: algorithms that can generate final user and item embeddings.