
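A content-based product recommendation implementation built on fastText + faiss, with a RESTful query API served by nginx + uWSGI + Flask or by gunicorn + Uvicorn + FastAPI. It also includes a Spark implementation: Ansj word segmentation + Word2Vec + LSH + Phoenix.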

Primary language: Python

recsys_faiss


Screenshot: recommendations on a product detail page

Deploying the model

Flowchart of the model API

Train feature vectors from the product attributes and add the product vectors to a faiss index:

python embedding_recsys.py
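The training script itself is not shown here. As a rough sketch of the idea, assume each product is represented by the average of the word vectors of its attribute tokens, L2-normalized so that an inner product between two item vectors equals their cosine similarity. The toy word vectors and the helper names below are hypothetical; in the real pipeline the vectors would come from the trained fastText model.

```python
import math

# Hypothetical toy word vectors; in practice these come from a trained fastText model.
WORD_VECTORS = {
    "yogurt": [0.9, 0.1, 0.0],
    "strawberry": [0.2, 0.8, 0.1],
    "apricot": [0.3, 0.7, 0.2],
}


def l2_normalize(vec):
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def item_vector(tokens, word_vectors=WORD_VECTORS):
    """Average the vectors of known attribute tokens, then L2-normalize the mean."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("no known tokens for this item")
    dim = len(known[0])
    mean = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    return l2_normalize(mean)


# Unknown tokens are skipped; the remaining vectors are averaged and normalized.
vec = item_vector(["yogurt", "apricot", "unknown-token"])
print([round(x, 3) for x in vec])
```

Normalizing once at indexing time means the search side only needs a plain inner-product index.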

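A Flask app wraps the faiss index behind an API: given a product ID, it reconstructs that product's vector from the index and runs a cosine-similarity search.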
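A minimal pure-Python sketch of that lookup, standing in for faiss. With L2-normalized vectors, an exact inner-product search (the role faiss's IndexFlatIP can play) returns cosine similarities, and keeping an ID map lets the service reconstruct a stored vector from a product ID (faiss offers this through IndexIDMap2 and reconstruct). The class, method, and function names here are hypothetical and are not faiss's API.

```python
import heapq
import math


def l2_normalize(vec):
    """Scale to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class FlatIPIndex:
    """Hypothetical exact inner-product index keyed by product ID."""

    def __init__(self):
        self._vectors = {}  # product id -> normalized vector

    def add(self, item_id, vec):
        self._vectors[item_id] = l2_normalize(vec)

    def reconstruct(self, item_id):
        """Return the stored (normalized) vector for a product ID; KeyError if unknown."""
        return self._vectors[item_id]

    def search(self, query, k):
        """Return up to k (item_id, score) pairs with the highest inner product."""
        scores = (
            (item_id, sum(a * b for a, b in zip(query, vec)))
            for item_id, vec in self._vectors.items()
        )
        return heapq.nlargest(k, scores, key=lambda pair: pair[1])


def similar_items(index, spu, n_items):
    """Reconstruct the query product's vector and return its n_items nearest neighbours."""
    query = index.reconstruct(spu)
    # Ask for one extra hit so the query product itself can be dropped from the results.
    hits = index.search(query, n_items + 1)
    return [(item_id, score) for item_id, score in hits if item_id != spu][:n_items]


index = FlatIPIndex()
index.add(3, [0.6, 0.4, 0.1])
index.add(4, [0.6, 0.4, 0.1])  # same vector as product 3, so cosine similarity 1.0
index.add(19, [0.5, 0.5, 0.1])
index.add(8221, [0.0, 0.2, 0.9])
print(similar_items(index, 3, 2))
```

Products with identical attribute vectors score 1.0, which is consistent with the several 1.0 scores in the API response below.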

Start uWSGI:

uwsgi uwsgi.ini

nginx configuration:

server {
    listen 8089; # port nginx listens on
    charset utf-8;

    server_name localhost; # host name or IP address
    location / {
        include      uwsgi_params;
        uwsgi_pass   127.0.0.1:8088; # address of the uWSGI socket
        uwsgi_param UWSGI_CHDIR /Users/PycharmProjects/recsys_faiss;
        uwsgi_param UWSGI_SCRIPT recsys_faiss.faiss_api.py;
    }
}

API test

Send a GET request with two query parameters: spu, the product (SPU) ID, and n_items, the number of similar products to retrieve.

>>> import requests
>>> res = requests.get("http://127.0.0.1:8089/faiss/similar_items/?spu=3&n_items=10")
>>> res.json()
{'code': '200', 'msg': '处理成功', 'result': {'56482': 1.0, '92237': 1.0, '56483': 1.0, '56481': 1.0, '56484': 1.0, '56485': 1.0, '56486': 1.0, '4': 1.0, '18': 0.9981815814971924, '19': 0.9981815814971924}}
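A small client-side sketch based only on the response shown above: it builds the request URL and turns the result mapping of product ID to score into a list sorted by descending score. The function names are hypothetical, and the check on the string code '200' follows this Flask response (the FastAPI version returns a different shape).

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8089/faiss/similar_items/"


def build_url(spu, n_items, base_url=BASE_URL):
    """Build the GET URL for the similar-items endpoint."""
    return f"{base_url}?{urlencode({'spu': spu, 'n_items': n_items})}"


def ranked_items(payload):
    """Turn the 'result' mapping {item_id: score} into (int id, score) pairs, best first."""
    if payload.get("code") != "200":
        raise RuntimeError(f"request failed: {payload.get('msg')}")
    pairs = ((int(item_id), float(score)) for item_id, score in payload["result"].items())
    # sorted() is stable, so ties keep the order returned by the API.
    return sorted(pairs, key=lambda pair: -pair[1])


print(build_url(3, 10))
```

With requests, usage would look like ranked_items(requests.get(build_url(3, 10)).json()).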

Verifying the recommendations

Query product (spu = 3):

+-------------+--------------------------------------+
| ITEM_NUM_ID | ITEM_NAME                            |
+-------------+--------------------------------------+
|           3 | 卓德优格乳杏口味含乳饮品             |
+-------------+--------------------------------------+

Recommended products:

+-------------+---------------------------------------------------------------------------+
| ITEM_NUM_ID | ITEM_NAME                                                                 |
+-------------+---------------------------------------------------------------------------+
|          19 | 卓德低脂热处理风味发酵乳(森林水果口味)120g                              |
|        8221 | 爱乐薇蓝莓味含乳饮品125克                                                 |
|       56481 | 卓德风味发酵乳(草莓鲜酪口味)120g                                        |
|           8 | 卓德脱脂含乳饮品(覆盆子口味)                                            |
|       56483 | 卓德风味发酵乳(香草口味)120g                                            |
|          20 | 卓德低脂热处理风味发酵乳(草莓口味)120g                                  |
|       56484 | 卓德脱脂含乳饮品水蜜桃口味+覆盆子口味4*115g                               |
|       56486 | 卓德热处理风味发酵乳(原味)4*115g                                        |
|       56482 | 卓德风味发酵乳(焗苹果口味)120g                                          |
|        8229 | 爱乐薇菠萝味含乳饮品125克                                                 |
|          18 | 卓德低脂热处理风味发酵乳(水蜜桃、西番莲口味)120g                        |
|           4 | 卓德优格乳草莓口味含乳饮品                                                |
|       92237 | 卓德含乳饮品(草莓口味)460克(4*115克)                                  |
|           6 | 卓德脱脂含乳饮品(水蜜桃口味)                                            |
+-------------+---------------------------------------------------------------------------+

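Load test

siege with 100 concurrent clients (-c 100) for 10 seconds (-t 10s) in benchmark mode (-b, no delay between requests):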

siege -c 100 -t 10s -b "http://127.0.0.1:8089/faiss/similar_items/?spu=3&n_items=50"

Transactions:		       41011 hits
Availability:		      100.00 %
Elapsed time:		        9.17 secs
Data transferred:	       12.24 MB
Response time:		        0.02 secs
Transaction rate:	     4472.30 trans/sec
Throughput:		        1.33 MB/sec
Concurrency:		       99.57
Successful transactions:       41011
Failed transactions:	           0
Longest transaction:	        0.07
Shortest transaction:	        0.00
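The figures in the siege report are consistent with each other, which makes a quick sanity check when comparing runs. Using only numbers from the report: transaction rate = transactions / elapsed time, throughput = data transferred / elapsed time, and by Little's law the mean response time is concurrency / transaction rate, about 22 ms, which siege rounds to 0.02 s.

```python
# Figures copied from the siege report above.
transactions = 41011
elapsed_s = 9.17
data_mb = 12.24
concurrency = 99.57

transaction_rate = transactions / elapsed_s  # transactions per second
throughput_mb_s = data_mb / elapsed_s  # MB per second
# Little's law: average concurrency = arrival rate * mean response time.
implied_response_s = concurrency / transaction_rate

print(
    f"rate={transaction_rate:.2f}/s throughput={throughput_mb_s:.2f} MB/s "
    f"response={implied_response_s:.4f} s"
)
```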

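FastAPI

The FastAPI version runs under gunicorn with 4 Uvicorn worker processes (-w 4 -k uvicorn.workers.UvicornWorker), detached into the background (-D). Its query parameter is spu_id rather than spu, and it returns a list of similar product IDs without scores: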

gunicorn faiss_fastapi:app -w 4 -k uvicorn.workers.UvicornWorker -D

>>> import requests
>>> res = requests.get("http://127.0.0.1:8000/faiss/similar_items/?spu_id=3&n_items=10")
>>> res.json()
{'code': 200, 'msg': 'success', 'res': [4, 56486, 92237, 56484, 56485, 56481, 56482, 56483, 18, 20]}
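The gunicorn flags in the command above can also live in a config file. gunicorn config files are ordinary Python modules where each setting is a module-level variable, so an equivalent gunicorn.conf.py could look like the sketch below. The file name and the explicit bind address are assumptions; 127.0.0.1:8000 is gunicorn's default bind, which matches the port queried above.

```python
# gunicorn.conf.py: the command-line flags above expressed as gunicorn settings.

bind = "127.0.0.1:8000"  # gunicorn's default bind address, matching the port queried above
workers = 4  # -w 4: number of worker processes
worker_class = "uvicorn.workers.UvicornWorker"  # -k: serve the ASGI app with Uvicorn workers
daemon = True  # -D: detach from the terminal and run in the background
```

It would be started with gunicorn faiss_fastapi:app -c gunicorn.conf.py. Unless preload_app is enabled, each of the 4 workers imports faiss_fastapi on its own, so an index loaded at import time is held in memory once per worker.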

Spark implementation

Create the table RECSYS_SIMILAR_LSH in Phoenix:

0: jdbc:phoenix:> create table RECSYS_SIMILAR_LSH (id varchar not null primary key, recommend varchar) salt_buckets=8;
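salt_buckets=8 tells Phoenix to prefix each row key with a salt byte derived from a hash of the key modulo the number of buckets, which spreads sequential IDs over 8 pre-split regions and avoids write hotspots. The sketch below only illustrates that idea: it uses zlib.crc32 as a stand-in, not Phoenix's actual hash function, so its bucket numbers will not match what Phoenix stores, and the helper names are hypothetical.

```python
import zlib
from collections import Counter

SALT_BUCKETS = 8  # matches salt_buckets=8 in the CREATE TABLE statement


def salt_byte(row_key: str, buckets: int = SALT_BUCKETS) -> int:
    """Illustrative salt: hash of the row key modulo the bucket count (crc32 is a stand-in)."""
    return zlib.crc32(row_key.encode("utf-8")) % buckets


def salted_key(row_key: str) -> bytes:
    """Prefix the row key with its one-byte salt, as a salted table does physically."""
    return bytes([salt_byte(row_key)]) + row_key.encode("utf-8")


# Sequential product IDs are spread over several buckets rather than one region.
counts = Counter(salt_byte(str(i)) for i in range(100000, 101000))
print(sorted(counts.items()))
```

Phoenix applies and strips the salt transparently, so queries such as the SELECT below still use the plain ID.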

Submit the Spark job:

bash submit.bash
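submit.bash and the Spark job are not shown here. For dense Word2Vec vectors, the Spark ML LSH class for Euclidean distance is BucketedRandomProjectionLSH, which hashes a vector x with a random unit vector v into bucket floor(x·v / bucketLength); whether this project uses exactly that class, and with which parameters, is an assumption. The pure-Python sketch below illustrates the technique: several hash tables, OR-amplification (two products are candidates if they share a bucket in any table), then exact distance ranking of the candidates. The class name and parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def random_unit_vector(dim, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class BucketedRandomProjection:
    """Hypothetical sketch of bucketed random projection LSH for Euclidean distance."""

    def __init__(self, dim, num_tables=4, bucket_length=1.0, seed=42):
        rng = random.Random(seed)
        self.projections = [random_unit_vector(dim, rng) for _ in range(num_tables)]
        self.bucket_length = bucket_length
        self.tables = [defaultdict(set) for _ in range(num_tables)]
        self.vectors = {}

    def _hashes(self, x):
        # One bucket index per table: floor(x . v / bucket_length).
        return [
            math.floor(sum(a * b for a, b in zip(x, v)) / self.bucket_length)
            for v in self.projections
        ]

    def add(self, item_id, x):
        self.vectors[item_id] = x
        for table, h in zip(self.tables, self._hashes(x)):
            table[h].add(item_id)

    def query(self, item_id, k):
        """Return up to k (other_id, distance) pairs among products sharing any bucket."""
        x = self.vectors[item_id]
        candidates = set()
        for table, h in zip(self.tables, self._hashes(x)):
            candidates |= table[h]  # OR-amplification: a match in any table is enough
        candidates.discard(item_id)
        ranked = sorted(
            ((other, math.dist(x, self.vectors[other])) for other in candidates),
            key=lambda pair: (pair[1], pair[0]),
        )
        return ranked[:k]


lsh = BucketedRandomProjection(dim=3, num_tables=4, bucket_length=2.0)
lsh.add("100880", [0.1, 0.2, 0.3])
lsh.add("4407", [0.1, 0.2, 0.3])  # identical vector, so it shares every bucket
lsh.add("78608", [0.15, 0.2, 0.28])
lsh.add("99999", [5.0, -4.0, 3.0])
top = lsh.query("100880", k=2)
recommend = ",".join(other for other, _ in top)  # comma-separated, like the RECOMMEND column
print(top, recommend)
```

More tables raise recall at the cost of more candidates to rank; a larger bucket length has a similar effect.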

Check the results (the long RECOMMEND values are cut off at the console's display width):

0: jdbc:phoenix:> select * from RECSYS_SIMILAR_LSH limit 5;
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------+
|   ID    |                                                                   RECOMMEND                                                                   |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| 100880  | 4407,78608,753,88585,99289,17360,42159,43082,8636,43403,109828,2409,214619,202489,43125,14123,97192,9408,73847,48269,20587,209262,76913,78394 |
| 102431  | 100034,100280,98687,118912,114140,29619,106257,118940,100065,30217,49843,49891,41759,28874,109745,29915,20059,29191,238333,90415,51839,48266, |
| 104213  | 237497,21255,12543,98798,90771,117289,21262,20042,75753,212108,29915,50095,50537,39070,20059,101172,53475,18816,29859,109745,41840,29619,1886 |
| 105577  | 9681,91428,62392,41219,117776,13191,120160,97337,112055,78196,202915,202899,227439,39411,94532,102624,102618,235521,105425,120167,58650,85126 |
| 106655  | 233605,42025,120616,59829,203421,209948,99844,94505,752,39665,93387,80632,232698,57406,102814,43438,42975,8926,91368,73961,210979,92327,94477 |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------+
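When the RECOMMEND value is read back, for example in an API layer, a small helper can split it into product IDs. This hypothetical helper skips empty entries defensively and can cap the number of IDs returned.

```python
def parse_recommend(value, limit=None):
    """Split a comma-separated RECOMMEND value into product IDs, skipping empty entries."""
    ids = [part.strip() for part in value.split(",") if part.strip()]
    return ids if limit is None else ids[:limit]


print(parse_recommend("4407,78608,753,88585", limit=3))
```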