/xunta-recommend

百度寻Ta项目推荐算法

Primary LanguagePHPApache License 2.0Apache-2.0

xunta-recommend

百度寻Ta项目推荐算法

可用数据有以下几个,一是当前用户的数据与一些辅助工具,在$G变量中 $G数组的值格式大致如下:

[mysqli] => MySQLiObject
[uid] => 25
    [sex] => 0
    [today] => 2018-04-03
    [now] => 2018-04-03 13:14:43
    [email_prefix] => xxx
    [user] => Array
        (
            [uid] => 25
            [email] => xxx@xxx.com
            [real_name] => xxx
            [email_prefix] => xxx
            [sex] => 0
            [photo_id] => 1643
            [description] =>
            [nickname] => 云
            [birthday] => 1990-08-11
            [age] => 27
            [verify] => 1
            [disabled] => 0
            [receive_gsm] => 1
            [receive_mail] => 1
        )
    [real_name] => xxx

另有几个表(或视图): 数据信息表有以下几个: user表,存储用户基本信息。

CREATE TABLE `user` (
  `uid` int(11) NOT NULL,
  `email` varchar(40) COLLATE utf8_bin NOT NULL,
  `nickname` varchar(40) COLLATE utf8_bin NOT NULL,
  `sex` tinyint(3) NOT NULL,
  `photo_id` int(11) NOT NULL,
  `birthday` date NOT NULL,
  `register_time` datetime NOT NULL,
  `last_login_time` datetime NOT NULL,
  `description` varchar(500) COLLATE utf8_bin NOT NULL,
  `real_name` varchar(20) COLLATE utf8_bin DEFAULT NULL,
  `email_prefix` varchar(40) COLLATE utf8_bin DEFAULT NULL,
  `verify` tinyint(4) DEFAULT NULL,
  `phone_num` varchar(20) COLLATE utf8_bin DEFAULT NULL,
  `disabled` tinyint(4) DEFAULT '0',
  `area` varchar(40) COLLATE utf8_bin DEFAULT NULL,
  `job_type` varchar(40) COLLATE utf8_bin DEFAULT NULL,
  `receive_mail` tinyint(4) DEFAULT '1',
  `receive_gsm` tinyint(4) DEFAULT '1'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

recommend表,存储给用户推荐了用户的信息

CREATE TABLE `recommend` (
  `rid` int(11) NOT NULL,
  `for_uid` int(11) NOT NULL,
  `target_uid` int(11) NOT NULL,
  `voted` tinyint(1) DEFAULT NULL,
  `seen_time` datetime DEFAULT NULL,
  `date` date NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

matching表存储用户选择信息,action为1表示喜欢,action为2表示不喜欢

CREATE TABLE `matching` (
  `id` int(11) NOT NULL,
  `uid` int(11) NOT NULL,
  `target_uid` int(11) NOT NULL,
  `action` tinyint(3) NOT NULL,
  `ts` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

matched表表示已经匹配的用户信息。 例如如果表中有两条数据 uid=1, uid2=2, seen=1, seen_time='2018-04-03 01:02:03' uid=2, uid2=1, seen=0, seen_time= NULL 表示 用户1与用户2match了,并且用户1已经看到了他们match了,而用户2并未看到。

CREATE TABLE `matched` (
  `id` int(11) NOT NULL,
  `uid` int(11) NOT NULL,
  `uid2` int(11) NOT NULL,
  `seen` tinyint(1) NOT NULL,
  `seen_time` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

photo表是用户的照片信息:

    CREATE TABLE `photo` (
      `photo_id` int(11) NOT NULL,
      `url` varchar(300) COLLATE utf8_bin NOT NULL,
      `upload_time` datetime NOT NULL,
      `uid` int(10) DEFAULT NULL,
      `verify` tinyint(4) DEFAULT NULL,
      `deleted` tinyint(4) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

        另提供了一个简单的辅助函数

    function result_to_array($result) {
        $ret = array();
        while ($row = $result->fetch_assoc()) {
            $ret[] = $row;
        }
        return $ret;
    }

下边是初始的推荐算法。

    function make_recommend() {
        global $G;
        $uid = $G['uid'];
        $sex = $G['sex'];
        $num = 1;

        if (rand(1, 100) < 40) {
            // 40%的机率推荐已经选了你的人(如果有的话)
            // 如果某人之前某天已经看到过另一个用户,即使未投票,将来也不应再被高几率选中,否则就暴露了用户隐私,
            // 但可以重新被普通几率选中。

            // 现在都是需要时再进行实时的推荐,所以做出推荐和展现是同时的,故seen_time一定非空。
            // 但是未来,可能会增加非实时的推荐,做出推荐的时候用户可能并未触发查看推荐结果的行为,所以seen_time可能为空。
            // 在这个判断里,加上了seen_time is not null的判断,以支持未来非实时的推荐
            $sql = "select uid, photo_id, description, nickname, (year(now())-year(birthday)-1) + ( DATE_FORMAT(birthday, '%m%d') <= DATE_FORMAT(NOW(), '%m%d') ) as age from user where uid <> '$uid' and sex <> $sex
                and uid in (select uid from matching where target_uid='$uid' and action=1)
                and uid not in (select target_uid from recommend where for_uid='$uid' and seen_time is not null)
                and verify = 1
                and disabled = 0
            order by rand()
            limit $num";

            $result = $G['mysqli']->query($sql) or die($G['mysqli']->error);
            if ($result->num_rows != 0) return result_to_array($result);
        }

        // FIXME (zhangyuncong):
        //     "order by rand()" will be extremely slow when the table becomes large.
        $sql = "select uid, photo_id, description, nickname, (year(now())-year(birthday)-1) + ( DATE_FORMAT(birthday, '%m%d') <= DATE_FORMAT(NOW(), '%m%d') ) as age from user where uid <> '$uid' and sex <> $sex
                    and uid not in (select target_uid from matching where uid='$uid')
                    and verify = 1
                    and disabled = 0
                order by rand()
                limit $num";

        $result = $G['mysqli']->query($sql) or die($G['mysqli']->error);
        return result_to_array($result);
    }

如何测试

你需要先使用example.sql创建表,并手工添加一些初始数据 TODO: 弄一个标准的测试数据集 然后,打开test_recommend.php 修改$G数组的初始值,主要是数据库用户名密码与数据库名必须修改 最后,调用make_recommend得出推荐结果.       注意上传github时避免把你数据库账号密码传上来。