45 游戏玩法分析 V
Opened this issue · 0 comments
astak16 commented
题目
题目链接:游戏玩法分析 V
玩家的安装日期 event_date
定义为该玩家的第一个登录日。
玩家的 第一天留存率
定义为:假定安装日期为 X
的玩家的数量为 N
,其中在 X
之后的一天重新登录的玩家数量为 M
, M/N
就是第一天留存率,四舍五入到小数点后两位。
编写一个 SQL 查询,报告所有安装日期、当天安装游戏的玩家数量和玩家的第一天留存率。
create table activity (
player_id int,
device_id int,
event_date date,
games_played int
);
insert into activity (player_id, device_id, event_date, games_played) values
('1', '2', '2016-03-01', '5'),
('1', '2', '2016-03-02', '6'),
('2', '3', '2017-06-25', '1'),
('3', '1', '2016-03-01', '0'),
('3', '4', '2018-07-03', '5');
分析
需要几个值
- 每个玩家第一个登录日
- 第一个登陆日之后有没有登录
- 第一个登录日的玩家的数量,第一个登录日的第二天登录玩家的数量
SQL:方法一
select
first_date as install_dt,
count(*) installs,
round(count(activity.event_date) / count(*), 2) as day1_retention
from (
select
player_id,
min(event_date) as first_date
from activity group by player_id
) t1 left join activity
on t1.player_id = activity.player_id
and datediff(activity.event_date, t1.first_date) = 1
group by first_date;
解析
-
每个玩家第一个登录日,作为
t1
表select player_id, min(event_date) as first_date from activity group by player_id;
-
需要得到每个玩家第一个登陆日之后第二天有没有登录
- 将
t1
表和activity
左连 - 这要注意输出的字段
event_date
是第一天登录之后第二天又登录的玩家日期
select * from t1 left join activity on t1.player_id = activity.player_id and datediff(activity.event_date, t1.first_date) = 1;
- 将
-
查询
t1
和activity
左连后的表,按照第一个登陆日进行分组first_date
count(*)
第一个登陆日的总人数count(activity.event_date)
第二个登录日的人数a/b
是第一天玩家的留存率
SQL:方法二
select
first_date as install_dt,
count(distinct player_id) as installs,
round(
sum(if(date_add(first_date, interval 1 day) = event_date, 1, 0))
/ count(distinct player_id),
2) as day1_retention
from (
select
player_id,
event_date,
min(event_date) over(partition by player_id) as first_date
from activity
) t1 group by first_date;
解析
方法二的思路和方法一是一样的,使用了窗口函数代替了分组。
-
使用窗口函数,计算出每个玩家的第一次登录日期,作为临时表
t1
select player_id, event_date, min(event_date) over(partition by player_id) as first_date from activity;
输出
player_id | event_date | first_date 1 | 2016-03-01 | 2016-03-01 1 | 2016-03-02 | 2016-03-01 2 | 2017-06-25 | 2017-06-25 3 | 2016-03-01 | 2016-03-01 3 | 2018-07-03 | 2016-03-01
-
查询
t1
,按照first_date
进行分组count(distinct player_id)
第一个登陆日的总人数sum(if(date_add(first_date, interval 1 day) = event_date, 1, 0))
date_add(first_date, interval 1 day) = event_date
第一个登陆日后一天也登录的用户sum(if(expr, 1, 0))
或者count(if(expr, 1, null))
,使用sum
求和if
表达式的false
应该用0
,使用count
求和if
表达式的false
应该用null
,因为count
会忽略null
SQL:方法三
select
a1.event_date as install_dt,
count(a1.event_date) as installs,
round(
sum(if(datediff(a2.event_date, a1.event_date) = 1, 1, 0))
/ count(a1.event_date),
2) as day1_retention
from activity a1
left join activity a2 on a1.player_id = a2.player_id
and datediff(a2.event_date, a1.event_date) = 1
left join activity a3 on a1.player_id = a3.player_id
and a1.event_date > a3.event_date
where a3.event_date is null
group by a1.event_date;
解析
activity
表自身左连 2
次
-
筛选出第一天登录的玩家,第二天也登录了
datediff(a2.event_date, a1.event_date) = 1
select * from activity a1 left join activity a2 on a1.player_id = a2.player_id and datediff(a2.event_date, a1.event_date) = 1
-
筛选出第一个登录日,因为
a1.event_date > a3.event_date
,所以第一个登录日a3.event_date
为null
select * from activity a1 left join activity a3 on a1.player_id = a3.player_id and a1.event_date > a3.event_date where a3.event_date is null
-
合并第一步和第二步
select * from activity a1 left join activity a2 on a1.player_id = a2.player_id and datediff(a2.event_date, a1.event_date) = 1 left join activity a3 on a1.player_id = a3.player_id and a1.event_date > a3.event_date where a3.event_date is null
-
计算出对应的值
a1.event_date
是第一个登录日count(a1.event_date)
第一个登录日的总人数sum(if(datediff(a2.event_date, a1.event_date) = 1, 1, 0))
第一个登录日的第二天登录的总人数b/c
是第一天玩家的留存率
SQL:方法四
select
first_date as install_dt,
count(a1.event_date) as installs,
round(
sum(if(datediff(a2.event_date, a1.event_date) = 1, 1, 0))
/ count(a1.event_date),
2) as day1_retention
from (
select
player_id,
min(event_date) first_date
from activity group by player_id
) t1
left join activity a1 on t1.player_id = a1.player_id
and t1.first_date = a1.event_date
left join activity a2 on a1.player_id = a2.player_id
and datediff(a2.event_date, a1.event_date) = 1
group by first_date;
解析
方法四和方法三是一个思路,不同都是查询第一个登录日,用下面的方法代替方法三中的第二步,方法三中的第二步迷惑性很大,不太好理解。
select
player_id,
min(event_date) first_date
from activity group by player_id