astak16/blog-mysql

16 游戏玩法分析 IV

Opened this issue · 0 comments

题目

查询首次登录游戏并且第二天再次登录游戏玩家的比率,四舍五入到小数点后两位。

create table activity (
	player_id int,
	device_id int,
	event_date date,
	games_played int
);

insert into activity values
(1, 2, '2016-03-01', 5),
(1, 2, '2016-03-02', 6),
(1, 3, '2017-06-25', 1),
(3, 1, '2016-03-02', 0),
(3, 4, '2018-07-03', 5);

SQL:方法一

select round(select count(player_id) from (
	select
		player_id,
		event_date,
		lag(event_date)
			over(partition by player_id order by event_date) as next_date,
		rank() over(partition by player_id order by event_date) as login_times
	from activity
) as temp where datediff(event_date, next_date) = 1 and login_times = 2
/
select count(player_id) from (
	select player_id from activity group by player_id
) as temp, 2) as fraction;

解析

需要求出两个值:第一个值是总人数,第二个值是连续登录两天的玩家数

  1. 求总人数:

activity 表按照 player_id 进行分组在计算出总人数

select count(player_id) from (
	select player_id from activity group by player_id
) as temp

还有一种写法:

select count(distinct player_id) from activity
  1. 求连续两天登录游戏的玩家数:

使用窗口函数 lag(event_date) 将日期按照 player_id 分组,并按照 event_date 升序排列,然后偏移一天:

lag(event_date) over(partition by player_id order by event_date) as next_date

使用 rank 对日期按照 player_id 分组,并按照 event_date 升序排列,然后排序:

rank() over(partition by player_id order by event_date) as login_times

组合成完整的 SQL

select
	player_id,
	event_date,
	lag(event_date)
		over(partition by player_id order by event_date) as next_date,
	rank() over(partition by player_id order by event_date) as login_times
from activity

筛选出日期相差一天,并且是第一次连续登录,这里 login_times 取值 2 ,因为上面按照登录日期进行排序过了,第一天登录是 1 ,第二天登录是 2

计算出 player_id 的个数就可以算出连续登录两天的玩家了数了:

select count(player_id) from (
	select
		player_id,
		event_date,
		lag(event_date)
			over(partition by player_id order by event_date) as next_date,
		rank() over(partition by player_id order by event_date) as login_times
	from activity
) as temp where datediff(event_date, next_date) = 1 and login_times = 2
  1. 最后:

使用 round() 保留两位小数

SQL:方法二

select round((
	(select count(player_id) from (
		select
			player_id,
			datediff(event_date, min(event_date) over(partition by player_id)) as diff
		from activity
	) as temp where diff = 1) / (select count(distinct player_id) from activity)
), 2) as fraction;

解析

  1. 计算出每个用户最近两个登陆日期时间差:
select
	player_id,
	datediff(event_date, min(event_date) over(partition by player_id)) as diff
from activity
  1. 计算出连续两天登录的用户数, diff1
select count(player_id) from (
	select
		player_id,
		datediff(event_date, min(event_date) over(partition by player_id)) as diff
	from activity
) as temp where diff = 1
  1. 总人口计算方法,在方法一种已经给出了
  2. 最后使用 round() 保留两位小数

SQL:方法三

with temp as (
	select
		player_id,
		datediff(event_date, min(event_date) over(partition by player_id)) as diff
	from activity
) select round(
	sum(case diff when 1 then 1 else 0 end) /
	count(distinct player_id),
2) as fraction from temp;

解析

  1. 计算出每个用户最近两个登陆日期时间差,在方法二中给出了
  2. 将它作为临时表 temp ,用 with
  3. diff = 1 的和是连续两天登录的用户:
sum(case diff when 1 then 1 else 0 end)
  1. 临时表 temp 中记录了每个用户每次登录的时间差,去重计算 player_id
count(distinct player_id)
  1. 最后使用 round() 保留两位小数

SQL:方法四

select round(avg(event_date is not null), 2) as fraction from (
	select player_id, min(event_date) as first_login from activity
	group by player_id
) temp left join activity
on temp.player_id = activity.player_id
and datediff(event_date, first_login) = 1;

解析

  1. 使用 min 算出每个用户第一次登录的日期:
select player_id, min(event_date) as first_login from activity
group by player_id
  1. 左连 activity 表:
select * from (
	select player_id, min(event_date) as first_login from activity
	group by player_id
) temp left join activity
on temp.player_id = activity.player_id
and datediff(event_date, first_login) = 1;
  1. 因为 temp 只有每个用户第一次登录游戏的时间,加上连接条件 datediff(event_date, first_login) = 1 后,不满足条件的用户数据都 null
  2. 使用 avg(event_date is not null) 算出平均数
    1. temp 表是每个用户第一次登录的时间
    2. activity 表是存储着所有数据
    3. temp 左连 activity 筛选出第一次连续两天登录的玩家,满足条件的玩家是有值的,没有满足条件的玩家是 null
    4. event_date is not null 结果是 1 或者 0 ,不是日期了,也没有 null 了。
  3. 使用 round() 计算出平均数

Tips

datediff 语法

rank 语法