52 寻找面试候选人

Question

52 寻找面试候选人

Opened this issue 3 years ago · 0 comments

astak16 commented 3 years ago

题目

题目连接：寻找面试候选人

contest_id 是该 contest 的主键。该表包含 LeetCode 竞赛的 id 和该场比赛中金牌、银牌、铜牌的 user_id。可以保证，所有连续的比赛都有连续的 id，没有 id 被跳过。

user_id 是该 user_id 的主键。该表包含用户信息。

编写 SQL 语句来返回面试候选人的姓名和邮件.当用户满足以下两个要求中的任意一条，其成为面试候选人:

该用户在连续三场及更多比赛中赢得奖牌。
该用户在三场及更多不同的比赛中赢得金牌（这些比赛可以不是连续的）

可以以任何顺序返回结果。

create table contests (
  contest_id int,
  gold_medal int,
  silver_medal int,
  bronze_medal int
);
insert into contests (contest_id, gold_medal, silver_medal, bronze_medal) values 
(190, 1, 5, 2),
(191, 2, 3, 5),
(192, 5, 2, 3),
(193, 1, 3, 5),
(194, 4, 5, 2),
(195, 4, 2, 1),
(196, 1, 5, 2);

create table users (
  user_id int,
  mail varchar(50),
  name varchar(30)
);
insert into users (user_id, mail, name) values 
(1, 'sarah@leetcode.com', 'Sarah'),
(2, 'bob@leetcode.com', 'Bob'),
(3, 'alice@leetcode.com', 'Alice'),
(4, 'hercy@leetcode.com', 'Hercy'),
(5, 'quarz@leetcode.com', 'Quarz');

分析

需要查询出的用户是

连续三场比赛或者更多比赛的奖牌
三场或者更多的比赛获得金牌

三场或者更多比赛的金牌比较好查询只需要筛选出金牌数大于等于 3 的用户。

主要的关键点在于如何查询连续三场比赛获得奖牌。

这里使用了四种方法去解题：

方法一使用 **窗口函数，**方法二使用 变量以及 if 函数
- 思路：判断连续的数方法：将数进行升序或者降序排列，然后用这个数减去排列的序号，如果是连续的数，那么它们的结果是相同的。
方法三使用 **多表联查，**方法四使用 窗口函数
- 思路： 1 + 1 = 2 and 1 + 2 = 3

SQL：方法一

select
  name, mail
from (
  select
    user_id
  from (
    select
      user_id,
      contest_id - 
        row_number() over(partition by user_id order by contest_id) diff
    from (
      select contest_id, gold_medal user_id from contests
      union all
      select contest_id, silver_medal user_id from contests
      union all
      select contest_id, bronze_medal user_id from contests
    ) temp1
  ) temp2 group by user_id, diff having count(diff) >= 3
  union
  select
    gold_medal user_id
  from contests group by gold_medal having count(gold_medal) >= 3
) temp3 join users using(user_id);

解析

临时表 temp1 输出 contest_id 和 user_id 字段。
- 将 gold_medal、silver_medal、bronze_medal 获奖者分别查出来，使用 union all 将它们连接起来
临时表 temp2 输出 user_id 和 diff 字段
- 使用 row_number() 进行排序，然后用 contest_id 减去序号，得到 diff，如果是连续的数 diff 是相同的
查询临时表 temp2，按照 user_id 和 diff 进行分组，筛选出 count(diff) >= 3，连续三场及以上的比赛，diff 的数量是大于 3 的
筛选出获得三块及三块以上金牌的用户
临时表 temp3 ，将第 3 步和第 4 步使用 union 连接
查询临时表 temp3 ，将它和 users 表连接，连接条件是 user_id，输出 name 和 mail

SQL：方法二

with temp as (
  select contest_id, gold_medal user_id from contests
  union all
  select contest_id, silver_medal user_id from contests
  union all
  select contest_id, bronze_medal user_id from contests
)
select name, mail from (
  select user_id from (
    select
      user_id,
      if(@pre_user_id = user_id,
        if(@pre_contest_id = contest_id - 1, @rank:=@rank, @rank:=@rank + 1), 
        @rank:=1) diff,
      @pre_contest_id:=contest_id,
      @pre_user_id:=user_id
    from temp,(select @pre_contest_id:=null, @pre_user_id:=null, @rank:=1) init
    order by user_id, contest_id
  ) temp1 group by user_id, diff having count(diff) >= 3
  union
  select 
    gold_medal
  from contests group by gold_medal having count(gold_medal) >= 3
) temp2 join users using(user_id);

解析

思路和方法一一样，区别是这里使用变量代替 row_number

因为其他步骤和方法一一样，所以这里只讲变量部分

设置三个变量， @pre_contest_id，@pre_user_id，@rank
- @pre_contest_id 保存的是上一场比赛的 id，初始化为 null
- @pre_user_id 保存的是上一个用户的 id ，初始化 null
- @rank 保存的是当前排名，初始化为 1
```
select @pre_contest_id:=null, @pre_user_id:=null, @rank:=1
```
以 user_id = 2 为例
- 当运行第一条数据时，@pre_user_id = null，代入 @pre_user_id = user_id 得 null = 2 为 false 进入 @rank:=1，此时 diff 为 1 。
  - if 语句结束后， @pre_contest_id = 190 ， @pre_user_id = 2
- 当运行第二条数据是， @pre_user_id = 2 ，代入 @pre_user_id = user_id 得 2 = 2 为 true ，进入 if(@pre_contest_id = contest_id - 1, @rank:=@rank, @rank:=@rank + 1)，此时 @pre_contest_id = 190，代入 @pre_contest_id = contest_id - 1 的 190 = 191 - 1 为 true，进入 @rank:=@rank，代入得 @rank = 1 ，此时 diff 为 1
  - if 语句结束后， @pre_contest_id = 191 ， @pre_user_id = 2
- 以此类推，直到运行到最后一条数据
Tips

if(@pre_contest_id = contest_id - 1, @rank:=@rank, @rank:=@rank + 1) 是关键，通过这步，可以直接算出 diff ，而需要先算出排名rank，再用题号减去 rank 得到 diff
```
select
  ...
  if(@pre_user_id = user_id,
    if(@pre_contest_id = contest_id - 1, @rank:=@rank, @rank:=@rank + 1),
    @rank:=1) diff,
  @pre_contest_id:=contest_id,
  @pre_user_id:=user_id
from ...
```

SQL：方法三

with temp as (
  select contest_id, gold_medal user_id from contests
  union all
  select contest_id, silver_medal user_id from contests
  union all
  select contest_id, bronze_medal user_id from contests
)
select name, mail from (
  select 
    distinct temp1.user_id 
  from temp temp1, temp temp2, temp temp3
  where temp1.user_id = temp2.user_id and temp2.user_id = temp3.user_id
  and temp1.contest_id + 1 = temp2.contest_id
  and temp2.contest_id + 1 = temp3.contest_id
  union
  select
    gold_medal
  from contests group by gold_medal having count(gold_medal) >= 3
) t join users using(user_id);

解析

首先创建一个临时表 temp，将 gold_medal、silver_medal、bronze_medal 获奖者分别查出来，使用 union all 将它们连接起来。
temp 表自连，连接三次，分别是 temp1、temp2、temp3，除了 user_id 的基本连接条件，最重要的是 temp1.contest_id + 1 = temp2.contest_id and temp2.contest_id + 1 = temp3.contest_id，意思是： 上一场比赛 + 1 = 当前这场比赛，当前这场比赛 + 1 = 下一场比赛
- 这里把 temp2.contest_id 看成当前这场比赛， temp1.contest_id 是上一场比赛， temp3.contest_id 是下一场比赛
查询出获得三块及以上金牌的用户
将第 2 步和第 3 步使用 union 连接起来作为临时表 t
查询临时表 t ，将它和 users 表连接，连接条件是 user_id，输出 name 和 mail

Tips

select 
  distinct temp1.user_id 
from temp temp1, temp temp2, temp temp3
where temp1.user_id = temp2.user_id and temp2.user_id = temp3.user_id

可以替换成 join

select 
  distinct temp1.user_id 
from temp temp1 join temp temp2 using(user_id) join temp temp3 using(user_id)

方法四

with temp as (
  select contest_id, gold_medal user_id from contests
  union all
  select contest_id, silver_medal user_id from contests
  union all
  select contest_id, bronze_medal user_id from contests
)
select name, mail from (
  select user_id from (
    select
      user_id,
      contest_id,
      lead(contest_id, 1) over(partition by user_id order by contest_id) second,
      lead(contest_id, 2) over(partition by user_id order by contest_id) third
    from temp
  ) temp1 where contest_id + 1 = second and contest_id + 2 = third
  union
  select
    gold_medal
  from contests group by gold_medal having count(gold_medal) >= 3
) t join users using(user_id);

解析

方法四和方法三的思路是一样的，只有第 2 步不一样，所以只讲解第 2 步，其他步骤参考方法三。

使用窗口函数 lead()，计算出第二场比赛的 contest_id 和第三场比赛的 contest_id

举例 lead(contest_id, 1) ... second，lead(contest_id, 2) ... third， lead 带领或者领导的意思，当前行带领一行，带领两行

user_id  |  contest_id  |  second  |  third
2        |  190         |  191     |  192
2        |  191         |  192     |  194
2        |  192         |  194     |  195
2        |  194         |  195     |  196
2        |  195         |  196     |  null
2        |  196         |  null    |  null

筛选条件 contest_id + 1 = second and contest_id + 2 = third

也可以使用窗口函数 lag()，计算出上一场比赛的 contest_id 和上两场比赛的 contest_id

举例 lag(contest_id, 1) ... last，lag(contest_id, 2) ... last_two， lag 滞后或落后的意思，当前行滞后一行，滞后两行

user_id  |  contest_id  |  last    |  last_two
2        |  190         |  null    |  null
2        |  191         |  190     |  null
2        |  192         |  191     |  190
2        |  194         |  192     |  191
2        |  195         |  194     |  192
2        |  196         |  195     |  194

筛选条件 contest_id - 1 = last and contest_id - 2 = last_two

也可以同时使用 lag(contest_id, 1) ... last 和 lead(contest_id, 1) ... second

user_id  |  last  |  contest_id  |  second
2        |  null  |  190         |  191
2        |  190   |  191         |  192
2        |  191   |  192         |  194
2        |  192   |  194         |  195
2        |  194   |  195         |  196
2        |  195   |  196         |  null

筛选条件 contest_id - 1 = last and contest_id + 1 = second