If you are aiming for a job at a big tech company, you have probably been asked something like this: how do you use SQL to find users who have logged in for more than 7 consecutive days?
Many people read the question, have no idea where to start, and walk away dejected without the offer. The problem is actually not hard, and the key point is spelled out below; I will walk through it using GBase 8a.
The SQL is all standard SQL, so it works the same on other databases. The principle and the steps are broken down for you; read them a few times and you will get it.
Goal: construct a helper column whose value stays the same across every row of a user's consecutive login streak.
Once that goal is reached, the correct answer follows easily. The offer is waving at you! Let's go.
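Here is the idea in miniature: subtract an increasing counter (1, 2, 3, ...) from a run of consecutive dates and every row lands on the same value. A quick table-free sketch, assuming the MySQL-style ADDDATE syntax that is also used later in this article (the literal dates are purely illustrative):

select ADDDATE('2021-01-04', INTERVAL -1 DAY) d1,   -- 2021-01-03
       ADDDATE('2021-01-05', INTERVAL -2 DAY) d2,   -- 2021-01-03
       ADDDATE('2021-01-06', INTERVAL -3 DAY) d3;   -- 2021-01-03

All three expressions return 2021-01-03, and that shared value is exactly what the helper column will capture.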
Suppose there are 3 users with ids 1001, 1002 and 1003, who logged in on 13, 15 and 3 days respectively.
Step 1: Create the table and insert the data
CREATE TABLE login_log (
    id integer,
    login_date date
);

insert into login_log values(1001,'2021-01-01');
insert into login_log values(1001,'2021-01-02');
insert into login_log values(1001,'2021-01-04');
insert into login_log values(1001,'2021-01-05');
insert into login_log values(1001,'2021-01-06');
insert into login_log values(1001,'2021-01-07');
insert into login_log values(1001,'2021-01-08');
insert into login_log values(1001,'2021-01-09');
insert into login_log values(1001,'2021-01-10');
insert into login_log values(1001,'2021-01-11');
insert into login_log values(1001,'2021-01-13');
insert into login_log values(1001,'2021-01-15');
insert into login_log values(1001,'2021-01-16');
insert into login_log values(1002,'2021-01-01');
insert into login_log values(1002,'2021-01-02');
insert into login_log values(1002,'2021-01-03');
insert into login_log values(1002,'2021-01-04');
insert into login_log values(1002,'2021-01-05');
insert into login_log values(1002,'2021-01-06');
insert into login_log values(1002,'2021-01-07');
insert into login_log values(1002,'2021-01-08');
insert into login_log values(1002,'2021-01-09');
insert into login_log values(1002,'2021-01-10');
insert into login_log values(1002,'2021-01-11');
insert into login_log values(1002,'2021-01-12');
insert into login_log values(1002,'2021-01-13');
insert into login_log values(1002,'2021-01-16');
insert into login_log values(1002,'2021-01-17');
insert into login_log values(1003,'2021-01-04');
insert into login_log values(1003,'2021-01-05');
insert into login_log values(1003,'2021-01-06');
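As an optional sanity check, you can count the rows per user; the result should show 13, 15 and 3 logins, matching the description above (the alias login_days is just for readability):

select id, count(*) login_days
from login_log
group by id
order by id;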
Step 2: Write the SQL as follows:
select id, t, count(t)
from (
    select id, ADDDATE(login_date, INTERVAL -rn DAY) t
    from (
        select id, login_date,
               row_number() over (partition by id order by login_date) rn
        from login_log
    ) q1
) q
group by id, t
having count(t) > 7;
Run it and you get the correct answer, shown below. Simple, isn't it?
gbase> select id ,t,count(t) from (
    ->   SELECT id,ADDDATE(login_date,INTERVAL -rn DAY) t FROM (
    ->     select id,login_date,row_number()
    ->       over(partition by id order by login_date) rn from login_log
    ->   ) q1
    -> ) q
    -> group by id,t having count(t) > 7
    -> ;
+------+------------+----------+
| id   | t          | count(t) |
+------+------------+----------+
| 1001 | 2021-01-01 |        8 |
| 1002 | 2020-12-31 |       13 |
+------+------------+----------+
2 rows in set (Elapsed: 00:00:00.14)
Step 3: If the query is still unclear, here is a detailed breakdown:
1. Use the row_number() function to add a helper column. For the same id, if the logins are consecutive, login_date and row_number grow in lockstep, so their difference, the t column, stays the same.
Note that the rn column below increases consecutively within each id. Look closely at the t values for id 1001: records that belong to one consecutive login streak all share the same t.
gbase> select
    ->   id,login_date,
    ->   row_number() over(partition by id order by login_date) rn,
    ->   adddate(login_date,INTERVAL - row_number() over(partition by id order by login_date) DAY) t
    -> from login_log ;
+------+------------+----+------------+
| id   | login_date | rn | t          |
+------+------------+----+------------+
| 1001 | 2021-01-01 |  1 | 2020-12-31 |
| 1001 | 2021-01-02 |  2 | 2020-12-31 |
| 1001 | 2021-01-04 |  3 | 2021-01-01 |
| 1001 | 2021-01-05 |  4 | 2021-01-01 |
| 1001 | 2021-01-06 |  5 | 2021-01-01 |
| 1001 | 2021-01-07 |  6 | 2021-01-01 |
| 1001 | 2021-01-08 |  7 | 2021-01-01 |
| 1001 | 2021-01-09 |  8 | 2021-01-01 |
| 1001 | 2021-01-10 |  9 | 2021-01-01 |
| 1001 | 2021-01-11 | 10 | 2021-01-01 |
| 1001 | 2021-01-13 | 11 | 2021-01-02 |
| 1001 | 2021-01-15 | 12 | 2021-01-03 |
| 1001 | 2021-01-16 | 13 | 2021-01-03 |
| 1003 | 2021-01-04 |  1 | 2021-01-03 |
| 1003 | 2021-01-05 |  2 | 2021-01-03 |
| 1003 | 2021-01-06 |  3 | 2021-01-03 |
| 1002 | 2021-01-01 |  1 | 2020-12-31 |
| 1002 | 2021-01-02 |  2 | 2020-12-31 |
| 1002 | 2021-01-03 |  3 | 2020-12-31 |
| 1002 | 2021-01-04 |  4 | 2020-12-31 |
| 1002 | 2021-01-05 |  5 | 2020-12-31 |
| 1002 | 2021-01-06 |  6 | 2020-12-31 |
| 1002 | 2021-01-07 |  7 | 2020-12-31 |
| 1002 | 2021-01-08 |  8 | 2020-12-31 |
| 1002 | 2021-01-09 |  9 | 2020-12-31 |
| 1002 | 2021-01-10 | 10 | 2020-12-31 |
| 1002 | 2021-01-11 | 11 | 2020-12-31 |
| 1002 | 2021-01-12 | 12 | 2020-12-31 |
| 1002 | 2021-01-13 | 13 | 2020-12-31 |
| 1002 | 2021-01-16 | 14 | 2021-01-02 |
| 1002 | 2021-01-17 | 15 | 2021-01-02 |
+------+------------+----+------------+
31 rows in set (Elapsed: 00:00:00.02)
2. Run the following SQL to get the key intermediate result.
SELECT id, login_date, ADDDATE(login_date, INTERVAL -rn DAY) t
FROM (
    select id, login_date,
           row_number() over(partition by id order by login_date) rn
    from login_log
) q1;
Look carefully at the result below: for a single id, rows with consecutive login_date values have the same t, i.e. the same ADDDATE(login_date, INTERVAL -rn DAY).
Make sure you really understand that sentence; read it as many times as it takes!
If it is still not clear, study the rows for id 1001 below.
Now count the t column and filter with HAVING, which takes you back to Step 2 and gives the answer.
gbase> SELECT id,login_date,ADDDATE(login_date,INTERVAL -rn DAY) t FROM
    -> (
    ->   select
    ->     id,login_date,
    ->     row_number() over(partition by id order by login_date) rn
    ->   from login_log
    -> ) q1 ;
+------+------------+------------+
| id   | login_date | t          |
+------+------------+------------+
| 1001 | 2021-01-01 | 2020-12-31 |
| 1001 | 2021-01-02 | 2020-12-31 |
| 1001 | 2021-01-04 | 2021-01-01 |
| 1001 | 2021-01-05 | 2021-01-01 |
| 1001 | 2021-01-06 | 2021-01-01 |
| 1001 | 2021-01-07 | 2021-01-01 |
| 1001 | 2021-01-08 | 2021-01-01 |
| 1001 | 2021-01-09 | 2021-01-01 |
| 1001 | 2021-01-10 | 2021-01-01 |
| 1001 | 2021-01-11 | 2021-01-01 |
| 1001 | 2021-01-13 | 2021-01-02 |
| 1001 | 2021-01-15 | 2021-01-03 |
| 1001 | 2021-01-16 | 2021-01-03 |
| 1002 | 2021-01-01 | 2020-12-31 |
| 1002 | 2021-01-02 | 2020-12-31 |
| 1002 | 2021-01-03 | 2020-12-31 |
| 1002 | 2021-01-04 | 2020-12-31 |
| 1002 | 2021-01-05 | 2020-12-31 |
| 1002 | 2021-01-06 | 2020-12-31 |
| 1002 | 2021-01-07 | 2020-12-31 |
| 1002 | 2021-01-08 | 2020-12-31 |
| 1002 | 2021-01-09 | 2020-12-31 |
| 1002 | 2021-01-10 | 2020-12-31 |
| 1002 | 2021-01-11 | 2020-12-31 |
| 1002 | 2021-01-12 | 2020-12-31 |
| 1002 | 2021-01-13 | 2020-12-31 |
| 1002 | 2021-01-16 | 2021-01-02 |
| 1002 | 2021-01-17 | 2021-01-02 |
| 1003 | 2021-01-04 | 2021-01-03 |
| 1003 | 2021-01-05 | 2021-01-03 |
| 1003 | 2021-01-06 | 2021-01-03 |
+------+------------+------------+
31 rows in set (Elapsed: 00:00:00.12)
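One last caveat. A real login table often records several logins per user per day, which would throw off the row_number counting. A minimal sketch of one way to guard against that, deduplicating the dates first and also reporting when each streak starts and ends (the aliases start_date, end_date and days are my own, not part of the exercise):

select id,
       min(login_date) start_date,
       max(login_date) end_date,
       count(*) days
from (
    select id, login_date,
           ADDDATE(login_date, INTERVAL -row_number()
               over (partition by id order by login_date) DAY) t
    from (select distinct id, login_date from login_log) d
) q
group by id, t
having count(*) > 7;

On the sample data this returns the same two streaks as Step 2, since no date is duplicated; the deduplication only matters on messier real-world data.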