Align grouped subsets, then take the set difference
【Question】
I have two tables: 1) users(id, registerdate) and 2) user_answer(userid, answer, updated_date).
I want the count of zero usage per day, that is, how many users register but do not answer on each day. The results should look like this:
Date      registedCount  notAnsweredCount
15-09-02  20             10
15-09-01  20             10
15-08-31  12             4
The data looks like this: for the users table, ((1,'15-09-01'),(2,'15-09-01'),(3,'15-09-01')); for the user_answer table, ((1,0,'15-09-01')). Here three users registered on Sep 01, 2015, but only one of them answered a question. So the result will be (Date => 15-09-01, registedCount => 3, notAnsweredCount => 2).
Someone posted the answer below; the original poster said it looked close but gave no further feedback.
SELECT date_range.aDay,
       COUNT(DISTINCT users.id) AS registedCount,
       SUM(IF(users.id IS NOT NULL AND user_answer.userid IS NULL, 1, 0)) AS notAnsweredCount
FROM
(
    SELECT DATE_ADD('2015-09-01', INTERVAL units.aCnt + tens.aCnt * 10 DAY) AS aDay
    FROM
    (
        SELECT 0 AS aCnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) units
    CROSS JOIN
    (
        SELECT 0 AS aCnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) tens
) date_range
LEFT OUTER JOIN users
    ON date_range.aDay = users.registerdate
LEFT OUTER JOIN user_answer
    ON users.id = user_answer.userid
GROUP BY date_range.aDay
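The date_range subquery in this answer is easy to misread, so here is a minimal Python sketch of what it appears to produce: the cross join of two digit tables yields the offsets 0 to 99, which are added to the hard-coded start date. The function name date_range_days is hypothetical and only mirrors the SQL alias; this illustrates the idea and is not part of the posted answer.

```python
from datetime import date, timedelta
from itertools import product


def date_range_days(start: date) -> list[date]:
    """Mimic the units/tens cross join: offsets 0..99 added to the start date."""
    digits = range(10)
    # units.aCnt + tens.aCnt * 10 covers every integer from 0 to 99 exactly once
    offsets = sorted(u + t * 10 for u, t in product(digits, digits))
    return [start + timedelta(days=n) for n in offsets]


days = date_range_days(date(2015, 9, 1))
print(len(days), days[0], days[-1])
```

Because the start date is fixed at 2015-09-01, a day such as 15-08-31 from the sample output would fall outside the 100 generated dates unless the literal is changed.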
【Answer】
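There are two difficulties to solve: the rows must be grouped by a specified date sequence rather than by a field in the table, and the ids registered on each day must be compared with the userids that answered on each day through a set difference. This is fairly hard to follow when written in SQL. SPL can be used to help SQL here, and the code reads intuitively when written step by step: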
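Parameter settings: the script uses two date parameters, argBegin and argEnd, as the start and end of the date range.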
SPL script:
  | A
1 | $select id,registerdate from users where registerdate>=? and registerdate<=?; argBegin,argEnd
2 | $select userid,updated_date from user_answer where updated_date>=? and updated_date<=?; argBegin,argEnd
3 | =periods(argBegin,argEnd)
4 | =A1.align@a(A3,registerdate).(~.(id))
5 | =A2.align@a(A3,updated_date).(~.(userid))
6 | =A3.new(~:Date,A4(#).len():registedCount,(A4(#)\A5(#)).len():notAnsweredCount)
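A1, A2: retrieve the data from the two tables with SQL.
A3: use the periods function to generate a date sequence from the parameters.
A4, A5: use the align function to align each set to the specified sequence, so that every date gets its own subset of ids (or userids).
A6: build a new table sequence; "\" denotes set difference.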
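For readers who do not know SPL, the steps A3 to A6 can be approximated in Python. This is only a sketch under assumptions: the helper names periods, align_all and zero_usage are hypothetical stand-ins for the SPL functions, rows are plain dicts rather than database records, and Python set difference stands in for "\". The sample rows are the ones given in the question.

```python
from datetime import date, timedelta


def periods(begin: date, end: date) -> list[date]:
    """Every day from begin to end inclusive (the role of A3)."""
    return [begin + timedelta(days=n) for n in range((end - begin).days + 1)]


def align_all(rows, days, key, value):
    """One bucket of values per day, in the order of days (the role of A4/A5)."""
    buckets = {d: [] for d in days}
    for row in rows:
        if row[key] in buckets:  # rows outside the date sequence are ignored
            buckets[row[key]].append(row[value])
    return [buckets[d] for d in days]


def zero_usage(users, answers, begin, end):
    """Per day: (date, registered count, registered-but-not-answered count)."""
    days = periods(begin, end)
    registered = align_all(users, days, "registerdate", "id")
    answered = align_all(answers, days, "updated_date", "userid")
    # Set difference per position, like A4(#)\A5(#) in the SPL script
    return [
        (d, len(reg), len(set(reg) - set(ans)))
        for d, reg, ans in zip(days, registered, answered)
    ]


# Sample data from the question
users = [{"id": i, "registerdate": date(2015, 9, 1)} for i in (1, 2, 3)]
answers = [{"userid": 1, "answer": 0, "updated_date": date(2015, 9, 1)}]

result = zero_usage(users, answers, date(2015, 9, 1), date(2015, 9, 2))
for row in result:
    print(row)
```

With the question's data, 2015-09-01 comes out as 3 registered and 2 not answered, matching the expected result, and 2015-09-02 still appears with zero counts because grouping follows the date sequence rather than the table rows.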