Align grouped subsets, then take the set difference

【Question】

I have two tables: 1) users(id, registerdate) and 2) user_answer(userid, answer, updated_date).

I want the zero-usage count per day, that is, how many users register but do not answer on each day. The result should look like this:

Date        registedCount   notAnsweredCount
15-09-02    20              10
15-09-01    20              10
15-08-31    12              4

The data looks like this. users table: ((1,'15-09-01'),(2,'15-09-01'),(3,'15-09-01')); user_answer table: ((1,0,'15-09-01')). Here three users registered on Sep 01, 2015, but only one of them answered a question. So the result will be (Date => 15-09-01, registedCount => 3, notAnsweredCount => 2).

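Someone posted the answer below. The original poster said it looked close to what was needed but gave no further feedback.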

SELECT date_range.aDay,
        COUNT(DISTINCT users.id) AS registedCount,
        SUM(IF(users.id IS NOT NULL AND user_answer.userid IS NULL, 1, 0)) AS notAnsweredCount
FROM
(
    SELECT DATE_ADD('2015-09-01', INTERVAL units.aCnt + tens.aCnt * 10 DAY) AS aDay
    FROM
    (
        SELECT 0 AS aCnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) units
    CROSS JOIN
    (
        SELECT 0 AS aCnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) tens
) date_range
LEFT OUTER JOIN users
ON date_range.aDay = users.registerdate
LEFT OUTER JOIN user_answer
ON users.id = user_answer.userid
GROUP BY date_range.aDay
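
The date_range subquery in the SQL above builds its calendar by cross joining the digits 0-9 (units) with the digits 0-9 (tens), so that units + tens * 10 covers every day offset from 0 to 99. The following Python sketch reproduces only that idea; the function name date_range and the days-per-digit layout are illustrative assumptions, not part of the original answer.

```python
from datetime import date, timedelta

def date_range(start):
    """Hypothetical helper: mimic the units/tens cross join from the SQL answer."""
    units = range(10)   # the "units" derived table: 0..9
    tens = range(10)    # the "tens" derived table: 0..9
    # Every (units, tens) pair yields one distinct offset in 0..99.
    offsets = sorted(u + t * 10 for t in tens for u in units)
    return [start + timedelta(days=o) for o in offsets]

days = date_range(date(2015, 9, 1))
print(days[0], days[-1], len(days))
```

The sequence always spans 100 days from the hard-coded start date, so any other reporting window needs a different start literal or more digit tables.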

【Answer】

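There are two difficulties to solve: the table must be grouped by a specified date sequence rather than by a field in the table, and the ids registered on each day must be set-differenced against the userids that answered on that day. A pure SQL solution is hard to follow. SPL can be used to help SQL here, and writing the code step by step keeps it intuitive: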

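Parameter settings (the original screenshot is not preserved; the script below uses two date parameters, argBegin and argEnd):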

SPL script:



     A
1    $select id,registerdate from users where registerdate>=? And registerdate<=?; argBegin,argEnd
2    $select userid,updated_date from user_answer where updated_date>=? And updated_date<=?; argBegin,argEnd
3    =periods(argBegin,argEnd)
4    =A1.align@a(A3,registerdate).(~.(id))
5    =A2.align@a(A3,updated_date).(~.(userid))
6    =A3.new(~:Date,A4(#).len():registedCount,(A4(#)\A5(#)).len():notAnsweredCount)


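A1, A2: Retrieve the data from the two tables with SQL.

A3: Use the periods function to generate a date sequence from the parameters.

A4, A5: Use the align function to align each set to the specified sequence, keeping only the id (or userid) values for each date.

A6: Create a new table sequence; "\" denotes set difference.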
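
For readers who do not use SPL, the steps A3 to A6 can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: periods, align_all and zero_usage are hypothetical Python stand-ins written for this example (not SPL APIs), the date range is assumed to include both end dates, and the rows are assumed to be already loaded as (id, registerdate) and (userid, updated_date) tuples. The sample data is the one given in the question.

```python
from datetime import date, timedelta

def periods(begin, end):
    """Stand-in for A3: every calendar day from begin to end (assumed inclusive)."""
    return [begin + timedelta(days=i) for i in range((end - begin).days + 1)]

def align_all(rows, days):
    """Stand-in for A4/A5: for each day, collect the ids of all rows dated that day."""
    buckets = {d: [] for d in days}
    for row_id, row_date in rows:
        if row_date in buckets:          # rows outside the date sequence are dropped
            buckets[row_date].append(row_id)
    return [buckets[d] for d in days]    # same order and length as days

def zero_usage(users, answers, begin, end):
    """Stand-in for A6: per day, registered count and registered-but-not-answered count."""
    days = periods(begin, end)
    registered = align_all(users, days)
    answered = align_all(answers, days)
    return [
        (d, len(reg), len(set(reg) - set(ans)))   # set difference, like "\" in A6
        for d, reg, ans in zip(days, registered, answered)
    ]

# Sample data from the question: three registrations and one answer on 2015-09-01.
users = [(1, date(2015, 9, 1)), (2, date(2015, 9, 1)), (3, date(2015, 9, 1))]
answers = [(1, date(2015, 9, 1))]

for row in zero_usage(users, answers, date(2015, 8, 31), date(2015, 9, 2)):
    print(row)
```

Note that in this sketch, as in the SPL script, the difference is taken day by day: a user counts as not answered on a date if they registered on that date and have no answer dated the same day.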