从 SQL 到 SPL:有条件的分组
某库表有多个可以分组的字段。
ID | SPLIT | CUST | DATE | AMOUNT |
ID_1 | SPLIT_YES | A | 2024-05-01 00:00:00 | 100 |
ID_1 | SPLIT_NO | A | 2024-04-01 00:00:00 | 200 |
ID_1 | SPLIT_YES | B | 2024-03-01 00:00:00 | 50 |
ID_2 | SPLIT_YES | A | 2024-05-01 00:00:00 | 50 |
ID_2 | SPLIT_NO | A | 2024-04-01 00:00:00 | 300 |
ID_2 | SPLIT_NO | B | 2024-03-01 00:00:00 | 300 |
ID_3 | SPLIT_YES | B | 2024-04-01 00:00:00 | 90 |
ID_3 | SPLIT_NO | B | 2024-04-01 00:00:00 | 30 |
ID_3 | SPLIT_NO | A | 2024-04-01 00:00:00 | 10 |
ID_3 | SPLIT_NO | A | 2024-03-01 00:00:00 | 10 |
现在要进行有条件的分组:按前 2 个字段 ID、SPLIT 分组,如果组内的日期 DATE 无重复,则保留本组日期最近的那条记录,并将 AMOUNT 替换为本组的 AMOUNT 之和;如果组内的日期有重复,则将本组记录再按 CUST 分组,同样保留当前小组日期最近的那条记录,并将 AMOUNT 替换为当前小组的 AMOUNT 之和。
ID | SPLIT | CUST | DATE | AMOUNT |
ID_1 | SPLIT_NO | A | 2024-04-01 00:00:00 | 200 |
ID_1 | SPLIT_YES | A | 2024-05-01 00:00:00 | 150 |
ID_2 | SPLIT_NO | A | 2024-04-01 00:00:00 | 600 |
ID_2 | SPLIT_YES | A | 2024-05-01 00:00:00 | 50 |
ID_3 | SPLIT_NO | A | 2024-04-01 00:00:00 | 20 |
ID_3 | SPLIT_NO | B | 2024-04-01 00:00:00 | 30 |
ID_3 | SPLIT_YES | B | 2024-04-01 00:00:00 | 90 |
SQL:
SELECT id,
split,
cust,
date_column,
CASE num_cust
WHEN 1
THEN total_amount
ELSE total_cust_amount
END AS amount
FROM (
SELECT t.*,
COUNT(DISTINCT CASE rnk WHEN 1 THEN cust END)
OVER (PARTITION BY id, split) AS num_cust
FROM (
SELECT t.*,
DENSE_RANK() OVER (PARTITION BY id, split ORDER BY date_column DESC) AS rnk,
SUM(amount) OVER (PARTITION BY id, split) AS total_amount,
SUM(amount) OVER (PARTITION BY id, split, cust) AS total_cust_amount
FROM test_table_mm t
) t
WHERE rnk = 1
)
SQL 不能保留分组子集,要用多层嵌套查询和多个窗口函数间接实现,代码比较复杂。SPL 分组后不必立即汇总,可以保留子集继续计算,子集也可以继续分组:https://try.esproc.com/splx?44C
A | |
1 | $select * from test_table_mm.txt order by DATE desc |
2 | =A1.group(ID,SPLIT) |
3 | =A2.conj(if(~.icount(DATE)==~.count(), [~], ~.group(CUST))) |
4 | =A3.(~(1).run(AMOUNT=A3.~.sum(AMOUNT))) |
A1:加载数据,按日期逆序排序。
A2:按前 2 个字段分组,但不汇总。
A3:处理每组数据:如果日期没有重复,则返回当前组,否则将当前组按 CUST 再分组,返回每个小组。
A4:继续处理各组数据:取当前组的第 1 条记录,将 AMOUNT 替换为当前组的 AMOUNT 之和。
问题来源:https://stackoverflow.com/questions/78048701/sql-oracle-conditional-group-by
英文版 https://c.esproc.com/article/1744427412948