从 SQL 到 SPL:有条件的分组

某库表有多个可以分组的字段。

ID SPLIT CUST DATE AMOUNT
ID_1 SPLIT_YES A 2024-05-01 00:00:00 100
ID_1 SPLIT_NO A 2024-04-01 00:00:00 200
ID_1 SPLIT_YES B 2024-03-01 00:00:00 50
ID_2 SPLIT_YES A 2024-05-01 00:00:00 50
ID_2 SPLIT_NO A 2024-04-01 00:00:00 300
ID_2 SPLIT_NO B 2024-03-01 00:00:00 300
ID_3 SPLIT_YES B 2024-04-01 00:00:00 90
ID_3 SPLIT_NO B 2024-04-01 00:00:00 30
ID_3 SPLIT_NO A 2024-04-01 00:00:00 10
ID_3 SPLIT_NO A 2024-03-01 00:00:00 10

现在要进行有条件的分组:按前 2 个字段 ID、SPLIT 分组,如果组内的日期 DATE 无重复,则保留本组日期最近的那条记录,并将 AMOUNT 替换为本组的 AMOUNT 之和;如果组内的日期有重复,则将本组记录再按 CUST 分组,同样保留当前小组日期最近的那条记录,并将 AMOUNT 替换为当前小组的 AMOUNT 之和。

ID SPLIT CUST DATE AMOUNT
ID_1 SPLIT_NO A 2024-04-01 00:00:00 200
ID_1 SPLIT_YES A 2024-05-01 00:00:00 150
ID_2 SPLIT_NO A 2024-04-01 00:00:00 600
ID_2 SPLIT_YES A 2024-05-01 00:00:00 50
ID_3 SPLIT_NO A 2024-04-01 00:00:00 20
ID_3 SPLIT_NO B 2024-04-01 00:00:00 30
ID_3 SPLIT_YES B 2024-04-01 00:00:00 90

SQL:

SELECT id, 
       split,
       cust,
       date_column,
       CASE num_cust
       WHEN 1
       THEN total_amount
       ELSE total_cust_amount
       END AS amount
FROM   (
  SELECT t.*,
         COUNT(DISTINCT CASE rnk WHEN 1 THEN cust END)
           OVER (PARTITION BY id, split) AS num_cust
  FROM   (
    SELECT t.*,
           DENSE_RANK() OVER (PARTITION BY id, split ORDER BY date_column DESC) AS rnk,
           SUM(amount) OVER (PARTITION BY id, split) AS total_amount,
           SUM(amount) OVER (PARTITION BY id, split, cust) AS total_cust_amount
    FROM   test_table_mm t
  ) t
  WHERE  rnk = 1
)

SQL 不能保留分组子集,要用多层嵌套查询和多个窗口函数间接实现,代码比较复杂。SPL 分组后不必立即汇总,可以保留子集继续计算,子集也可以继续分组:https://try.esproc.com/splx?44C


 A
1 $select * from test_table_mm.txt order by DATE desc
2 =A1.group(ID,SPLIT)
3 =A2.conj(if(~.icount(DATE)==~.count(), [~], ~.group(CUST)))
4 =A3.(~(1).run(AMOUNT=A3.~.sum(AMOUNT)))

A1:加载数据,按日期逆序排序。

A2:按前 2 个字段分组,但不汇总。

A3:处理每组数据:如果日期没有重复,则返回当前组,否则将当前组按 CUST 再分组,返回每个小组。

A4:继续处理各组数据:取当前组的第 1 条记录,将 AMOUNT 替换为当前组的 AMOUNT 之和。

问题来源:https://stackoverflow.com/questions/78048701/sql-oracle-conditional-group-by