从 SQL 到 SPL:有条件的分组

某库表有多个可以分组的字段。

ID

SPLIT

CUST

DATE

AMOUNT

ID_1

SPLIT_YES

A

2024-05-01 00:00:00

100

ID_1

SPLIT_NO

A

2024-04-01 00:00:00

200

ID_1

SPLIT_YES

B

2024-03-01 00:00:00

50

ID_2

SPLIT_YES

A

2024-05-01 00:00:00

50

ID_2

SPLIT_NO

A

2024-04-01 00:00:00

300

ID_2

SPLIT_NO

B

2024-03-01 00:00:00

300

ID_3

SPLIT_YES

B

2024-04-01 00:00:00

90

ID_3

SPLIT_NO

B

2024-04-01 00:00:00

30

ID_3

SPLIT_NO

A

2024-04-01 00:00:00

10

ID_3

SPLIT_NO

A

2024-03-01 00:00:00

10

现在要进行有条件的分组:按前 2 个字段 ID、SPLIT 分组,如果组内的日期 DATE 无重复,则保留本组日期最近的那条记录,并将 AMOUNT 替换为本组的 AMOUNT 之和;如果组内的日期有重复,则将本组记录再按 CUST 分组,同样保留当前小组日期最近的那条记录,并将 AMOUNT 替换为当前小组的 AMOUNT 之和。

ID

SPLIT

CUST

DATE

AMOUNT

ID_1

SPLIT_NO

A

2024-04-01 00:00:00

200

ID_1

SPLIT_YES

A

2024-05-01 00:00:00

150

ID_2

SPLIT_NO

A

2024-04-01 00:00:00

600

ID_2

SPLIT_YES

A

2024-05-01 00:00:00

50

ID_3

SPLIT_NO

A

2024-04-01 00:00:00

20

ID_3

SPLIT_NO

B

2024-04-01 00:00:00

30

ID_3

SPLIT_YES

B

2024-04-01 00:00:00

90

SQL:

SELECT id, 
       split,
       cust,
       date_column,
       CASE num_cust
       WHEN 1
       THEN total_amount
       ELSE total_cust_amount
       END AS amount
FROM   (
  SELECT t.*,
         COUNT(DISTINCT CASE rnk WHEN 1 THEN cust END)
           OVER (PARTITION BY id, split) AS num_cust
  FROM   (
    SELECT t.*,
           DENSE_RANK() OVER (PARTITION BY id, split ORDER BY date_column DESC) AS rnk,
           SUM(amount) OVER (PARTITION BY id, split) AS total_amount,
           SUM(amount) OVER (PARTITION BY id, split, cust) AS total_cust_amount
    FROM   test_table_mm t
  ) t
  WHERE  rnk = 1
)

SQL 不能保留分组子集,要用多层嵌套查询和多个窗口函数间接实现,代码比较复杂。SPL 分组后不必立即汇总,可以保留子集继续计算,子集也可以继续分组:https://try.esproc.com/splx?44C


 A

1

$select * from test_table_mm.txt order by DATE desc

2

=A1.group(ID,SPLIT)

3

=A2.conj(if(~.icount(DATE)==~.count(), [~], ~.group(CUST)))

4

=A3.(~(1).run(AMOUNT=A3.~.sum(AMOUNT)))

A1:加载数据,按日期逆序排序。

A2:按前 2 个字段分组,但不汇总。

A3:处理每组数据:如果日期没有重复,则返回当前组,否则将当前组按 CUST 再分组,返回每个小组。

A4:继续处理各组数据:取当前组的第 1 条记录,将 AMOUNT 替换为当前组的 AMOUNT 之和。

问题来源:https://stackoverflow.com/questions/78048701/sql-oracle-conditional-group-by