从 SQL 到 SPL:组内再做有序分组汇总
某库表描述多个项目(ID)的回款周期,一个回款周期包括普通月份和结账月份,普通月份只有当月金额但没发票,Invoiced=0;结账月份既有当月金额也有发票,Invoiced=1。
ID |
Date |
Invoiced |
Amount |
AAA |
2023-01 |
0 |
10 |
AAA |
2023-02 |
0 |
15 |
AAA |
2023-03 |
1 |
15 |
AAA |
2023-04 |
0 |
10 |
AAA |
2023-05 |
0 |
10 |
AAA |
2023-06 |
1 |
10 |
BBB |
2022-05 |
0 |
40 |
BBB |
2022-06 |
1 |
20 |
BBB |
2022-07 |
0 |
30 |
BBB |
2022-08 |
1 |
30 |
现在要找出每个项目的每个回款月份,并统计该回款周期的总金额。注意回款周期的分组依据和顺序有关,也就是“当上个月的 Invoiced=1 时,开始一个新的分组”,这和常见的等值分组不同。
ID |
Date |
Invoiced |
Amount |
AAA |
2023-03 |
1 |
40 |
AAA |
2023-06 |
1 |
30 |
BBB |
2022-06 |
1 |
60 |
BBB |
2022-08 |
1 |
60 |
SQL:
WITH cte AS (
SELECT *, sum(invoiced) OVER (PARTITION BY ID ORDER BY Date desc) grp
FROM mytable
ORDER BY ID, Date
)
SELECT ID, MAX(date) AS Date, MAX(Invoiced) AS Invoiced, SUM(Amount) AS Amount
FROM cte
GROUP BY ID, grp
ORDER BY ID, Date
SQL 没有直接的有序分组,要用窗口函数 + 子查询的方法打标记,再按标记分组汇总。上面 SQL 使用了倒序后再累计的方法凑出标记,理解困难。
SPL 支持很方便的有序计算,代码很直接。https://try.esproc.com/splx?3dX
A |
|
1 |
$select * from mytable.txt order by ID, Date |
2 |
=A1.run(Amount+=if(ID==ID[-1] && Invoiced[-1]==0,Amount[-1])) |
3 |
=A2.select(Invoiced==1) |
A1:加载数据,注意数据已排序。
A2:当 ID 不变且上个月是普通月份时,将 Amount 修改为累计值;否则(每个回款周期的第一个月)将 Amount 重置为当月金额。[-1] 表示上一条记录。
问题来源:https://stackoverflow.com/questions/78224394/query-to-sum-over-multiple-rows-based-on-column
英文版 https://c.esproc.com/article/1739454636990