从 SQL 到 SPL:组内相邻记录配对并转置
某表存储来自外部的近几天人员刷卡进出建筑的记录,一个人进出一栋建筑时,一般有 N 对记录,每对先 in 再 out 各一条。有时数据不规范不成对,只有一条 in 或一条 out,或连续多条 in 或多条 out。
username |
building |
action |
timestamp |
user-1 |
building-1 |
IN |
2024-04-10 01:00:00.000 |
user-1 |
building-1 |
OUT |
2024-04-10 02:00:00.000 |
user-1 |
building-1 |
IN |
2024-04-10 02:30:00.000 |
user-1 |
building-1 |
OUT |
2024-04-10 04:00:00.000 |
user-1 |
building-1 |
IN |
2024-04-11 10:00:00.000 |
user-1 |
building-1 |
OUT |
2024-04-11 11:00:00.000 |
user-2 |
building-1 |
IN |
2024-04-12 10:00:00.000 |
user-2 |
building-1 |
OUT |
2024-04-12 11:00:00.000 |
user-2 |
building-2 |
IN |
2024-04-10 08:00:00.000 |
user-2 |
building-2 |
OUT |
2024-04-10 09:00:00.000 |
user-2 |
building-3 |
OUT |
2024-04-11 02:30:00.000 |
user-2 |
building-4 |
IN |
2024-04-11 04:00:00.000 |
user-3 |
building-1 |
OUT |
2024-04-10 01:00:00.000 |
user-3 |
building-1 |
IN |
2024-04-10 10:00:00.000 |
user-3 |
building-1 |
IN |
2024-04-10 11:00:00.000 |
user-3 |
building-1 |
IN |
2024-04-10 12:00:00.000 |
user-3 |
building-1 |
OUT |
2024-04-10 13:00:00.000 |
user-3 |
building-1 |
OUT |
2024-04-10 14:00:00.000 |
user-3 |
building-1 |
OUT |
2024-04-10 15:00:00.000 |
现在要把每对记录由行转列,变成一条记录。不成对的数据应当单独转为一条记录,空缺的部分填 null。
username |
building |
IN |
OUT |
user-1 |
building-1 |
2024-04-10 01:00:00.000 |
2024-04-10 02:00:00.000 |
user-1 |
building-1 |
2024-04-10 02:30:00.000 |
2024-04-10 04:00:00.000 |
user-1 |
building-1 |
2024-04-11 10:00:00.000 |
2024-04-11 11:00:00.000 |
user-2 |
building-1 |
2024-04-12 10:00:00.000 |
2024-04-12 11:00:00.000 |
user-2 |
building-2 |
2024-04-10 08:00:00.000 |
2024-04-10 09:00:00.000 |
user-2 |
building-3 |
2024-04-11 02:30:00.000 |
|
user-2 |
building-4 |
2024-04-11 04:00:00.000 |
|
user-3 |
building-1 |
2024-04-10 01:00:00.000 |
|
user-3 |
building-1 |
2024-04-10 10:00:00.000 |
|
user-3 |
building-1 |
2024-04-10 11:00:00.000 |
|
user-3 |
building-1 |
2024-04-10 12:00:00.000 |
2024-04-10 13:00:00.000 |
user-3 |
building-1 |
2024-04-10 14:00:00.000 |
可以先按人和建筑分组,再将组内相邻的每 2 行分为一个小组,转置每一小组。但 SQL 分组后必须立刻汇总,不能保持子集继续分组,这就要转换思路,用多层嵌套的窗口函数绕过这个难题,很难写。按相邻的 IN/OUT 状态分组是另一个麻烦,需要用多层 PARTION 造出积累值来分组,更麻烦的是这里也必须立刻汇总,不能对分组后的数据继续转置,仍然很繁琐。
SPL 分组后可以保持分组子集,还支持按条件有序分组,可以方便地将一对 IN/OUT 记录分成一个小组,这过程中还用到相邻引用。SPL 可以继续一直保持分组子集,方便对每个小组转置。
A |
|
1 |
=dbConn.query("select * from tb order by username,building,timestamp") |
2 |
=A1.group(username,building) |
3 |
=A2.conj(~.group@i(action=="IN" || action[-1]=="OUT")) |
4 |
=A3.conj(~.pivot(username,building;action,timestamp;"IN","OUT")) |
A1:从数据库加载数据,按 username、building、timestamp 排序。
A2:用 group 函数按人员和建筑分组,但不汇总。
A3:对 A2 的每组数据,每碰到一组 IN 和 OUT(包括单独的 IN 和 OUT)分为一个小组,合并大组只留小组。group 函数的选项 @i 表示按条件分组。~ 是当前组,[-1] 表示上一条记录。
A4:对 A3 的每组数据,由行转为列,合并各组只留记录。pivot 函数用于转置,可指定转置后的列名为 IN 和 OUT,以保证缺数据时自动填 null。
问题来源:https://stackoverflow.com/questions/78319976/postgresql-how-to-calculate-swipe-in-and-swipe-out-time
这部分学到了:”pivot 函数用于转置,可指定转置后的列名为 IN 和 OUT,以保证缺数据时自动填 null。“👍 👍
一直忽略了pivot中第三部分参数的运用。我写的话又得套成json@t (json(…))
谢谢分享!
英文版 https://c.scudata.com/article/1734773980353