从 SQL 到 SPL:组内相邻记录配对并转置

某表存储来自外部的近几天人员刷卡进出建筑的记录,一个人进出一栋建筑时,一般有 N 对记录,每对先 in 再 out 各一条。有时数据不规范不成对,只有一条 in 或一条 out,或连续多条 in 或多条 out。

username

building

action

timestamp

user-1

building-1

IN

2024-04-10 01:00:00.000

user-1

building-1

OUT

2024-04-10 02:00:00.000

user-1

building-1

IN

2024-04-10 02:30:00.000

user-1

building-1

OUT

2024-04-10 04:00:00.000

user-1

building-1

IN

2024-04-11 10:00:00.000

user-1

building-1

OUT

2024-04-11 11:00:00.000

user-2

building-1

IN

2024-04-12 10:00:00.000

user-2

building-1

OUT

2024-04-12 11:00:00.000

user-2

building-2

IN

2024-04-10 08:00:00.000

user-2

building-2

OUT

2024-04-10 09:00:00.000

user-2

building-3

OUT

2024-04-11 02:30:00.000

user-2

building-4

IN

2024-04-11 04:00:00.000

user-3

building-1

OUT

2024-04-10 01:00:00.000

user-3

building-1

IN

2024-04-10 10:00:00.000

user-3

building-1

IN

2024-04-10 11:00:00.000

user-3

building-1

IN

2024-04-10 12:00:00.000

user-3

building-1

OUT

2024-04-10 13:00:00.000

user-3

building-1

OUT

2024-04-10 14:00:00.000

user-3

building-1

OUT

2024-04-10 15:00:00.000

现在要把每对记录由行转列,变成一条记录。不成对的数据应当单独转为一条记录,空缺的部分填 null。

username

building

IN

OUT

user-1

building-1

2024-04-10 01:00:00.000

2024-04-10 02:00:00.000

user-1

building-1

2024-04-10 02:30:00.000

2024-04-10 04:00:00.000

user-1

building-1

2024-04-11 10:00:00.000

2024-04-11 11:00:00.000

user-2

building-1

2024-04-12 10:00:00.000

2024-04-12 11:00:00.000

user-2

building-2

2024-04-10 08:00:00.000

2024-04-10 09:00:00.000

user-2

building-3


2024-04-11 02:30:00.000

user-2

building-4

2024-04-11 04:00:00.000


user-3

building-1


2024-04-10 01:00:00.000

user-3

building-1

2024-04-10 10:00:00.000


user-3

building-1

2024-04-10 11:00:00.000


user-3

building-1

2024-04-10 12:00:00.000

2024-04-10 13:00:00.000

user-3

building-1


2024-04-10 14:00:00.000

可以先按人和建筑分组,再将组内相邻的每 2 行分为一个小组,转置每一小组。但 SQL 分组后必须立刻汇总,不能保持子集继续分组,这就要转换思路,用多层嵌套的窗口函数绕过这个难题,很难写。按相邻的 IN/OUT 状态分组是另一个麻烦,需要用多层 PARTION 造出积累值来分组,更麻烦的是这里也必须立刻汇总,不能对分组后的数据继续转置,仍然很繁琐。

SPL 分组后可以保持分组子集,还支持按条件有序分组,可以方便地将一对 IN/OUT 记录分成一个小组,这过程中还用到相邻引用。SPL 可以继续一直保持分组子集,方便对每个小组转置。


 A

1

=dbConn.query("select * from tb order by username,building,timestamp")

2

=A1.group(username,building)

3

=A2.conj(~.group@i(action=="IN" || action[-1]=="OUT"))

4

=A3.conj(~.pivot(username,building;action,timestamp;"IN","OUT"))

A1:从数据库加载数据,按 username、building、timestamp 排序

A2:用 group 函数按人员和建筑分组,但不汇总。

A3:对 A2 的每组数据,每碰到一组 IN 和 OUT(包括单独的 IN 和 OUT)分为一个小组,合并大组只留小组。group 函数的选项 @i 表示按条件分组。~ 是当前组,[-1] 表示上一条记录。

A4:对 A3 的每组数据,由行转为列,合并各组只留记录。pivot 函数用于转置,可指定转置后的列名为 IN 和 OUT,以保证缺数据时自动填 null。

问题来源:https://stackoverflow.com/questions/78319976/postgresql-how-to-calculate-swipe-in-and-swipe-out-time