create a new column that contains a list of values from another column subsequent rows
问题
I have a table like below,
TMP,ID,VALUE
2021-12-03 04:03:45,ID1,O
2021-12-03 04:03:46,ID1,P
2021-12-03 04:03:47,ID1,Q
2021-12-03 04:03:48,ID1,R
2021-12-03 04:03:49,ID1,
2021-12-03 04:03:50,ID1,S
2021-12-03 04:03:51,ID1,T
2021-12-04 11:09:03,ID2,A
2021-12-04 11:09:04,ID2,B
2021-12-04 11:09:05,ID2,C
2021-12-04 11:09:06,ID2,D
and want to create a new column that contains a list of values from another column subsequent rows like below,
TMP,ID,VALUE,LIST
2021-12-03 04:03:45,ID1,O,P,Q,R
2021-12-03 04:03:46,ID1,P,Q,R
2021-12-03 04:03:47,ID1,Q,R
2021-12-03 04:03:48,ID1,R,
2021-12-03 04:03:49,ID1,,
2021-12-03 04:03:50,ID1,S,T
2021-12-03 04:03:51,ID1,T,
2021-12-04 11:09:03,ID2,A,B,C,D
2021-12-04 11:09:04,ID2,B,C,D
2021-12-04 11:09:05,ID2,C,D
2021-12-04 11:09:06,ID2,D,
for copy paste: timestamp ID Value
2021-12-03 04:03:45 ID1 O
2021-12-03 04:03:46 ID1 P
2021-12-03 04:03:47 ID1 Q
2021-12-03 04:03:48 ID1 R
2021-12-03 04:03:49 ID1 NULL
2021-12-03 04:03:50 ID1 S
2021-12-03 04:03:51 ID1 T
2021-12-04 11:09:03 ID2 A
2021-12-04 11:09:04 ID2 B
2021-12-04 11:09:05 ID2 C
解答
对TMP 有序的数据,逻辑上也对 ID 有序,先按 ID 有序分组,组内当上一条记录的 VALUE 为 null 时再分组,每组中将下一条记录的 VALUE 直至组内最后一条记录的非空 VALUE 的序列,按逗号拼接作为新列 LIST 的值。SQL 中集合是无序的,使得计算过程变得非常麻烦,需要先借助窗口函数创建标识列,再利用标识列自连接,最后分组拼接VALUE列得到结果。通常的办法是读出来用Python或SPL来做, SPL(一种 Java 的开源包)更容易被Java应用集成,代码也更简单一点,只要两句:
A |
|
1 |
=ORACLE.query("SELECT * FROM TAB ORDER BY 1") |
2 |
=A1.group@o(#2).conj(~.group@i(#3[-1]==null).run(tmp=~.(#3).select(~),~=~.derive(tmp.m(#+1:).concat@c():LIST))).conj() |
English version