create a new column that contains a list of values from another column subsequent rows

 

问题

https://stackoverflow.com/questions/70457040/create-a-new-column-that-contains-a-list-of-values-from-another-column-subsequen

I have a table like below,

TMP,ID,VALUE

2021-12-03 04:03:45,ID1,O

2021-12-03 04:03:46,ID1,P

2021-12-03 04:03:47,ID1,Q

2021-12-03 04:03:48,ID1,R

2021-12-03 04:03:49,ID1,

2021-12-03 04:03:50,ID1,S

2021-12-03 04:03:51,ID1,T

2021-12-04 11:09:03,ID2,A

2021-12-04 11:09:04,ID2,B

2021-12-04 11:09:05,ID2,C

2021-12-04 11:09:06,ID2,D

and want to create a new column that contains a list of values from another column subsequent rows like below,

TMP,ID,VALUE,LIST

2021-12-03 04:03:45,ID1,O,P,Q,R

2021-12-03 04:03:46,ID1,P,Q,R

2021-12-03 04:03:47,ID1,Q,R

2021-12-03 04:03:48,ID1,R,

2021-12-03 04:03:49,ID1,,

2021-12-03 04:03:50,ID1,S,T

2021-12-03 04:03:51,ID1,T,

2021-12-04 11:09:03,ID2,A,B,C,D

2021-12-04 11:09:04,ID2,B,C,D

2021-12-04 11:09:05,ID2,C,D

2021-12-04 11:09:06,ID2,D,

for copy paste: timestamp ID Value

2021-12-03 04:03:45 ID1 O

2021-12-03 04:03:46 ID1 P

2021-12-03 04:03:47 ID1 Q

2021-12-03 04:03:48 ID1 R

2021-12-03 04:03:49 ID1 NULL

2021-12-03 04:03:50 ID1 S

2021-12-03 04:03:51 ID1 T

2021-12-04 11:09:03 ID2 A

2021-12-04 11:09:04 ID2 B

2021-12-04 11:09:05 ID2 C

解答

TMP 有序的数据,逻辑上也对 ID 有序,先按 ID 有序分组,组内当上一条记录的 VALUE 为 null 时再分组,每组中将下一条记录的 VALUE 直至组内最后一条记录的非空 VALUE 的序列,按逗号拼接作为新列 LIST 的值。SQL 中集合是无序的,使得计算过程变得非常麻烦,需要先借助窗口函数创建标识列,再利用标识列自连接,最后分组拼接VALUE列得到结果。通常的办法是读出来用PythonSPL来做, SPL(一种 Java 的开源包)更容易被Java应用集成,代码也更简单一点,只要两句:


A

1

=ORACLE.query("SELECT * FROM TAB ORDER BY 1")

2

=A1.group@o(#2).conj(~.group@i(#3[-1]==null).run(tmp=~.(#3).select(~),~=~.derive(tmp.m(#+1:).concat@c():LIST))).conj()

SPL源代码:https://github.com/SPLWare/esProc

问答搜集