"MSSQL 数据库表的 id、nbr 是分组字段，组内记录按时间字段 dt 排序后，状态字段 status 出现 0 和 1 的先后规律不同。 id nbr dt status 1 2 202 .."

jinxing
乾学院 53 号会员
2 回帖 • 238 浏览 • 1 年前

从 SQL 到 SPL：根据组内记录情况计算后将结果拼到第一条记录上

应用计算

MSSQL 数据库表的 id、nbr 是分组字段，组内记录按时间字段 dt 排序后，状态字段 status 出现 0 和 1 的先后规律不同。

id	nbr	dt	status
1	2	2023-10-01	1
1	2	2023-11-02	0
1	2	2023-12-13	0
1	3	2023-10-01	0
1	3	2023-11-02	0
1	3	2023-12-13	1
1	9	2023-11-01	0
1	9	2023-12-01	1
1	9	2024-01-01	0
2	1	2023-11-01	0
2	1	2023-12-01	0
2	1	2024-01-01	0
2	2	2023-11-01	1
2	2	2023-12-01	1
2	2	2024-01-01	1

现在要根据组内记录情况按一定规则生成三个计算列，拼到每组第一条记录上，其它记录都填 0。

计算列 Cont1Sta1：当本组至少有一条 status=1 的记录时，赋值为 1，否则填 0。

计算列 DateLagInDays：当本组至少有一条 status=1 的记录，且存在至少一条 status=0 的记录，且后者的第一条记录比前者的第一条记录更早时，赋值为两者的天数间隔，否则填 0。

计算列 Recurrence：当本组至少有一条 status=1 的记录，且存在至少一条 status=0 的记录，且前者的第一条记录比后者的最后一条记录更早时，赋值为 1，否则填 0。

id	Nbr	dt	status	Cont1Sta1	DateLagInDays	Recurrence
1	2	2023-10-01	1	1	0	1
1	2	2023-11-02	0	0	0	0
1	2	2023-12-13	0	0	0	0
1	3	2023-10-01	0	1	73	0
1	3	2023-11-02	0	0	0	0
1	3	2023-12-13	1	0	0	0
1	9	2023-11-01	0	1	30	1
1	9	2023-12-01	1	0	0	0
1	9	2024-01-01	0	0	0	0
2	1	2023-11-01	0	0	0	0
2	1	2023-12-01	0	0	0	0
2	1	2024-01-01	0	0	0	0
2	2	2023-11-01	1	1	0	0
2	2	2023-12-01	1	0	0	0
2	2	2024-01-01	1	0	0	0

SQL 语句：

with cte as (
    select id, nbr, dt, status
        , row_number() over (partition by id, nbr order by id, nbr, dt asc, status desc) rn
        , max(status) over (partition by id, nbr) partition_status
        , min(case when Status = 1 then dt else null end) over (partition by id, nbr) partition_dt1
        , max(case when Status = 1 then dt else null end) over (partition by id, nbr) partition_dt2
    from tb
)
select id, nbr, dt, status
    , case when rn = 1 then partition_status else 0 end Cont1Sta1
    , case when rn = 1 then datediff(day, dt, coalesce(partition_dt1, dt)) else 0 end DateLagInDays
    , case when rn = 1 and exists (select 1 from cte c2 where c2.id = c1.id and c2.nbr = c1.nbr and c2.dt > c1.partition_dt2) then 1 else 0 end Recurrence
from cte c1
order by id, nbr, dt asc, status desc;

SQL分组后必须立刻汇总，不能保持子集进行更复杂的多步计算，这时只能绕道用多个窗口函数重复计算，代码繁琐。SQL没有天然序号，要先用窗口函数生成序号，表达最后一条也不方便，这里要日期关联比较。

SPL可以保持并复用分组子集，自带天然序号，可直接引用第一条或最后一条: https://try.esproc.com/splx?49q

	A
1	$select *, 0 as Cont1Sta1, 0 as DateLagInDays, 0 as Recurrence from tb.txt order by dt
2	=A1.group(id,nbr)
3	=A2.(p1F=~.pselect(status==1), p0=~.pselect@a(status==0), p0F=p0.m(1), p0L=p0.m(-1), if(p1F, Cont1Sta1=1), if(p1F && p0F && p0F<p1F, DateLagInDays=interval(~(p0F).dt,~(p1F).dt)), if(p1F && p0L && p1F<p0L, Recurrence=1))
4	=A2.conj()

A1：加载数据，3 个计算列默认值为 0，按日期排序。

A2：分组但不汇总。

A3：计算各组数据：分别求出第一条 status=1 的记录位置 p1F，第一条 status=0 的记录位置 p0F 和最后一条 status=0 的记录位置 p0L；再按照业务逻辑依次算出 3 个计算列。

函数 pselect 返回符合条件的记录位置，默认返回第一条，@a 表示返回所有位置。函数 m 可按位置取数，-1 表示倒数第一条。

问题来源：https://stackoverflow.com/questions/78271470/perform-determinations-within-a-data-partition