从 SQL 到 SPL:Getting the available ranges from two tables

某数据库有两个表,原库存表 data_add 存储了多个物品的多批库存,每批库存有一个起始编号 START_NUM 和一个终止编号 END_NUM,表示连续编号范围。

ID ITEM_ID START_NUM END_NUM
1 337 101 400
2 337 500 800
3 337 801 1200
4 337 1500 1600
5 337 15000 16000
6 337 20000 30000
7 444 20 30

消耗表 data_cons 存储了多个物品的多批消耗,每批消耗同样有连续编号范围。

ID ITEM_ID START_NUM END_NUM
1 337 240 300
2 337 301 400
3 337 850 1100
4 337 1500 1510
5 337 15000 16000

现在要计算出每个物品的现库存,即原库存减去消耗,注意:各批次分别计算,前后连续的两个批次也分别计算,比如第 2、3 条记录;原库存的连续编号范围可能被消耗成不连续的多段编号,这种情况下要生成多条记录,每条记录对应一段连续编号。

ITEM_ID START_NUM END_NUM
337 101 239
337 500 800
337 801 849
337 1101 1200
337 1511 1600
337 20000 30000
444 20 30

SQL:

with merged_cons(item_id, start_num, end_num) AS (
    SELECT * FROM data_cons
    MATCH_RECOGNIZE (
        PARTITION BY item_id
        ORDER BY start_num, end_num
        MEASURES FIRST(start_num) AS start_num, LAST(end_num) AS end_num
        PATTERN( merged* strt )
        DEFINE
            merged AS MAX(end_num) + 1 >= NEXT(start_num)
    )
)
, intersections(id, item_id, start_before, end_before, start_after, end_after) AS (
    SELECT a.id, a.item_id, /*a.start_num AS add_start, a.end_num AS add_end, 
        b.start_num AS cons_start, b.end_num AS cons_end, */
        CASE WHEN a.start_num < b.start_num - 1 THEN a.start_num END AS start_before,
        CASE WHEN a.start_num < b.start_num - 1 THEN b.start_num - 1 END AS end_before,
        CASE WHEN b.end_num + 1 < a.end_num THEN b.end_num + 1 END AS start_after,
        CASE WHEN b.end_num + 1 < a.end_num THEN a.end_num END AS end_after
    FROM data_add a
    JOIN merged_cons b
        ON a.item_id = b.item_id AND LEAST(a.end_num, b.end_num) >= GREATEST(a.start_num, b.start_num)
)
SELECT item_id, start_before as start_num, end_before as end_num
FROM intersections WHERE start_before IS NOT NULL
UNION ALL
SELECT item_id, start_after as start_num, end_after as end_num
FROM intersections WHERE start_after IS NOT NULL
UNION ALL
SELECT item_id, start_num, end_num
FROM data_add d
WHERE NOT EXISTS(SELECT 1 FROM intersections i WHERE i.id = d.id);

SQL 要用 MATCH_RECOGNIZE 语句 + 多个子查询间接实现集合计算,代码复杂难懂。SPL 可以用变量代表集合,直接进行集合相关的计算:https://try.esproc.com/splx?3xM


 A B
1 $select * from data_add.txt $select * from data_cons.txt
2 =A1.group(ITEM_ID;~.(to(START_NUM,END_NUM)):a) =B1.group(ITEM_ID;~.conj(to(START_NUM,END_NUM)):b)
3 =A2.join(ITEM_ID,B2,b)
4 =A3.derive(a.([~, b].merge@d()).(~.group@i(~!=~[-1]+1)).conj():diff)
5 =A4.news(diff; ITEM_ID, ~1:START_NUM, ~.m(-1):END_NUM)

A1-B1:加载数据。

A2:对原库存按物品分类,将组内的每个批次转为一个连续序列小集合,不汇总。B2:对消耗表类似处理,并将组内的多个小集合合并为一个大集合。~ 表示当前组或当前成员。

A3:左关联。

A4:新增计算列 diff:将原库存的每个小集合和消耗表的大集合进行差集计算,对每个差集进行条件分组,连续的序列分到同一小组。函数 group 用于分组,默认按等值分组,@i 表示按条件分组,~[-1] 表示上一个成员。函数 merge@d 对有序集合归并,计算差集。

A5:用 A4 每条记录的 diff 字段里的每个小组,生成一条记录。函数 news 可将一条记录扩展成多条记录。~.m(1) 表示序列 ~ 里的倒数第一个成员。

问题来源:https://stackoverflow.com/questions/78070892/getting-the-available-ranges-from-two-tables