从 SQL 到 SPL:Getting the available ranges from two tables

某数据库有两个表,原库存表 data_add 存储了多个物品的多批库存,每批库存有一个起始编号 START_NUM 和一个终止编号 END_NUM,表示连续编号范围。

ID

ITEM_ID

START_NUM

END_NUM

1

337

101

400

2

337

500

800

3

337

801

1200

4

337

1500

1600

5

337

15000

16000

6

337

20000

30000

7

444

20

30

消耗表 data_cons 存储了多个物品的多批消耗,每批消耗同样有连续编号范围。

ID

ITEM_ID

START_NUM

END_NUM

1

337

240

300

2

337

301

400

3

337

850

1100

4

337

1500

1510

5

337

15000

16000

现在要计算出每个物品的现库存,即原库存减去消耗,注意:各批次分别计算,前后连续的两个批次也分别计算,比如第 2、3 条记录;原库存的连续编号范围可能被消耗成不连续的多段编号,这种情况下要生成多条记录,每条记录对应一段连续编号。

ITEM_ID

START_NUM

END_NUM

337

101

239

337

500

800

337

801

849

337

1101

1200

337

1511

1600

337

20000

30000

444

20

30

SQL:

with merged_cons(item_id, start_num, end_num) AS (
    SELECT * FROM data_cons
    MATCH_RECOGNIZE (
        PARTITION BY item_id
        ORDER BY start_num, end_num
        MEASURES FIRST(start_num) AS start_num, LAST(end_num) AS end_num
        PATTERN( merged* strt )
        DEFINE
            merged AS MAX(end_num) + 1 >= NEXT(start_num)
    )
)
, intersections(id, item_id, start_before, end_before, start_after, end_after) AS (
    SELECT a.id, a.item_id, /*a.start_num AS add_start, a.end_num AS add_end, 
        b.start_num AS cons_start, b.end_num AS cons_end, */
        CASE WHEN a.start_num < b.start_num - 1 THEN a.start_num END AS start_before,
        CASE WHEN a.start_num < b.start_num - 1 THEN b.start_num - 1 END AS end_before,
        CASE WHEN b.end_num + 1 < a.end_num THEN b.end_num + 1 END AS start_after,
        CASE WHEN b.end_num + 1 < a.end_num THEN a.end_num END AS end_after
    FROM data_add a
    JOIN merged_cons b
        ON a.item_id = b.item_id AND LEAST(a.end_num, b.end_num) >= GREATEST(a.start_num, b.start_num)
)
SELECT item_id, start_before as start_num, end_before as end_num
FROM intersections WHERE start_before IS NOT NULL
UNION ALL
SELECT item_id, start_after as start_num, end_after as end_num
FROM intersections WHERE start_after IS NOT NULL
UNION ALL
SELECT item_id, start_num, end_num
FROM data_add d
WHERE NOT EXISTS(SELECT 1 FROM intersections i WHERE i.id = d.id);

SQL 要用 MATCH_RECOGNIZE 语句 + 多个子查询间接实现集合计算,代码复杂难懂。SPL 可以用变量代表集合,直接进行集合相关的计算:https://try.esproc.com/splx?3xM


 A

B

1

$select * from data_add.txt

$select * from data_cons.txt

2

=A1.group(ITEM_ID;~.(to(START_NUM,END_NUM)):a)

=B1.group(ITEM_ID;~.conj(to(START_NUM,END_NUM)):b)

3

=A2.join(ITEM_ID,B2,b)


4

=A3.derive(a.([~, b].merge@d()).(~.group@i(~!=~[-1]+1)).conj():diff)

5

=A4.news(diff; ITEM_ID, ~1:START_NUM, ~.m(-1):END_NUM)

A1-B1:加载数据。

A2:对原库存按物品分类,将组内的每个批次转为一个连续序列小集合,不汇总。B2:对消耗表类似处理,并将组内的多个小集合合并为一个大集合。~ 表示当前组或当前成员。

A3:左关联。

A4:新增计算列 diff:将原库存的每个小集合和消耗表的大集合进行差集计算,对每个差集进行条件分组,连续的序列分到同一小组。函数 group 用于分组,默认按等值分组,@i 表示按条件分组,~[-1] 表示上一个成员。函数 merge@d 对有序集合归并,计算差集。

A5:用 A4 每条记录的 diff 字段里的每个小组,生成一条记录。函数 news 可将一条记录扩展成多条记录。~.m(1) 表示序列 ~ 里的倒数第一个成员。

问题来源:https://stackoverflow.com/questions/78070892/getting-the-available-ranges-from-two-tables