从 SQL 到 SPL:Getting the available ranges from two tables
某数据库有两个表,原库存表 data_add 存储了多个物品的多批库存,每批库存有一个起始编号 START_NUM 和一个终止编号 END_NUM,表示连续编号范围。
ID | ITEM_ID | START_NUM | END_NUM |
1 | 337 | 101 | 400 |
2 | 337 | 500 | 800 |
3 | 337 | 801 | 1200 |
4 | 337 | 1500 | 1600 |
5 | 337 | 15000 | 16000 |
6 | 337 | 20000 | 30000 |
7 | 444 | 20 | 30 |
消耗表 data_cons 存储了多个物品的多批消耗,每批消耗同样有连续编号范围。
ID | ITEM_ID | START_NUM | END_NUM |
1 | 337 | 240 | 300 |
2 | 337 | 301 | 400 |
3 | 337 | 850 | 1100 |
4 | 337 | 1500 | 1510 |
5 | 337 | 15000 | 16000 |
现在要计算出每个物品的现库存,即原库存减去消耗,注意:各批次分别计算,前后连续的两个批次也分别计算,比如第 2、3 条记录;原库存的连续编号范围可能被消耗成不连续的多段编号,这种情况下要生成多条记录,每条记录对应一段连续编号。
ITEM_ID | START_NUM | END_NUM |
337 | 101 | 239 |
337 | 500 | 800 |
337 | 801 | 849 |
337 | 1101 | 1200 |
337 | 1511 | 1600 |
337 | 20000 | 30000 |
444 | 20 | 30 |
SQL:
with merged_cons(item_id, start_num, end_num) AS (
SELECT * FROM data_cons
MATCH_RECOGNIZE (
PARTITION BY item_id
ORDER BY start_num, end_num
MEASURES FIRST(start_num) AS start_num, LAST(end_num) AS end_num
PATTERN( merged* strt )
DEFINE
merged AS MAX(end_num) + 1 >= NEXT(start_num)
)
)
, intersections(id, item_id, start_before, end_before, start_after, end_after) AS (
SELECT a.id, a.item_id, /*a.start_num AS add_start, a.end_num AS add_end,
b.start_num AS cons_start, b.end_num AS cons_end, */
CASE WHEN a.start_num < b.start_num - 1 THEN a.start_num END AS start_before,
CASE WHEN a.start_num < b.start_num - 1 THEN b.start_num - 1 END AS end_before,
CASE WHEN b.end_num + 1 < a.end_num THEN b.end_num + 1 END AS start_after,
CASE WHEN b.end_num + 1 < a.end_num THEN a.end_num END AS end_after
FROM data_add a
JOIN merged_cons b
ON a.item_id = b.item_id AND LEAST(a.end_num, b.end_num) >= GREATEST(a.start_num, b.start_num)
)
SELECT item_id, start_before as start_num, end_before as end_num
FROM intersections WHERE start_before IS NOT NULL
UNION ALL
SELECT item_id, start_after as start_num, end_after as end_num
FROM intersections WHERE start_after IS NOT NULL
UNION ALL
SELECT item_id, start_num, end_num
FROM data_add d
WHERE NOT EXISTS(SELECT 1 FROM intersections i WHERE i.id = d.id);
SQL 要用 MATCH_RECOGNIZE 语句 + 多个子查询间接实现集合计算,代码复杂难懂。SPL 可以用变量代表集合,直接进行集合相关的计算:https://try.esproc.com/splx?3xM
A | B | |
1 | $select * from data_add.txt | $select * from data_cons.txt |
2 | =A1.group(ITEM_ID;~.(to(START_NUM,END_NUM)):a) | =B1.group(ITEM_ID;~.conj(to(START_NUM,END_NUM)):b) |
3 | =A2.join(ITEM_ID,B2,b) | |
4 | =A3.derive(a.([~, b].merge@d()).(~.group@i(~!=~[-1]+1)).conj():diff) | |
5 | =A4.news(diff; ITEM_ID, ~1:START_NUM, ~.m(-1):END_NUM) |
A1-B1:加载数据。
A2:对原库存按物品分类,将组内的每个批次转为一个连续序列小集合,不汇总。B2:对消耗表类似处理,并将组内的多个小集合合并为一个大集合。~ 表示当前组或当前成员。
A3:左关联。
A4:新增计算列 diff:将原库存的每个小集合和消耗表的大集合进行差集计算,对每个差集进行条件分组,连续的序列分到同一小组。函数 group 用于分组,默认按等值分组,@i 表示按条件分组,~[-1] 表示上一个成员。函数 merge@d 对有序集合归并,计算差集。
A5:用 A4 每条记录的 diff 字段里的每个小组,生成一条记录。函数 news 可将一条记录扩展成多条记录。~.m(1) 表示序列 ~ 里的倒数第一个成员。
问题来源:https://stackoverflow.com/questions/78070892/getting-the-available-ranges-from-two-tables
英文版 https://c.esproc.com/article/1743587657678