从 SQL 到 SPL:从两个表中找出每条记录日期的最接近的匹配项

SQL Server 有两个表,Table1:

ID JoiningDt DocNum
1 2024-04-05 A123
2 2024-04-06 A123
3 2024-04-04 B123
DocNum ClosestDt
A123 2024-04-03
A123 2024-04-04
A123 2024-04-07
A123 2024-04-08
B123 2024-04-02
B123 2024-04-05


现在要将 Table1 按 ID 排序,依次遍历每条记录:取出 Table2 中与当前记录的 DocNum 字段相同,但时间稍晚的那一条记录(所有更晚的记录中最早的)。特别的规则在于,从 Table2 中取过的记录下次不能再取。

ID JoiningDt DocNum ClosestDt
1 2024-04-05 A123 2024-04-07
2 2024-04-06 A123 2024-04-08
3 2024-04-04 B123 2024-04-05

SQL:

WITH CTE1 As (
SELECT t1.ID, t1.JoiningDt, t1.DocNum,
  (SELECT TOP 1 ClosestDt FROM Table2 
  WHERE DocNum = t1.DocNum AND ClosestDt > t1.JoiningDt ORDER BY ClosestDt  ASC   ) ClosestDt
FROM Table1 t1
 ), CTE2 AS (
  SELECT
       ID, JoiningDt, DocNum, ClosestDt
    , ROW_NUMBER() OVER(PARTITION BY DocNum,   ClosestDt ORDER BY ID) rn
  FROM CTE1
  )
SELECT ID, JoiningDt, DocNum, 
  CASE WHEN rn = 1 then ClosestDt ELSE
  (SELECT ClosestDt FROM Table2 
  WHERE DocNum = c1.DocNum AND ClosestDt > c1.JoiningDt ORDER BY ClosestDt  ASC   
  OFFSET c1.rn -1  ROWS FETCH NEXT 1 ROWS ONLY) END as ClosestDt
  FROM CTE2 c1

这里要进行有序计算,特别是要实现取过不能再取的规则,SQL 要造序号和标志位,还要用多层嵌套配合 join 语句来变相实现,代码繁琐难懂;改用存储过程会直观些,但代码更长,结构变复杂。SPL 可以直接按业务逻辑实现。https://try.esproc.com/splx?2HC


 A
1 $select * from Table1.txt order by ID
2 $select * from Table2.txt order by ClosestDt
3 =A1.derive(A2.delete@n([A2.select@1(DocNum==A1.DocNum && ClosestDt>A1.JoiningDt)]).ClosestDt)

A1-A2:加载数据。

A3:循环处理 A1 的每条记录,追加新字段,业务逻辑是:从 A2 取 DocNum 与当前记录相同但更晚的那些记录,从中取第 1 条的 ClosestDt,再从 A2 中删掉这条记录以避免再取。函数 select@1 表示取符合条件的第 1 条记录。

问题来源:https://stackoverflow.com/questions/78206862/find-the-distinct-closest-date-match-for-each-record-from-2-tables