从 SQL 到 SPL:按时间区间汇总
某库表存储时序数据,每条记录间隔若干秒。
dt |
payload |
2024-01-01 12:00:13 |
0 |
2024-01-01 12:00:28 |
9 |
2024-01-01 12:00:39 |
2 |
2024-01-01 12:00:44 |
3 |
2024-01-01 12:00:53 |
0 |
2024-01-01 12:01:00 |
6 |
2024-01-01 12:01:19 |
0 |
2024-01-01 12:01:26 |
4 |
2024-01-01 12:01:27 |
2 |
2024-01-01 12:01:49 |
6 |
2024-01-01 12:02:09 |
5 |
2024-01-01 12:02:32 |
6 |
2024-01-01 12:02:59 |
7 |
2024-01-01 12:03:18 |
5 |
2024-01-01 12:03:25 |
4 |
2024-01-01 12:03:28 |
5 |
2024-01-01 12:03:30 |
3 |
2024-01-01 12:03:41 |
1 |
2024-01-01 12:04:23 |
9 |
2024-01-01 12:04:32 |
7 |
2024-01-01 12:04:34 |
5 |
2024-01-01 12:04:43 |
4 |
2024-01-01 12:04:56 |
7 |
2024-01-01 12:05:18 |
7 |
2024-01-01 12:05:33 |
3 |
2024-01-01 12:05:39 |
5 |
2024-01-01 12:05:54 |
1 |
2024-01-01 12:05:56 |
8 |
2024-01-01 12:06:11 |
7 |
2024-01-01 12:06:40 |
2 |
2024-01-01 12:06:47 |
5 |
2024-01-01 12:06:51 |
6 |
2024-01-01 12:06:58 |
5 |
2024-01-01 12:07:06 |
3 |
2024-01-01 12:07:10 |
5 |
2024-01-01 12:07:27 |
0 |
2024-01-01 12:07:33 |
1 |
2024-01-01 12:07:38 |
3 |
2024-01-01 12:08:04 |
2 |
2024-01-01 12:08:11 |
4 |
2024-01-01 12:08:21 |
1 |
2024-01-01 12:08:44 |
1 |
2024-01-01 12:08:55 |
3 |
2024-01-01 12:09:01 |
4 |
2024-01-01 12:09:05 |
0 |
2024-01-01 12:09:12 |
2 |
2024-01-01 12:09:28 |
8 |
2024-01-01 12:09:38 |
5 |
2024-01-01 12:10:14 |
3 |
2024-01-01 12:10:25 |
2 |
2024-01-01 12:10:30 |
7 |
2024-01-01 12:10:39 |
1 |
2024-01-01 12:10:53 |
6 |
2024-01-01 12:11:15 |
5 |
2024-01-01 12:11:25 |
6 |
2024-01-01 12:11:41 |
1 |
2024-01-01 12:11:53 |
8 |
2024-01-01 12:11:59 |
5 |
2024-01-01 12:12:03 |
9 |
2024-01-01 12:12:09 |
5 |
2024-01-01 12:12:28 |
8 |
2024-01-01 12:12:34 |
2 |
2024-01-01 12:12:53 |
1 |
2024-01-01 12:13:13 |
3 |
2024-01-01 12:13:14 |
8 |
2024-01-01 12:13:19 |
8 |
2024-01-01 12:13:48 |
3 |
2024-01-01 12:14:06 |
5 |
2024-01-01 12:14:18 |
1 |
2024-01-01 12:14:36 |
3 |
2024-01-01 12:14:40 |
4 |
2024-01-01 12:14:55 |
3 |
现在要每隔一分钟做一次分组汇总,每次汇总 5 分钟的数据。比如第 1、2、3 秒生成 3 条记录,分别对 1-5、2-6、3-7 分钟的数据做汇总。
From |
To |
payload |
2024-01-01 12:00:00 |
2024-01-01 12:05:00 |
100 |
2024-01-01 12:01:00 |
2024-01-01 12:06:00 |
110 |
2024-01-01 12:02:00 |
2024-01-01 12:07:00 |
117 |
2024-01-01 12:03:00 |
2024-01-01 12:08:00 |
111 |
2024-01-01 12:04:00 |
2024-01-01 12:09:00 |
104 |
2024-01-01 12:05:00 |
2024-01-01 12:10:00 |
91 |
2024-01-01 12:06:00 |
2024-01-01 12:11:00 |
86 |
2024-01-01 12:07:00 |
2024-01-01 12:12:00 |
86 |
2024-01-01 12:08:00 |
2024-01-01 12:13:00 |
99 |
2024-01-01 12:09:00 |
2024-01-01 12:14:00 |
110 |
2024-01-01 12:10:00 |
2024-01-01 12:15:00 |
107 |
SQL:
SELECT
[From], DATEADD(MINUTE, 1, [To]) [To], payload
FROM (
SELECT
dt, MIN(dt) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) [From],
dt [To], SUM(payload) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) payload
FROM (
SELECT
DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0) dt,
SUM(payload) payload
FROM #tmstmp
GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0)
) q
) q
WHERE DATEDIFF(MINUTE, [From], [To]) > 3
SQL要用嵌套子查询 + 多个窗口函数实现,代码繁琐。SPL 提供了直接访问位置的语法 https://try.esproc.com/splx?4ql
A |
|
1 |
$select * from tmstmp.txt |
2 |
=A1.groups(datetime@m(dt):dt;sum(payload):payload) |
3 |
=A2.new(dt:From, elapse@s(From,300):To, payload[0:4].sum(): payload) |
4 |
=A3.m(:-5) |
A1:加载数据。
A2:按分钟分组汇总。
A3:生成新二维表,To 取自当前记录,From 取自当前记录后的 5 分钟,payload 汇总自当前记录到第 4 条记录的区间。
A4:从第 1 条取到倒数第 5 条。
问题来源:https://stackoverflow.com/questions/78152707/get-sum-for-each-5-minute-time-interval
英文版 https://c.esproc.com/article/1741683659429