SPL 协助 Mongodb: Grouping adjacent documents
Mongodb 数据库中某 collection(名为 states) 数据如下:
[
{order: 1, state: 'one'},
{order: 2, state: 'one'},
{order: 3, state: 'one'},
{order: 4, state: 'two'},
{order: 5, state: 'two'},
{order: 6, state: 'one'},
{order: 7, state: 'two'},
{order: 8, state: 'three'},
{order: 9, state: 'three'}
]
要求:按 state 相邻分组。
期望结果如下:
[
[
{order: 1, state: 'one'},
{order: 2, state: 'one'},
{order: 3, state: 'one'}
],
[
{order: 4, state: 'two'},
{order: 5, state: 'two'}
],
[
{order: 6, state: 'one'}
],
[
{order: 7, state: 'two'}
],
[
{order: 8, state: 'three'},
{order: 9, state: 'three'}
]
]
在 MongoDB 中,直接使用查询语句来实现按 state 相邻分组的功能是比较困难的,因为 MongoDB 的查询语言本身不支持这种复杂的分组逻辑。通过结合聚合管道(Aggregation Pipeline)和一些自定义逻辑可以间接实现这个需求。
下面给出了一种写法:
db.states.aggregate([
{
$sort: { order: 1 }
},
{
$group: {
_id: null,
documents: { $push: "$$ROOT" }
}
},
{
$project: {
grouped: {
$reduce: {
input: "$documents",
initialValue: {
previousState: null,
groups: []
},
in: {
$let: {
vars: {
currentState: "$$this.state",
lastGroup: { $arrayElemAt: ["$$value.groups", -1] }
},
in: {
$cond: [
{ $eq: ["$$value.previousState", "$$this.state"] },
{
previousState: "$$this.state",
groups: {
$concatArrays: [
{ $slice: ["$$value.groups", { $subtract: [{ $size: "$$value.groups" }, 1] }] },
[
{
$concatArrays: [
"$$lastGroup",
["$$this"]
]
}
]
]
}
},
{
previousState: "$$this.state",
groups: {
$concatArrays: [
"$$value.groups",
[["$$this"]]
]
}
}
]
}
}
}
}
}
}
},
{
$project: {
_id: 0,
grouped: "$grouped.groups"
}
},
{
$unwind: "$grouped"
},
{
$project: {
_id: 0,
documents: "$grouped"
}
},
{
$group: {
_id: null,
result: { $push: "$documents" }
}
},
{
$project: {
_id: 0,
result: 1
}
}
])
其运行结果是:
{
result: [
[
{
_id: ObjectId('67c65e846d497a00cd02a427'),
order: 1,
state: 'one'
},
{
_id: ObjectId('67c65e846d497a00cd02a428'),
order: 2,
state: 'one'
},
{
_id: ObjectId('67c65e846d497a00cd02a429'),
order: 3,
state: 'one'
}
],
[
{
_id: ObjectId('67c65e846d497a00cd02a42a'),
order: 4,
state: 'two'
},
{
_id: ObjectId('67c65e846d497a00cd02a42b'),
order: 5,
state: 'two'
}
],
[
{
_id: ObjectId('67c65e846d497a00cd02a42c'),
order: 6,
state: 'one'
}
],
[
{
_id: ObjectId('67c65e846d497a00cd02a42d'),
order: 7,
state: 'two'
}
],
[
{
_id: ObjectId('67c65e846d497a00cd02a42e'),
order: 8,
state: 'three'
},
{
_id: ObjectId('67c65e846d497a00cd02a42f'),
order: 9,
state: 'three'
}
]
]
}
SPL 提供了按相邻记录分组的功能,代码简洁明了:
A |
|
1 |
=mongo_open@d("mongodb://127.0.0.1:27017/local") |
2 |
=mongo_shell@d(A1, "{'find':'states','projection':{'_id':0}}") |
3 |
=mongo_close(A1) |
4 |
=A2.group@o(state) |
5 |
=json(A4) |
前三行是读数,计算代码只有 A4 一句,A5 将结果转回 json 格式,其输出结果和原文要求一模一样。
问题来源:https://stackoverflow.com/questions/78435404/grouping-adjacent-documents-in-mongodb-by-criteria
英文版 https://c.esproc.com/article/1741222888221