SPL assists MongoDB: Only keep the running total for the last item in the partition
A collection named grp_score in a MongoDB database contains the following data:
[
{ "grp": "A", "seq": 1, "score": 1, x: 0 },
{ "grp": "A", "seq": 2, "score": 3, x: 0 },
{ "grp": "A", "seq": 3, "score": 2, x: 0 },
{ "grp": "A", "seq": 4, "score": 4, x: 0 },
{ "grp": "B", "seq": 1, "score": 5, x: 0 },
{ "grp": "B", "seq": 2, "score": 2, x: 0 },
{ "grp": "B", "seq": 3, "score": 4, x: 0 },
{ "grp": "B", "seq": 4, "score": 3, x: 0 }
]
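Requirement: group by grp, sort each group by seq, sum the score field of each group, and write the sum into the x field of the last row of the group.
Expected result: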
[
{ "grp": "A", "seq": 1, "score": 1, x: 0 },
{ "grp": "A", "seq": 2, "score": 3, x: 0 },
{ "grp": "A", "seq": 3, "score": 2, x: 0 },
{ "grp": "A", "seq": 4, "score": 4, x: 10 },
{ "grp": "B", "seq": 1, "score": 5, x: 0 },
{ "grp": "B", "seq": 2, "score": 2, x: 0 },
{ "grp": "B", "seq": 3, "score": 4, x: 0 },
{ "grp": "B", "seq": 4, "score": 3, x: 14 }
]
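When a MongoDB query computes a per-group sum here, it can only write a value on every row and then check row by row, resetting the values on the earlier rows to 0. And because the number of rows in each group is not known in advance, the only option is to sort in reverse order and keep the value on the first row. To match the expected result, the rows would then have to be reversed once more. Because the code is already very long, that final reversal is omitted here. A reference implementation: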
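db.grp_score.aggregate([
  // Stage 1: running total of score within each grp, plus a row number ordered by seq
  {
    "$setWindowFields": {
      "partitionBy": "$grp",
      "sortBy": { "seq": 1 },
      "output": {
        "x": {
          "$sum": "$score",
          "window": { "documents": ["unbounded", "current"] }
        },
        "ordering": { "$documentNumber": {} }
      }
    }
  },
  // Stage 2: rank rows in reverse order, so the last row of each group gets rank 1
  {
    "$setWindowFields": {
      "partitionBy": "$grp",
      "sortBy": { "ordering": -1 },
      "output": { "rank": { "$rank": {} } }
    }
  },
  // Stage 3: keep x only on the rank-1 row, set it to 0 elsewhere, drop the helper fields
  {
    "$set": {
      "ordering": "$$REMOVE",
      "rank": "$$REMOVE",
      "x": {
        "$cond": {
          "if": { "$eq": [1, "$rank"] },
          "then": "$x",
          "else": 0
        }
      }
    }
  }
])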
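To make the three stages easier to follow, here is a minimal Python sketch that emulates them on the sample data with plain lists. It is only an illustration of the logic, not how MongoDB executes the pipeline; the function name `last_row_total_via_windows` is hypothetical, and the order in which partitions appear in the output is an assumption of this sketch.

```python
from itertools import groupby
from operator import itemgetter

docs = [
    {"grp": "A", "seq": 1, "score": 1, "x": 0},
    {"grp": "A", "seq": 2, "score": 3, "x": 0},
    {"grp": "A", "seq": 3, "score": 2, "x": 0},
    {"grp": "A", "seq": 4, "score": 4, "x": 0},
    {"grp": "B", "seq": 1, "score": 5, "x": 0},
    {"grp": "B", "seq": 2, "score": 2, "x": 0},
    {"grp": "B", "seq": 3, "score": 4, "x": 0},
    {"grp": "B", "seq": 4, "score": 3, "x": 0},
]


def last_row_total_via_windows(rows):
    """Emulate the three pipeline stages (hypothetical helper, not a MongoDB API)."""
    out = []
    ordered = sorted(rows, key=itemgetter("grp", "seq"))
    for _, part in groupby(ordered, key=itemgetter("grp")):
        part = [dict(r) for r in part]  # copy so the input rows stay untouched
        # Stage 1: running total in x, plus a 1-based document number.
        running = 0
        for n, r in enumerate(part, start=1):
            running += r["score"]
            r["x"] = running
            r["ordering"] = n
        # Stage 2: sort by the document number descending and rank the rows.
        part.sort(key=itemgetter("ordering"), reverse=True)
        for rank, r in enumerate(part, start=1):
            r["rank"] = rank
        # Stage 3: keep x only on the rank-1 row, then drop the helper fields.
        for r in part:
            if r["rank"] != 1:
                r["x"] = 0
            del r["ordering"], r["rank"]
        out.extend(part)
    return out


result = last_row_total_via_windows(docs)
for r in result:
    print(r)
```

As in the pipeline, the rows of each group come out in descending seq order, because the final reversal is skipped.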
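SPL not only keeps the grouped subsets when grouping, it also keeps each subset ordered, so its rows can be accessed by position, including from the end. This makes it easy to write each group's sum into the x field of the last row, and a single line of code (A4) does it: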
| | A |
|---|---|
| 1 | =mongo_open@d("mongodb://127.0.0.1:27017/local") |
| 2 | =mongo_shell@d(A1, "{'find':'grp_score','projection':{'_id':0}}") |
| 3 | =mongo_close(A1) |
| 4 | =A2.group(grp;(a=~.sort(seq),a.m(-1).x=a.sum(score))) |
| 5 | =json(A2) |
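For readers who do not know SPL, the idea behind A4 can be sketched in Python: group the rows, sort each group by seq, and assign the group's sum to the last row by position. This is a rough analogue under the assumption that the rows are mutable dictionaries; the function name `write_group_total_to_last_row` is hypothetical and is not part of SPL or of any MongoDB driver.

```python
from collections import defaultdict

records = [
    {"grp": "A", "seq": 1, "score": 1, "x": 0},
    {"grp": "A", "seq": 2, "score": 3, "x": 0},
    {"grp": "A", "seq": 3, "score": 2, "x": 0},
    {"grp": "A", "seq": 4, "score": 4, "x": 0},
    {"grp": "B", "seq": 1, "score": 5, "x": 0},
    {"grp": "B", "seq": 2, "score": 2, "x": 0},
    {"grp": "B", "seq": 3, "score": 4, "x": 0},
    {"grp": "B", "seq": 4, "score": 3, "x": 0},
]


def write_group_total_to_last_row(rows):
    """Update rows in place: the last row (by seq) of each grp gets the group sum."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["grp"]].append(r)
    for members in groups.values():
        a = sorted(members, key=lambda r: r["seq"])  # roughly ~.sort(seq)
        a[-1]["x"] = sum(r["score"] for r in a)      # roughly a.m(-1).x = a.sum(score)
    return rows


updated = write_group_total_to_last_row(records)
for r in updated:
    print(r)
```

Because the sorted list holds references to the same row objects, the original rows are modified and keep their original order, which is why no reversal step is needed in this approach.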
English version: https://c.esproc.com/article/1741759546129