SPL assists MongoDB: Only keep the running total for the last item in the partition

A collection named grp_score in a MongoDB database contains the following data:

[
  { "grp": "A", "seq": 1, "score": 1, x: 0 },
  { "grp": "A", "seq": 2, "score": 3, x: 0 },
  { "grp": "A", "seq": 3, "score": 2, x: 0 },
  { "grp": "A", "seq": 4, "score": 4, x: 0 },
  { "grp": "B", "seq": 1, "score": 5, x: 0 },
  { "grp": "B", "seq": 2, "score": 2, x: 0 },
  { "grp": "B", "seq": 3, "score": 4, x: 0 },
  { "grp": "B", "seq": 4, "score": 3, x: 0 }
]

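Requirement: group the data by grp, sort each group by seq, sum the score field within each group, and write the sum into the x field of the last row of that group.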

Expected result:

[
  { "grp": "A", "seq": 1, "score": 1, x: 0 },
  { "grp": "A", "seq": 2, "score": 3, x: 0 },
  { "grp": "A", "seq": 3, "score": 2, x: 0 },
  { "grp": "A", "seq": 4, "score": 4, x: 10 },
  { "grp": "B", "seq": 1, "score": 5, x: 0 },
  { "grp": "B", "seq": 2, "score": 2, x: 0 },
  { "grp": "B", "seq": 3, "score": 4, x: 0 },
  { "grp": "B", "seq": 4, "score": 3, x: 14 }
]

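When computing a grouped sum, a MongoDB query can only write a value into every row (here, a running total) and then check row by row, resetting every row except the last one to 0. Because the number of rows in each group is not known in advance, the rows have to be sorted in reverse order so that the value on the first row is the one that is kept. To match the expected result exactly, the rows would then have to be reversed once more. Since the code is already long, that final reversal is omitted here. A reference implementation follows: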

db.collection.aggregate([
  {
    "$setWindowFields": {
      partitionBy: "$grp",
      sortBy: {
        seq: 1
      },
      output: {
        x: {
          $sum: "$score",
          window: {
            documents: [
              "unbounded",
              "current"
            ]
          }
        },
        ordering: {
          $documentNumber: {}
        }
      }
    }
  },
  {
    "$setWindowFields": {
      "partitionBy": "$grp",
      "sortBy": {
        "ordering": -1
      },
      "output": {
        "rank": {
          "$rank": {}
        }
      }
    }
  },
  {
    "$set": {
      "ordering": "$$REMOVE",
      "rank": "$$REMOVE",
      "x": {
        "$cond": {
          "if": {
            $eq: [
              1,
              "$rank"
            ]
          },
          "then": "$x",
          "else": 0
        }
      }
    }
  }
])
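
To make the three stages easier to follow, here is a minimal in-memory sketch of the same logic in Python. It is not MongoDB code: the helper name emulate_pipeline is hypothetical, the sample rows are copied from the collection above, and it assumes the output of each stage follows the partition and sort order.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows copied from the grp_score collection shown earlier.
docs = [
    {"grp": "A", "seq": 1, "score": 1, "x": 0},
    {"grp": "A", "seq": 2, "score": 3, "x": 0},
    {"grp": "A", "seq": 3, "score": 2, "x": 0},
    {"grp": "A", "seq": 4, "score": 4, "x": 0},
    {"grp": "B", "seq": 1, "score": 5, "x": 0},
    {"grp": "B", "seq": 2, "score": 2, "x": 0},
    {"grp": "B", "seq": 3, "score": 4, "x": 0},
    {"grp": "B", "seq": 4, "score": 3, "x": 0},
]


def emulate_pipeline(rows):
    """Hypothetical helper: mimic the three pipeline stages in memory."""
    out = []
    by_grp = itemgetter("grp")
    for _, part in groupby(sorted(rows, key=by_grp), key=by_grp):
        # Work on copies so the input rows stay untouched.
        members = sorted((dict(r) for r in part), key=itemgetter("seq"))

        # Stage 1: running total of score plus a document number per row.
        total = 0
        for number, row in enumerate(members, start=1):
            total += row["score"]
            row["x"] = total
            row["ordering"] = number

        # Stage 2: rank rows by descending document number,
        # so the last row of the group gets rank 1.
        members.sort(key=itemgetter("ordering"), reverse=True)
        for rank, row in enumerate(members, start=1):
            row["rank"] = rank

        # Stage 3: keep x only where rank is 1, then drop the helper fields.
        for row in members:
            if row.pop("rank") != 1:
                row["x"] = 0
            del row["ordering"]
        out.extend(members)
    return out


result = emulate_pipeline(docs)
for row in result:
    print(row)
```

Each group comes out in reverse seq order, which is the omitted final reversal mentioned above. In this sketch, sorting result by (grp, seq) restores the expected order; in the pipeline the counterpart would presumably be one more $sort stage.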

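SPL not only keeps the grouped subsets when grouping, it also keeps each subset ordered, so every row can be accessed by position, including counting from the end. This makes it easy to write each group's sum into the x field of the last row, and a single line of code does the job: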



     A
1    =mongo_open@d("mongodb://127.0.0.1:27017/local")
2    =mongo_shell@d(A1, "{'find':'grp_score','projection':{'_id':0}}")
3    =mongo_close(A1)
4    =A2.group(grp;(a=~.sort(seq),a.m(-1).x=a.sum(score)))
5    =json(A2)

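The first three lines read the data, the fourth line performs the calculation (m(-1) refers to the last row of the group), and the fifth line converts the result back to JSON format.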
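
For readers who do not know SPL, the calculation in the fourth line can be read roughly as the Python sketch below. This is an illustration of the idea only, not how SPL is implemented: the function name write_group_total_to_last is hypothetical, and the rows are copied from the sample collection.

```python
from collections import defaultdict

# Sample rows copied from the grp_score collection shown earlier.
rows = [
    {"grp": "A", "seq": 1, "score": 1, "x": 0},
    {"grp": "A", "seq": 2, "score": 3, "x": 0},
    {"grp": "A", "seq": 3, "score": 2, "x": 0},
    {"grp": "A", "seq": 4, "score": 4, "x": 0},
    {"grp": "B", "seq": 1, "score": 5, "x": 0},
    {"grp": "B", "seq": 2, "score": 2, "x": 0},
    {"grp": "B", "seq": 3, "score": 4, "x": 0},
    {"grp": "B", "seq": 4, "score": 3, "x": 0},
]


def write_group_total_to_last(records):
    """Hypothetical helper: put each group's score sum on its last row."""
    # Keep the grouped subsets instead of collapsing them to one value.
    groups = defaultdict(list)
    for rec in records:
        groups[rec["grp"]].append(rec)

    for members in groups.values():
        # Order the subset by seq; the sorted list holds the same dict objects.
        ordered = sorted(members, key=lambda r: r["seq"])
        # Index -1 plays the role of m(-1): the last row of the subset.
        ordered[-1]["x"] = sum(r["score"] for r in ordered)

    # The records were updated in place, so the original list is the result.
    return records


result = write_group_total_to_last(rows)
for row in result:
    print(row)
```

The sketch updates the rows in place, which is why the original list can be returned unchanged in order. That matches the script above, where the fifth line serializes A2 rather than the return value of the group call.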


Source of the question: https://stackoverflow.com/questions/78552887/mongodb-aggregation-only-keep-the-running-total-for-the-last-item-in-the-partit