"Mongodb 数据库中某 collection(名为 states) 数据如下： [ {order: 1, state: 'one'}, {order: 2, state: 'one'}, .."

KittyYan 北京
润乾公司大数据计算专家
1 回帖 • 74 浏览 • 1 周前

SPL 协助 Mongodb: Grouping adjacent documents

应用计算

Mongodb 数据库中某 collection(名为 states) 数据如下：

[
  {order: 1, state: 'one'},
  {order: 2, state: 'one'},
  {order: 3, state: 'one'},
  {order: 4, state: 'two'},
  {order: 5, state: 'two'},
  {order: 6, state: 'one'},
  {order: 7, state: 'two'},
  {order: 8, state: 'three'},
  {order: 9, state: 'three'}
]

要求：按 state 相邻分组。
期望结果如下：

[
  [
    {order: 1, state: 'one'},
    {order: 2, state: 'one'},
    {order: 3, state: 'one'}
  ],
  [
    {order: 4, state: 'two'},
    {order: 5, state: 'two'}
  ],
  [
    {order: 6, state: 'one'}
  ],
  [
    {order: 7, state: 'two'}
  ],
  [
    {order: 8, state: 'three'},
    {order: 9, state: 'three'}
  ]
]

在 MongoDB 中，直接使用查询语句来实现按 state 相邻分组的功能是比较困难的，因为 MongoDB 的查询语言本身不支持这种复杂的分组逻辑。通过结合聚合管道（Aggregation Pipeline）和一些自定义逻辑可以间接实现这个需求。

下面给出了一种写法：

db.states.aggregate([
  {
    $sort: { order: 1 }
  },
  {
    $group: {
      _id: null,
      documents: { $push: "$$ROOT" }
    }
  },
  {
    $project: {
      grouped: {
        $reduce: {
          input: "$documents",
          initialValue: {
            previousState: null,
            groups: []
          },
          in: {
            $let: {
              vars: {
                currentState: "$$this.state",
                lastGroup: { $arrayElemAt: ["$$value.groups", -1] }
              },
              in: {
                $cond: [
                  { $eq: ["$$value.previousState", "$$this.state"] },
                  {
                    previousState: "$$this.state",
                    groups: {
                      $concatArrays: [
                        { $slice: ["$$value.groups", { $subtract: [{ $size: "$$value.groups" }, 1] }] },
                        [
                          {
                            $concatArrays: [
                              "$$lastGroup",
                              ["$$this"]
                            ]
                          }
                        ]
                      ]
                    }
                  },
                  {
                    previousState: "$$this.state",
                    groups: {
                      $concatArrays: [
                        "$$value.groups",
                        [["$$this"]]
                      ]
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  },
  {
    $project: {
      _id: 0,
      grouped: "$grouped.groups"
    }
  },
  {
    $unwind: "$grouped"
  },
  {
    $project: {
      _id: 0,
      documents: "$grouped"
    }
  },
  {
    $group: {
      _id: null,
      result: { $push: "$documents" }
    }
  },
  {
    $project: {
      _id: 0,
      result: 1
    }
  }
])

其运行结果是：

{
  result: [
    [
      {
        _id: ObjectId('67c65e846d497a00cd02a427'),
        order: 1,
        state: 'one'
      },
      {
        _id: ObjectId('67c65e846d497a00cd02a428'),
        order: 2,
        state: 'one'
      },
      {
        _id: ObjectId('67c65e846d497a00cd02a429'),
        order: 3,
        state: 'one'
      }
    ],
    [
      {
        _id: ObjectId('67c65e846d497a00cd02a42a'),
        order: 4,
        state: 'two'
      },
      {
        _id: ObjectId('67c65e846d497a00cd02a42b'),
        order: 5,
        state: 'two'
      }
    ],
    [
      {
        _id: ObjectId('67c65e846d497a00cd02a42c'),
        order: 6,
        state: 'one'
      }
    ],
    [
      {
        _id: ObjectId('67c65e846d497a00cd02a42d'),
        order: 7,
        state: 'two'
      }
    ],
    [
      {
        _id: ObjectId('67c65e846d497a00cd02a42e'),
        order: 8,
        state: 'three'
      },
      {
        _id: ObjectId('67c65e846d497a00cd02a42f'),
        order: 9,
        state: 'three'
      }
    ]
  ]
}

SPL 提供了按相邻记录分组的功能，代码简洁明了：

	A
1	=mongo_open@d("mongodb://127.0.0.1:27017/local")
2	=mongo_shell@d(A1, "{'find':'states','projection':{'_id':0}}")
3	=mongo_close(A1)
4	=A2.group@o(state)
5	=json(A4)

前三行是读数，计算代码只有 A4 一句，A5 将结果转回 json 格式，其输出结果和原文要求一模一样。

问题来源：https://stackoverflow.com/questions/78435404/grouping-adjacent-documents-in-mongodb-by-criteria