行转列,动态列枚举分组
行转列,动态列枚举分组
【问题】
Hi All,
Thanks for the wonderful support the community gets from this forum.
I am trying to accomplish this in MongoDB. Didn’t think it could get this complicated. thought the problem was interesting to solve.
I am trying to get a count of students by scores for various subjects.
the example below shows 2 subjects, in reality we could run this ad-hoc query for 1 or more subjects. (so cannot “can” and should be real-time)
(for e.g. run the query for a bunch of schools and a bunch of subjects)
the grades are always 1-5 no decimal point
use students
db.studentsummary.insert ({school:‘atl1’, sname : ‘Sean’ , sub1: 4, sub2 :5})
db.studentsummary.insert ({school:‘atl1’, sname : ‘chris’ , sub1: 4, sub2 :3})
db.studentsummary.insert ({school:‘atl1’, sname : ‘becky’ , sub1: 5, sub2 :4})
db.studentsummary.insert ({school:‘atl1’, sname : ‘sam’ , sub1: 5, sub2 :4})
db.studentsummary.insert ({school:‘atl2’, sname : ‘dustin’ , sub1: 2, sub2 :2})
db.studentsummary.insert ({school:‘atl2’, sname : ‘greg’ , sub1: 3, sub2 :4})
db.studentsummary.insert ({school:‘atl2’, sname : ‘peter’ , sub1: 5, sub2 :1})
db.studentsummary.insert ({school:‘atl2’, sname : ‘brad’ , sub1: 2, sub2 :2})
db.studentsummary.insert ({school:‘atl2’, sname : ‘liz’ , sub1: 3, sub2 :null})
Desired Output:(Would like to see how close we could get to the desired output below)
show how many got a 5; how many got a 4 and so on…
I tried quite a bit - trying to group by each subject and run different pipelines based on the subjects chosen for query and let the front end manage the merge and pivot , performance was unacceptable. what not.
help will be very highly appreciated.
【回答】
Mongodb 不直接支持行转列,枚举分组做起来也很麻烦,要把数据读出来再用 Java、Php 等编程语言来实现;还涉及到动态列,实现这样的集合运算也很麻烦。使用 SPL 完成这个需求更容易。以下例子不使用 MongoDB 展现,如果使用 MongoDB 需要用 SPL 的 mongo_open()函数连接数据库,用 mongo_shell() 函数查询原始数据结构
A | B | C | |
---|---|---|---|
1 | =mongo_open(“mongodb://localhost:27017/local?user=test&password=test”) | ||
2 | =mongo_shell(A1,“student.find()”) | ||
3 | =[5,4,3,2,1] | >subs=[“sub1”,“sub2”] | |
4 | =A2.group(school) | >newfields=[] | |
5 | for subs | >newfields=newfields|A3.(A5+“_”+string(~)) | |
6 | >result=create(${“school,”+newfields.concat@c()}) | ||
7 | for A4 | >temp=[] | |
8 | for subs | >temp=temp|A7.align@a(A3,${B8}).(~.len()) | |
9 | >temp=[A7.school]|temp | ||
10 | >result.record(temp) | ||
11 | >mongo_close(A1) |
A1、A2:查询原始数据到 A2 序表
A3、B3:准备动态的分数段和列名
A4:将原始数据按学校分组
A5、B5、A6:生成结果表,结构是 school、sub1_5、sub1_4…sub1_1、sub2_5、sub2_4…sub2_1
A7:按学校循环,准备每次向 result 中插入一条数据(B10)
B8、C8:根据 subs 循环,将 A4 中单个学校各科的学生成绩统计出来并按顺序拼接成序列。统计时将 A7 的某科目(B8)按照 A3 对齐分组
B9:拼接学校名称到上述序列,这样值可以和 result 序表的顺序对应上了
B10:插入数据到结果表
如果非动态列,科目较少,写法简化一点,逻辑与上面的方法大同小异:
A | |
---|---|
1 | =mongo_open(“mongodb://localhost:27017/local?user=test&password=test”) |
2 | =mongo_shell(A1,“student.find()”) |
3 | =A2.group(school) |
4 | =[5,4,3,2,1] |
5 | =A3.new(school,.align@a(A4,sub1).(.len()):sub1,.align@a(A4,sub2).(.len()):sub2) |
6 | =A5.new(school,.sub1(1):sub1_5,.sub1(2):sub1_4,.sub1(3):sub1_3,.sub1(4):sub1_2,.sub1(5):sub1_1,.sub2(1):sub2_5,.sub2(2):sub2_4,.sub2(3):sub2_3,.sub2(4):sub2_2,.sub2(5):sub2_1) |
7 | >mongo_close(A1) |