Simple grouping and aggregation

【Question】

I have a CSV file with these values:

#BOF
userID;gender;movieID;rating
1;m;100;50
1;m;101;100
1;m;102;0
2;f;100;100
2;f;101;80
3;m;104;70
4;m;104;80
5;f;100;75
#EOF

I want to know how many movies each user has rated. Assume that there are hundreds of thousands of users. I tried to code it in Java using Eclipse. I used

while ((strLine = br.readLine()) != null)   {

            String[] strings = strLine.split(";");

but then got stuck. I am new to this, so it probably looks easy, but not for me... yet :=)

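【Answer】

Java has no built-in function for this, so implementing grouping and aggregation directly is cumbersome. Using SPL as a helper is recommended: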


     A
1    =file("d:\\source.csv").read@n()
2    =A1.to(2,A1.len()-1)
3    =A2.concat("\n")
4    =A3.import@t(;";")
5    =A4.groups(userID;count(movieID))

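A1: Read the contents of source.csv and return a sequence of strings, with each line as one member.

A2: Take rows 2 through the second-to-last row of A1, which drops the #BOF and #EOF marker lines.

A3: Join the sequence members into a single string, separated by "\n".

A4: Parse that string into a table sequence, using the first row as column names (@t) and ";" as the field separator.

A5: Group by userID and count the movieID values in each group.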

The SPL script can be easily integrated into Java; see "Java 如何调用 SPL 脚本" (How to call an SPL script from Java).
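
For comparison, steps A1 to A5 can also be sketched in plain Java, continuing from the `split(";")` attempt in the question. This is only a minimal sketch, not part of the original answer: the class name `RatingCounter` and the method `countRatingsPerUser` are made up for illustration, the file path is assumed to be the one used in the SPL script, and it counts data rows per user rather than non-null movieID values (the two agree for the sample data).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RatingCounter {

    // Counts rating rows per userID for lines laid out as "userID;gender;movieID;rating".
    // Hypothetical helper for illustration only.
    public static Map<Integer, Integer> countRatingsPerUser(List<String> lines) {
        // TreeMap keeps the result sorted by userID.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines, the #BOF/#EOF markers, and the header row.
            if (trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.startsWith("userID")) {
                continue;
            }
            String[] fields = trimmed.split(";");
            int userId = Integer.parseInt(fields[0]);
            // Add 1 to this user's count, starting from 1 if the user is new.
            counts.merge(userId, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumed path: defaults to the file named in the SPL script.
        String path = args.length > 0 ? args[0] : "d:\\source.csv";
        List<String> lines = Files.readAllLines(Paths.get(path));
        countRatingsPerUser(lines)
                .forEach((user, n) -> System.out.println(user + ";" + n));
    }
}
```

For the sample file in the question this gives 3 ratings for user 1, 2 for user 2, and 1 each for users 3, 4 and 5. The map holds only one entry per user, so memory use grows with the number of users; for a very large file, reading line by line with a `BufferedReader`, as in the question, would avoid loading the whole file at once.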