Grouping duplicates in CSV file and ranking data based on certain values
Question
I have a CSV file like so -
"user_id","age","liked_ad","location"
2145,34,true,USA
6786,25,true,UK
9025,21,false,USA
1145,40,false,UK
The CSV file goes on. I worked out that there are duplicate user_ids within the file, so what I am trying to do is find out which users have the most 'true' answers in the 'liked_ad' column. I am super stuck on how to do this in Java and would appreciate any help.
This is what I have so far to literally just parse the file -
public static void main(String[] args) throws FileNotFoundException
{
    Scanner scanner = new Scanner(new File("src/main/resources/advert-data.csv"));
    scanner.useDelimiter(",");
    while (scanner.hasNext()) {
        System.out.print(scanner.next() + "|");
    }
    scanner.close();
}
I'm stuck on where to go from here in order to achieve what I am trying to achieve.
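Answer
Read the CSV data, group it by user_id and count the rows where the third column is true, keep only the results whose count is greater than 0, then sort them by count in descending order. Implementing this directly in Java takes fairly long code.
With SPL, an open-source package that runs under Java, this is easy to write and takes only one statement: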
  | A
1 | =file("advert-data.csv").import@cqt().groups(user_id;count(#3==true):count).select(#2>0).sort(-#2)
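SPL provides a JDBC driver for Java to call. Save the script above as rank.splx, then invoke the script file from Java the same way you would call a stored procedure: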
…
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
st = con.prepareCall("call rank()");
st.execute();
…
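For comparison, the group, count, filter and sort steps described above can also be sketched in plain Java without any extra library. This is only a minimal sketch under some assumptions: the class name LikedAdRanking and the method name rank are made up for illustration, the first line of the file is taken to be the header, and fields are assumed to contain no embedded commas or quotes, so a naive split on "," is enough.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original question or answer.
class LikedAdRanking {

    // Counts "true" values in the third column per user_id and returns the
    // users with at least one like, sorted by that count, highest first.
    static List<Map.Entry<String, Long>> rank(List<String> lines) {
        Map<String, Long> likes = new LinkedHashMap<>();
        // Skip the header row (assumed to be the first line).
        for (String line : lines.subList(Math.min(1, lines.size()), lines.size())) {
            String[] cols = line.split(",");
            if (cols.length < 3) {
                continue; // ignore blank or malformed lines
            }
            // Only rows with liked_ad == true are counted, so every user
            // that ends up in the map already has a count greater than 0.
            if (Boolean.parseBoolean(cols[2].trim())) {
                likes.merge(cols[0].trim(), 1L, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(likes.entrySet());
        // Stable sort: users with equal counts keep their first-seen order.
        ranked.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) throws IOException {
        // The default path is the one used in the question.
        String path = args.length > 0 ? args[0] : "src/main/resources/advert-data.csv";
        for (Map.Entry<String, Long> e : rank(Files.readAllLines(Paths.get(path)))) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Reading the file line by line avoids the issue in the question's Scanner code, where using "," as the only delimiter leaves the last field of one row glued to the first field of the next. For real-world CSV with quoted fields, a proper CSV parser would be a safer choice than split.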