Grouping duplicates in CSV file and ranking data based on certain values

Question

https://stackoverflow.com/questions/65062471/grouping-duplicates-in-csv-file-and-ranking-data-based-on-certain-values

I have a CSV file like this:

"user_id","age","liked_ad","location"
2145,34,true,USA
6786,25,true,UK
9025,21,false,USA
1145,40,false,UK

The file continues with more rows. I noticed that some user_id values appear more than once, so what I want is to find which users have the most "true" answers in the liked_ad column. I'm really stuck on how to do this in Java and would appreciate any help.

This is what I have so far, which just parses the file:

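import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner scanner = new Scanner(new File("src/main/resources/advert-data.csv"));
        // Split on commas and on line breaks, so the last field of one row
        // is not glued to the first field of the next row
        scanner.useDelimiter(",|\\R");
        while (scanner.hasNext()) {
            System.out.print(scanner.next() + "|");
        }
        scanner.close();
    }
}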

I'm not sure where to go from here to get the result I want.

Answer

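Read the CSV data, group it by user_id, count the rows whose third column is true, keep only the groups with a count greater than 0, and sort them by count in descending order. Doing this in plain Java takes quite a bit of code.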
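For comparison, the steps above can be sketched in plain Java with a HashMap. This is a minimal sketch under stated assumptions: the first row is a header, liked_ad is the third column, and no field contains a quoted comma or line break. The class and method names (LikedAdRanking, rankByLikes) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LikedAdRanking {

    // Hypothetical helper: counts "true" values in the liked_ad column per user_id
    // and returns the users sorted by that count, highest first.
    // Assumes a header row and no commas or line breaks inside field values.
    static List<Map.Entry<String, Integer>> rankByLikes(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        if (lines.isEmpty()) {
            return new ArrayList<>();
        }
        for (String line : lines.subList(1, lines.size())) { // skip the header row
            if (line.isBlank()) continue;
            String[] fields = line.split(",");
            if (fields.length < 3) continue; // ignore malformed rows
            String userId = fields[0].replace("\"", "").trim();
            String likedAd = fields[2].replace("\"", "").trim();
            if ("true".equalsIgnoreCase(likedAd)) {
                counts.merge(userId, 1, Integer::sum); // add 1 to this user's count
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of("src/main/resources/advert-data.csv"));
        for (Map.Entry<String, Integer> e : rankByLikes(lines)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Users with no "true" rows never enter the map, so the "count greater than 0" filter happens implicitly. Users with equal counts come out in no particular order; add a secondary comparator on the key if a stable order matters. It needs Java 11 or later for `isBlank` and `Path.of`.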

SPL, an open-source package for Java, makes this easy to write; it takes a single line:


A1  =file("advert-data.csv").import@cqt().groups(user_id;count(#3==true):count).select(#2>0).sort(-#2)
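Each step of the SPL expression has a rough counterpart in Java Streams. The sketch below is an illustrative mapping of my own, not how SPL works internally. It assumes the rows are already split into fields with the header removed; SplChainInJava and rank are hypothetical names, and Collectors.filtering requires Java 9 or later.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SplChainInJava {

    // Hypothetical analogue of groups(user_id; count(#3==true):count).select(#2>0).sort(-#2)
    static List<Map.Entry<String, Long>> rank(List<String[]> rows) {
        // groups(...): one entry per user_id, counting rows whose third field is "true"
        Map<String, Long> counts = rows.stream()
            .collect(Collectors.groupingBy(
                r -> r[0],
                Collectors.filtering(r -> r[2].equals("true"), Collectors.counting())));
        return counts.entrySet().stream()
            .filter(e -> e.getValue() > 0) // select(#2>0)
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())) // sort(-#2)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The four sample rows from the question
        List<String[]> rows = List.of(
            new String[]{"2145", "34", "true", "USA"},
            new String[]{"6786", "25", "true", "UK"},
            new String[]{"9025", "21", "false", "USA"},
            new String[]{"1145", "40", "false", "UK"});
        rank(rows).forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```

Because groupingBy with a filtering downstream still creates a group (with count 0) for a user whose rows are all false, the filter step is what drops those users, just as select(#2>0) does in the SPL version.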

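SPL provides a JDBC driver that Java can call. Save the script above as rank.splx, then call the script file from Java the same way you would call a stored procedure: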

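import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

Class.forName("com.esproc.jdbc.InternalDriver");
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
// Runs rank.splx; the script file name serves as the procedure name
CallableStatement st = con.prepareCall("call rank()");
st.execute();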

SPL source code: https://github.com/SPLWare/esProc
