"问题 [链接] I have a CSV file like so - 'user_id','age','liked_ad','location' 2145,34,true,USA 6786, .."

Charname
乾学院 member no. 1750
1 reply • 512 views • 2 years ago

Grouping duplicates in CSV file and ranking data based on certain values

Q&A

csv(24) grouping and aggregation(2) filtering(5) sorting(4)

Question

https://stackoverflow.com/questions/65062471/grouping-duplicates-in-csv-file-and-ranking-data-based-on-certain-values

I have a CSV file like so -

"user_id","age","liked_ad","location"
2145,34,true,USA
6786,25,true,UK
9025,21,false,USA
1145,40,false,UK

The CSV file goes on. I worked out that there are duplicate user_ids within the file, so what I am trying to do is find out which users have the most 'true' answers in the 'liked_ad' column. I am super stuck on how to do this in Java and would appreciate any help.

This is what I have so far to literally just parse the file -

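import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// The class name is illustrative; the original snippet showed only the method.
public class Main {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner scanner = new Scanner(new File("src/main/resources/advert-data.csv"));
        // Tokens are split on commas only, so line breaks stay inside the tokens.
        scanner.useDelimiter(",");
        while (scanner.hasNext()) {
            System.out.print(scanner.next() + "|");
        }
        scanner.close();
    }
}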

I'm stuck on where to go from here in order to achieve what I am trying to achieve.

Answer

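Read the CSV data, group it by user_id and count the rows whose third column is true, keep the results whose count is greater than 0, then sort by count in descending order. Implemented in plain Java, the code is fairly long.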
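For comparison, the steps just described could be sketched in plain Java roughly as follows. This is only an illustration under assumptions, not the method of the original answer: the class name AdLikeRanking and the method rankUsers are made up here, the first line is assumed to be the header, and fields are assumed to contain no embedded commas or quotes. Users with equal counts stay in the order they first appear in the file, which may differ from the order other tools produce.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the names are chosen for illustration only.
class AdLikeRanking {

    /**
     * Groups rows by user_id, counts the rows whose third column is "true",
     * drops users with a count of 0, and sorts by count, largest first.
     * Assumes lines.get(0) is the header and fields hold no embedded commas.
     */
    public static List<Map.Entry<String, Integer>> rankUsers(List<String> lines) {
        Map<String, Integer> likes = new LinkedHashMap<>();
        for (int i = 1; i < lines.size(); i++) {      // start at 1 to skip the header
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue;                             // ignore blank lines
            }
            String[] fields = line.split(",");
            if (fields.length < 3) {
                continue;                             // ignore malformed rows
            }
            String userId = fields[0].trim();
            boolean liked = Boolean.parseBoolean(fields[2].trim());
            likes.merge(userId, liked ? 1 : 0, Integer::sum);
        }

        List<Map.Entry<String, Integer>> ranked = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : likes.entrySet()) {
            if (entry.getValue() > 0) {               // keep only users with at least one "true"
                ranked.add(entry);
            }
        }
        // Stable sort, descending by count
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the question; adjust it to the actual file location.
        List<String> lines = Files.readAllLines(Paths.get("src/main/resources/advert-data.csv"));
        for (Map.Entry<String, Integer> entry : rankUsers(lines)) {
            System.out.println(entry.getKey() + "," + entry.getValue());
        }
    }
}
```

For real-world CSV files with quoted or escaped fields, a dedicated CSV parser would be a safer choice than String.split.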

With SPL, an open-source package that runs on Java, this is easy to write and takes just one statement:

	A
1	=file("advert-data.csv").import@cqt().groups(user_id;count(#3==true):count).select(#2>0).sort(-#2)

SPL provides a JDBC driver that Java can call. Save the script above as rank.splx and invoke the script file from Java the same way a stored procedure is called:

…

Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
st = con.prepareCall("call rank()");
st.execute();

…

SPL source code: https://github.com/SPLWare/esProc
