How to calculate some specific data function from the data of a large CSV file

 

Question

https://stackoverflow.com/questions/66753971/how-to-calculate-some-specific-data-function-from-the-data-of-a-large-csv-file

I'm trying to work out the most expensive county to rent a building in, using data from a CSV file. Each column I need has been read into its own list. The price range is set by the user, so the outermost for loop and if statement ensure that only buildings within that range are considered.

The cost of a building is also slightly complicated, because it is the minimum stay multiplied by the price.

In the code below I am trying to get the average property value of one county, just so I can get the basic structure right before I carry on, but I'm kind of lost at this point. Any help would be much appreciated.

public int sampleMethod()
{
    ArrayList<String> county = new ArrayList<String>();
    ArrayList<Integer> costOfBuildings = new ArrayList<Integer>();
    ArrayList<Integer> minimumStay = new ArrayList<Integer>();
    ArrayList<Integer> minimumBuildingCost = new ArrayList<Integer>();
    try {
        // Code to read data from the CSV and put the data in the lists.
    }
    catch (IOException | URISyntaxException e) {
        // Some code.
    }

    int count = 0;
    int avgCountyPrice = 0;
    int countyCount = 0;
    for (int cost : costOfBuildings) {
        if (costOfBuildings.get(count) >= controller.getMin() && costOfBuildings.get(count) <= controller.getMax()) {
            for (String currentCounty : county) {
                for (int currentMinimumStay : minimumStay) {
                    if (currentCounty.equals("samplecounty")) {
                        countyCount++;
                        int temp = nightsPermitted * cost;
                        avgCountyPrice = avgCountyPrice + temp / countyCount;
                    }
                }
            }
        }
        count++;
    }
    return avgCountyPrice;
}

Here is a sample table showing what the CSV looks like. The CSV file has more than 50,000 rows.

| name    | county  | price | minStay |
|---------|---------|-------|---------|
| Morgan  | lydney  | 135   | 5       |
| John    | sedury  | 34    | 1       |
| Patrick | newport | 9901  | 7       |

Answer

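This problem requires grouping the data in the CSV file by county, computing the average price for each group, and then finding the county with the highest average price. The Java implementation is fairly long.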
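For comparison, the three steps just described (group by county, average the price, take the maximum) can be sketched in plain Java with the standard streams API. This is only an illustration: the `Listing` class and the `MostExpensiveCounty` name are hypothetical, and the rows are assumed to be already parsed and held in memory.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class MostExpensiveCounty {
    // Hypothetical row type mirroring the columns of the sample table.
    static final class Listing {
        final String name;
        final String county;
        final int price;
        final int minStay;

        Listing(String name, String county, int price, int minStay) {
            this.name = name;
            this.county = county;
            this.price = price;
            this.minStay = minStay;
        }
    }

    // Returns the county with the highest average price, or empty if there are no rows.
    static Optional<String> mostExpensiveCounty(List<Listing> rows) {
        // Step 1 and 2: group by county and average the price within each group.
        Map<String, Double> avgByCounty = rows.stream()
                .collect(Collectors.groupingBy(l -> l.county,
                        Collectors.averagingInt(l -> l.price)));
        // Step 3: pick the county whose average is largest.
        return avgByCounty.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        List<Listing> rows = Arrays.asList(
                new Listing("Morgan", "lydney", 135, 5),
                new Listing("John", "sedury", 34, 1),
                new Listing("Patrick", "newport", 9901, 7));
        System.out.println(mostExpensiveCounty(rows).orElse("(no data)")); // prints "newport"
    }
}
```

Like the description above, this sketch averages the price column only; it does not apply the user's price range or the minimum-stay multiplier from the question.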

With SPL, an open-source package for Java, this is easy to write and takes just one line:


|   | A |
|---|---|
| 1 | =file("data.csv").import@ct().groups(county;avg(price):price_avg).top(-1;price_avg).county |

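SPL provides a JDBC driver that Java can call. Save the script above as mostExpensiveCounty.splx, then invoke the script file from Java the same way you would call a stored procedure: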

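// Requires: import java.sql.Connection; import java.sql.CallableStatement; import java.sql.DriverManager;
Class.forName("com.esproc.jdbc.InternalDriver");
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
CallableStatement st = con.prepareCall("call mostExpensiveCounty()");
st.execute();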
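The one-line script averages the price column alone. The question also describes a user-set price range and a cost of minimum stay multiplied by price, so a plain-Java sketch of that variant may be useful. Everything here is an assumption made for illustration: the class name `CountyRentStats`, the column order `name,county,price,minStay` with a header row, a CSV without quoted commas, and the choice to apply the range to the nightly price (as the `if` in the question's loop does) rather than to the total cost.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

class CountyRentStats {
    // Returns the county whose average cost (price * minStay) is highest, counting only
    // rows whose price lies in [minPrice, maxPrice]. Returns null if no row qualifies.
    static String mostExpensiveCounty(Stream<String> lines, int minPrice, int maxPrice) {
        // county -> {sum of costs, number of rows}; one pass, so 50,000+ rows are fine.
        Map<String, long[]> totals = new HashMap<>();
        lines.skip(1) // skip the header row (assumed present)
             .filter(line -> !line.trim().isEmpty())
             .forEach(line -> {
                 String[] f = line.split(","); // assumes no quoted commas in fields
                 int price = Integer.parseInt(f[2].trim());
                 int minStay = Integer.parseInt(f[3].trim());
                 if (price >= minPrice && price <= maxPrice) {
                     long[] t = totals.computeIfAbsent(f[1].trim(), k -> new long[2]);
                     t[0] += (long) price * minStay;
                     t[1]++;
                 }
             });

        String best = null;
        double bestAvg = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, long[]> e : totals.entrySet()) {
            double avg = (double) e.getValue()[0] / e.getValue()[1];
            if (avg > bestAvg) {
                bestAvg = avg;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Stream<String> sample = Stream.of(
                "name,county,price,minStay",
                "Morgan,lydney,135,5",
                "John,sedury,34,1",
                "Patrick,newport,9901,7");
        // With a range of 0..1000 the newport row is filtered out,
        // leaving lydney (135 * 5 = 675) ahead of sedury (34 * 1 = 34).
        System.out.println(mostExpensiveCounty(sample, 0, 1000)); // prints "lydney"
    }
}
```

For a real file, the lines could come from `java.nio.file.Files.lines(Paths.get("data.csv"))`, which reads lazily and throws `IOException`, so the whole file never has to sit in memory at once.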

SPL source code: https://github.com/SPLWare/esProc
