Unable to parse header from GitHub CSV URL using Apache Commons CSV

 

Question

https://stackoverflow.com/questions/67898113/unable-to-parse-header-from-github-csv-url-using-apache-commons

I'm trying to read the header values for each record of a CSV file hosted on GitHub, fetched by URL, using the Apache Commons CSV library.

This is my code:

@Service
public class CoronaVirusDataService {

    private static String virus_data_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv";

    @PostConstruct
    public void getVirusData()
    {
        try
        {
            URL url = new URL(virus_data_url);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
            while((in.readLine()) != null)
            {
                StringReader csvReader = new StringReader(in.readLine());
                Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(csvReader);
                for (CSVRecord record : records) {
                    String country = record.get("Country/Region");
                    System.out.println(country);
                }
            }
            in.close();
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }
}

When I run the application, I get this error:

java.lang.IllegalArgumentException: A header name is missing in [, Afghanistan, 33.93911, 67.709953, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 4, 4, 4, 5, 7, 8, 11, 12, 13, 15, 16, 18, 20, 24, 25, 29, 30, 34, 41, 43, 76, 80, 91, 107, 118, 146, 175, 197, 240, 275, 300, 338, 368, 424, 445, 485, 532, 556, 608, 666, 715, 785, 841, 907, 934, 997, 1027, 1093]

    at org.apache.commons.csv.CSVParser.createHeaders(CSVParser.java:501)
    at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:412)
    at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:378)
    at org.apache.commons.csv.CSVFormat.parse(CSVFormat.java:1157)
    at com.p1.Services.CoronaVirusDataService.getVirusData(CoronaVirusDataService.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)

Answer
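The exception appears to come from the shape of the loop in the question. The `while` condition calls `readLine()` and throws the result away, which on the first pass is the real header row. The body then calls `readLine()` again and parses that single line as a complete CSV document with `withFirstRecordAsHeader()`. The Afghanistan row therefore gets treated as a header, and its empty first cell triggers "A header name is missing". The sketch below reproduces only the line-skipping part using the standard library. The class name, method name, and sample data are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineSkipDemo {

    // Mirrors the loop shape from the question: the condition consumes one
    // line, and the body reads (and uses) the line after it.
    static List<String> linesSeenByBody(String text) throws IOException {
        List<String> seen = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            while (in.readLine() != null) {     // this line is read and discarded
                String next = in.readLine();    // the body only ever sees this one
                // The question's code has no null check here, so it would also
                // fail when the remaining line count is odd.
                if (next != null) {
                    seen.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical, shortened sample in the same layout as the real file.
        String csv = "Province/State,Country/Region\n,Afghanistan\n,Albania\n,Algeria\n";
        // The header and every other data row never reach the loop body.
        System.out.println(linesSeenByBody(csv));
    }
}
```

With the sample above, the loop body receives only `,Afghanistan` and `,Algeria`. The header line is gone before any parsing starts.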

This task involves parsing a standard CSV file with a header row, served over HTTP. A Java implementation takes a fair amount of code.
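For comparison, here is a minimal sketch of a plain-Java version that stays with Apache Commons CSV. It assumes the fix is to hand the whole reader to the parser once, so that the first record of the stream becomes the header, rather than parsing line by line. The class name `CountryColumnReader` and the helper `readColumn` are illustrative, not part of the question's code. `withFirstRecordAsHeader()` is kept because the question uses it; newer Commons CSV releases mark it deprecated in favor of the builder API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class CountryColumnReader {

    // Parses the whole stream in one pass; the first record is used as the
    // header, so later records can be addressed by column name.
    static List<String> readColumn(Reader reader, String column) throws IOException {
        List<String> values = new ArrayList<>();
        try (CSVParser parser = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(reader)) {
            for (CSVRecord record : parser) {
                values.add(record.get(column));
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv");
        // No manual readLine() calls: the parser consumes the reader itself.
        try (Reader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            readColumn(in, "Country/Region").forEach(System.out::println);
        }
    }
}
```

Taking a `Reader` rather than a URL keeps the parsing step separate from the HTTP fetch, so the same helper can be tried against a `StringReader` holding a few sample rows.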

SPL, an open-source package for Java, makes this easy to write; a single statement is enough:


     A
1    =httpfile("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv").import@ct(Country/Region)

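SPL provides a JDBC driver so that Java can call it. Save the script above as httpcsv.splx, then call the script file from Java the same way you would call a stored procedure: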

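// Needs java.sql.Connection, java.sql.DriverManager and java.sql.PreparedStatement
Class.forName("com.esproc.jdbc.InternalDriver");
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
// prepareCall returns a CallableStatement, which is also a PreparedStatement
PreparedStatement st = con.prepareCall("call httpcsv()");
st.execute();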

Alternatively, execute the SPL string directly from Java, the way you would a SQL statement:

st = con.prepareStatement("==httpfile(\"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv\").import@ct(Country/Region)");
st.execute();

SPL source code: https://github.com/SPLWare/esProc
