Unable to parse header from GitHub CSV URL using Apache Commons CSV
Question
I'm trying to read the header values of each record in a CSV file served from a GitHub URL, using the Apache Commons CSV library.
This is my code:
```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.net.HttpURLConnection;
import java.net.URL;

import javax.annotation.PostConstruct;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;
import org.springframework.stereotype.Service;

@Service
public class CoronaVirusDataService {
    private static String virus_data_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv";

    @PostConstruct
    public void getVirusData() {
        try {
            URL url = new URL(virus_data_url);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
            while ((in.readLine()) != null) {
                StringReader csvReader = new StringReader(in.readLine());
                Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(csvReader);
                for (CSVRecord record : records) {
                    String country = record.get("Country/Region");
                    System.out.println(country);
                }
            }
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
When I run the application, I get this error:
```
java.lang.IllegalArgumentException: A header name is missing in [, Afghanistan, 33.93911, 67.709953, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 4, 4, 4, 5, 7, 8, 11, 12, 13, 15, 16, 18, 20, 24, 25, 29, 30, 34, 41, 43, 76, 80, 91, 107, 118, 146, 175, 197, 240, 275, 300, 338, 368, 424, 445, 485, 532, 556, 608, 666, 715, 785, 841, 907, 934, 997, 1027, 1093]
at org.apache.commons.csv.CSVParser.createHeaders(CSVParser.java:501)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:412)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:378)
at org.apache.commons.csv.CSVFormat.parse(CSVFormat.java:1157)
at com.p1.Services.CoronaVirusDataService.getVirusData(CoronaVirusDataService.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
```
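The row quoted in the exception is Afghanistan's data row, not the header line. That most likely comes from the loop in the question: the `in.readLine()` in the `while` condition reads a line and discards it, and the `in.readLine()` in the body fetches the next one. So the header is thrown away and the first line handed to the parser is `,Afghanistan,...`. Because each line is parsed on its own with `withFirstRecordAsHeader()`, that data row is treated as the header, and its empty first field (Afghanistan has no Province/State) is what gets rejected as a missing header name. The stdlib-only sketch below replays just the loop, without Commons CSV, on a short hypothetical excerpt of the file; `SkippedLineDemo` and `linesPassedToParser` are illustrative names, not part of any library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: replays the question's read loop and records which lines
// would have been passed to CSVFormat.DEFAULT.withFirstRecordAsHeader().parse().
public class SkippedLineDemo {

    static List<String> linesPassedToParser(BufferedReader in) {
        List<String> passed = new ArrayList<>();
        try {
            while (in.readLine() != null) {  // this line is read and discarded
                passed.add(in.readLine());   // only the following line reaches the parser
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return passed;
    }

    public static void main(String[] args) {
        // Hypothetical excerpt: real header names, shortened rows.
        String sample = String.join("\n",
                "Province/State,Country/Region,Lat,Long,1/22/20",
                ",Afghanistan,33.93911,67.709953,0",
                ",Albania,41.1533,20.1683,0",
                ",Algeria,28.0339,1.6596,0");
        List<String> passed = linesPassedToParser(new BufferedReader(new StringReader(sample)));
        passed.forEach(System.out::println);
    }
}
```

Running `main` prints only the Afghanistan and Algeria rows: the header and every other line never reach the parser.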
Answer
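This task requires parsing a standard CSV file with a header row, fetched over HTTP; implementing it in plain Java takes a fair amount of code.
With SPL, an open-source package for Java, it takes just one line: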
|   | A |
|---|---|
| 1 | =httpfile("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv").import@ct(Country/Region) |
SPL provides a JDBC driver for Java. Save the script above as httpcsv.splx and call it from Java as a stored procedure:
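```java
…
// Load the esProc JDBC driver and open a local connection
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// Call the script file httpcsv.splx as a stored procedure
st = con.prepareCall("call httpcsv()");
st.execute();
…
```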
Or execute the SPL string directly from Java, as you would an SQL statement:
```java
…
st = con.prepareStatement("==httpfile(\"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv\").import@ct(Country/Region)");
st.execute();
…
```
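If you would rather stay with Apache Commons CSV, a likely fix is to hand the whole stream to the parser once instead of line by line, so the first line is consumed as the header exactly once and every later line becomes a record. Below is a minimal sketch, assuming commons-csv is on the classpath and using the same `withFirstRecordAsHeader()` call as the question; `CountryReader` and `readCountries` are illustrative names. The method takes a `Reader` so it can be tried on a string as well as on the GitHub URL.

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parses the whole CSV in one pass, header first.
public class CountryReader {

    static List<String> readCountries(Reader source) {
        List<String> countries = new ArrayList<>();
        // One parser over the entire stream: line 1 becomes the header,
        // every following line becomes a CSVRecord addressable by column name.
        try (CSVParser parser = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(source)) {
            for (CSVRecord record : parser) {
                countries.add(record.get("Country/Region"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return countries;
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv");
        try (Reader in = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            readCountries(in).forEach(System.out::println);
        }
    }
}
```

Inside the question's `@PostConstruct` method, the same idea amounts to passing the `BufferedReader` built from the connection straight to `parse(...)` and dropping the `while` loop.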