merge csv files into one document with no repeats in row Java
Question
I have to merge two CSV files. I have implemented some code and the files do merge, yet rows from the second document are written as separate new lines instead of being combined with their matching rows. Like this:

D,M,20211217,test17045,ehdef,ase_17045_26332@ukuat.com,38008621179,2021092700210571,16880,17045,UID1704510000037,1704537,222,0,20000101,London,510000,,0   // first file
D,M,20211217,2021092700210471,UID1704510000027,16880,17045   // second file
I am looking to merge the two rows together by the uniqueID field. This is the CSV parser:

public class CsvParser {
    public static List<CsvVo> getRecodrsFromACsv(File file, List<String> keys) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(file));
        List<CsvVo> records = new ArrayList<>();
        boolean isHeader = true;
        String line = null;
        while ((line = br.readLine()) != null) {
            if (isHeader) { // first line is header
                isHeader = false;
                continue;
            }
            CsvVo record = new CsvVo(file.getName());
            String[] lineSplit = line.split(",");
            for (int i = 0; i < lineSplit.length; i++) {
                record.put(keys.get(i), lineSplit[i]);
            }
            records.add(record);
        }
        br.close();
        return records;
    }

    public static List<String> getHeadersFromACsv(File file) throws IOException {
        // if (file.exists()) {
        BufferedReader br = new BufferedReader(new FileReader(file));
        List<String> headers = null;
        String line = null;
        while ((line = br.readLine()) != null) {
            String[] lineSplit = line.split(",");
            headers = new ArrayList<>(Arrays.asList(lineSplit));
            log.info("HEADERS :" + headers);
            break;
        }
        br.close();
        return headers;
        // }
        // return null;
    }

    public static void writeToCsv(final File file, final Set<String> headers, final List<CsvVo> records)
            throws IOException {
        FileWriter csvWriter = new FileWriter(file);
        // write headers
        String sep = "";
        String[] headersArr = headers.toArray(new String[headers.size()]);
        for (String header : headersArr) {
            csvWriter.append(sep);
            csvWriter.append(header);
            sep = "|";
        }
        csvWriter.append("\n");
        // write records at each line
        for (CsvVo record : records) {
            sep = "";
            for (String s : headersArr) {
                csvWriter.append(sep);
                csvWriter.append(record.get(s));
                sep = "|";
            }
            csvWriter.append("\n");
        }
        csvWriter.flush();
        csvWriter.close();
    }
}
This is the merge model:
public class CsvVo {
    private Map<String, String> keyVal;

    public CsvVo(String id) {
        keyVal = new LinkedHashMap<>(); // you may also use HashMap if you don't need to keep order
    }

    public Map<String, String> getKeyVal() {
        return keyVal;
    }

    public void setKeyVal(Map<String, String> keyVal) {
        this.keyVal = keyVal;
    }

    public void put(String key, String val) {
        keyVal.put(key, val);
    }

    public String get(String key) {
        return keyVal.get(key);
    }
}
This is the implementation:
File aseFile = new File("merge/mergeFile.txt");
File newFile = new File("dpcFileReturn.txt");
log.info("File To Be Processed :" + newFile.getName());
List<String> csv1Headers = CsvParser.getHeadersFromACsv(aseFile);
csv1Headers.forEach(h -> System.out.print(h + " "));
// System.out.println();
List<String> csv2Headers = CsvParser.getHeadersFromACsv(newFile);
csv2Headers.forEach(h -> System.out.print(h + " "));
// System.out.println();
List<String> allCsvHeaders = new ArrayList<>();
allCsvHeaders.addAll(csv1Headers);
allCsvHeaders.addAll(csv2Headers);
allCsvHeaders.forEach(h -> System.out.print(h + " "));
// System.out.println();
Set<String> uniqueHeaders = new HashSet<>(allCsvHeaders);
uniqueHeaders.forEach(h -> System.out.print(h + " "));
// System.out.println();
List<CsvVo> csv1Records = CsvParser.getRecodrsFromACsv(aseFile, csv1Headers);
List<CsvVo> csv2Records = CsvParser.getRecodrsFromACsv(newFile, csv2Headers);
List<CsvVo> allCsvRecords = new ArrayList<>();
allCsvRecords.addAll(csv1Records);
allCsvRecords.addAll(csv2Records);
File mergedFile = new File("mergedFile.txt");
CsvParser.writeToCsv(new File("mergedFile.txt"), uniqueHeaders, allCsvRecords);
log.info("Merged File :" + mergedFile);
The first file:
recordType,activityType,activityDate,foreName,surName,emailAddress,mobilePhone,dpid,clientID,programmeID,uniqueID,bankAccount,sortCode,isJointAccount,dateOfBirth,addressLine1,postCode,clientReference,suspension,
D|M|20211217|test17045|afdib|ase_17045_29894@ukuat.com|30934992219|2021092700210261|16880|17045|UID1704510000006|1704506|003|0|20000101|London|510000||0|
D|M|20211217|test17045|ibabi|ase_17045_42069@ukuat.com|07676909173|2021092700210271|16880|17045|UID1704510000007|1704507|278|0|20000101|London|510000||0|
The second file:
H,activityType,activityDate,dpid,uniqueID,clientID,programmeID,
D,M,20211217,2021092700210261,UID1704510000006,16880,17045,
D,M,20211217,2021092700210271,UID1704510000007,16880,17045,
I am looking for a merged file that matches rows by the forename, surname, email and/or uniqueID, validates the data, and overwrites the row with the new (missing) data, if any.
I can find nothing that works; I have even tried a nested loop that checks a row substring of file 1 against file 2, but I cannot seem to get it working.
Any assistance would be greatly appreciated.
Answer
This task requires replacing the data of the first file with the newer data from the second file and saving the result as a new file. Java lacks a class library for set-oriented computation, so coding this by hand gets quite involved.
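For comparison, the duplicated rows can also be avoided in plain Java by keying the records on uniqueID instead of concatenating the two lists. The following is only a rough, untested sketch that reuses the question's CsvVo class; the helper name mergeByUniqueId is made up here, and it assumes the uniqueID column is present in both files.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not from the original post): index the first file's rows by
// uniqueID, then overlay each row of the second file onto its match instead of
// appending it as a duplicate line.
public static List<CsvVo> mergeByUniqueId(List<CsvVo> first, List<CsvVo> second) {
    Map<String, CsvVo> byId = new LinkedHashMap<>(); // keeps the first file's row order
    for (CsvVo record : first) {
        byId.put(record.get("uniqueID"), record);
    }
    for (CsvVo record : second) {
        CsvVo match = byId.get(record.get("uniqueID"));
        if (match == null) {
            byId.put(record.get("uniqueID"), record); // no counterpart: keep as a new row
        } else {
            // counterpart found: copy only the non-empty values over the existing row
            for (Map.Entry<String, String> e : record.getKeyVal().entrySet()) {
                if (e.getValue() != null && !e.getValue().isEmpty()) {
                    match.put(e.getKey(), e.getValue());
                }
            }
        }
    }
    return new ArrayList<>(byId.values());
}

The implementation could then pass mergeByUniqueId(csv1Records, csv2Records) to CsvParser.writeToCsv instead of allCsvRecords. Even so, the Java version stays fairly verbose.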
With SPL, an open-source package for Java, the merge only takes a few lines:
    A
1   =file("mergeFile.txt").import@ct()
2   =file("dpcFileReturn.txt").import@ct()
3   >A1.run(~.modify@f(ifn(A2.select@1(uniqueID==A1.uniqueID && activityDate>A1.activityDate),~)))
4   =file("mergedFile.txt").export@ct(A1)
A1 and A2 read the two files, A3 replaces matching rows of A1 with the newer data found in A2, and A4 writes the merged result. SPL provides a JDBC driver so it can be called from Java: save the above script as mergeFile.splx and call it from Java the same way as a stored procedure:
…
// load the esProc JDBC driver and open a local connection
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// invoke the SPL script mergeFile.splx like a stored procedure
st = con.prepareCall("call mergeFile ()");
st.execute();
…