Merge CSV files into one document with no repeated rows in Java

 

Question

https://stackoverflow.com/questions/70530490/merge-csv-files-into-one-document-with-no-repeats-in-row-java

I have to merge two CSV files. I have implemented some code and the files do merge, but the rows from the second document are written to new lines instead of being combined with the matching rows from the first, so the data is repeated. Like this:

D,M,20211217,test17045,ehdef,ase_17045_26332@ukuat.com,38008621179,2021092700210571,16880,17045,UID1704510000037,1704537,222,0,20000101,London,510000,,0 // first file

D,M,20211217,2021092700210471,UID1704510000027,16880,17045 // second file

I am looking to merge the two rows together by the uniqueID field. This is the CSV parser:

public class CsvParser {

    public static List<CsvVo> getRecodrsFromACsv(File file, List<String> keys) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(file));
        List<CsvVo> records = new ArrayList<>();
        boolean isHeader = true;

        String line = null;
        while ((line = br.readLine()) != null) {
            if (isHeader) {// first line is header
                isHeader = false;
                continue;
            }
            CsvVo record = new CsvVo(file.getName());
            String[] lineSplit = line.split(",");
            for (int i = 0; i < lineSplit.length; i++) {
                record.put(keys.get(i), lineSplit[i]);
            }
            records.add(record);
        }

        br.close();

        return records;
    }

    public static List<String> getHeadersFromACsv(File file) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(file));
        List<String> headers = null;

        String line = null;
        while ((line = br.readLine()) != null) {
            String[] lineSplit = line.split(",");
            headers = new ArrayList<>(Arrays.asList(lineSplit));
            log.info("HEADERS :" + headers);
            break;
        }

        br.close();

        return headers;
    }

    public static void writeToCsv(final File file, final Set<String> headers, final List<CsvVo> records)
            throws IOException {
        FileWriter csvWriter = new FileWriter(file);

        // write headers
        String sep = "";
        String[] headersArr = headers.toArray(new String[headers.size()]);
        for (String header : headersArr) {
            csvWriter.append(sep);
            csvWriter.append(header);
            sep = "|";
        }

        csvWriter.append("\n");

        // write records at each line
        for (CsvVo record : records) {
            sep = "";
            for (String s : headersArr) {
                csvWriter.append(sep);
                csvWriter.append(record.get(s));
                sep = "|";
            }
            csvWriter.append("\n");
        }

        csvWriter.flush();
        csvWriter.close();
    }
}

This is the merge model:

public class CsvVo {

    private Map<String, String> keyVal;

    public CsvVo(String id) {
        keyVal = new LinkedHashMap<>();// you may also use HashMap if you don't need to keep order
    }

    public Map<String, String> getKeyVal() {
        return keyVal;
    }

    public void setKeyVal(Map<String, String> keyVal) {
        this.keyVal = keyVal;
    }

    public void put(String key, String val) {
        keyVal.put(key, val);
    }

    public String get(String key) {
        return keyVal.get(key);
    }
}

This is the implementation:

File aseFile = new File("merge/mergeFile.txt");
File newFile = new File("dpcFileReturn.txt");
log.info("File To Be Processed :" + newFile.getName());

List<String> csv1Headers = CsvParser.getHeadersFromACsv(aseFile);
csv1Headers.forEach(h -> System.out.print(h + " "));
// System.out.println();
List<String> csv2Headers = CsvParser.getHeadersFromACsv(newFile);
csv2Headers.forEach(h -> System.out.print(h + " "));
// System.out.println();

List<String> allCsvHeaders = new ArrayList<>();
allCsvHeaders.addAll(csv1Headers);
allCsvHeaders.addAll(csv2Headers);
allCsvHeaders.forEach(h -> System.out.print(h + " "));
// System.out.println();

Set<String> uniqueHeaders = new HashSet<>(allCsvHeaders);
uniqueHeaders.forEach(h -> System.out.print(h + " "));
// System.out.println();

List<CsvVo> csv1Records = CsvParser.getRecodrsFromACsv(aseFile, csv1Headers);
List<CsvVo> csv2Records = CsvParser.getRecodrsFromACsv(newFile, csv2Headers);

List<CsvVo> allCsvRecords = new ArrayList<>();
allCsvRecords.addAll(csv1Records);
allCsvRecords.addAll(csv2Records);

File mergedFile = new File("mergedFile.txt");
CsvParser.writeToCsv(mergedFile, uniqueHeaders, allCsvRecords);

log.info("Merged File :" + mergedFile);

 

The first file

recordType,activityType,activityDate,foreName,surName,emailAddress,mobilePhone,dpid,clientID,programmeID,uniqueID,bankAccount,sortCode,isJointAccount,dateOfBirth,addressLine1,postCode,clientReference,suspension,

D|M|20211217|test17045|afdib|ase_17045_29894@ukuat.com|30934992219|2021092700210261|16880|17045|UID1704510000006|1704506|003|0|20000101|London|510000||0|

D|M|20211217|test17045|ibabi|ase_17045_42069@ukuat.com|07676909173|2021092700210271|16880|17045|UID1704510000007|1704507|278|0|20000101|London|510000||0|

 

The second file

H,activityType,activityDate,dpid,uniqueID,clientID,programmeID,

D,M,20211217,2021092700210261,UID1704510000006,16880,17045,

D,M,20211217,2021092700210271,UID1704510000007,16880,17045,

I am looking for a merged file that matches rows on the forename, surname, email and/or uniqueID, validates the data, and overwrites the row with the new (missing) data, if any.

I can find nothing on this, and I've even tried a nested loop that checks a row substring of file 1 against file 2, but I cannot seem to get it working.

Any assistance would be greatly appreciated.

Answer

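This problem requires replacing the data in the first file with the newer data from the second file and saving the result as a new file. Java has no built-in class library for set-oriented computation, so the code becomes quite complicated.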
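To make the requirement concrete, here is a minimal plain-Java sketch of the overwrite-by-key step. It is not the asker's code and not part of the SPL solution: the class name MergeByKeySketch, the methods parse and mergeInto, and the sample rows (including the 20211220 date) are made up for illustration. It assumes one delimiter per file, that both files contain the uniqueID and activityDate columns, and that "newer" means a later activityDate, which works as a plain string comparison because the dates are in yyyyMMdd form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class MergeByKeySketch {

    // Parse delimited lines (first line is the header) into ordered column -> value maps.
    static List<Map<String, String>> parse(List<String> lines, String delimiter) {
        String splitter = Pattern.quote(delimiter); // "|" is a regex metacharacter, so quote it
        String[] header = lines.get(0).split(splitter);
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(splitter, -1); // -1 keeps empty cells
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                row.put(header[i], i < cells.length ? cells[i] : "");
            }
            rows.add(row);
        }
        return rows;
    }

    // Overwrite columns of each base row with the most recent update that has the same key.
    // Assumes every row has non-null values for both the key column and the date column.
    static void mergeInto(List<Map<String, String>> base, List<Map<String, String>> updates,
                          String key, String dateColumn) {
        // Index the updates by key, keeping only the one with the latest date.
        Map<String, Map<String, String>> latest = new HashMap<>();
        for (Map<String, String> u : updates) {
            latest.merge(u.get(key), u, (oldRow, newRow) ->
                    newRow.get(dateColumn).compareTo(oldRow.get(dateColumn)) > 0 ? newRow : oldRow);
        }
        for (Map<String, String> row : base) {
            Map<String, String> u = latest.get(row.get(key));
            if (u == null || u.get(dateColumn).compareTo(row.get(dateColumn)) <= 0) {
                continue; // no update for this key, or the update is not newer
            }
            for (Map.Entry<String, String> e : u.entrySet()) {
                if (row.containsKey(e.getKey())) { // only columns the base file already has
                    row.put(e.getKey(), e.getValue());
                }
            }
        }
    }

    public static void main(String[] args) {
        // Made-up sample rows, shortened from the layout in the question.
        List<String> first = Arrays.asList(
                "recordType,activityDate,foreName,uniqueID",
                "D,20211217,test17045,UID1704510000006",
                "D,20211217,test17045,UID1704510000007");
        List<String> second = Arrays.asList(
                "H,activityDate,uniqueID",
                "D,20211220,UID1704510000006");
        List<Map<String, String>> base = parse(first, ",");
        mergeInto(base, parse(second, ","), "uniqueID", "activityDate");
        base.forEach(row -> System.out.println(String.join(",", row.values())));
    }
}
```

With these sample rows, only the first row changes (its activityDate becomes 20211220); the second row has no matching update and is left alone. Columns that exist only in the second file, such as H, are ignored here; copy them into the base row instead if the merged file should carry them.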

With SPL, an open-source Java package, this is easy to write and takes only a few lines:


     A
1    =file("mergeFile.txt").import@ct()
2    =file("dpcFileReturn.txt").import@ct()
3    >A1.run(~.modify@f(ifn(A2.select@1(uniqueID==A1.uniqueID && activityDate>A1.activityDate),~)))
4    =file("mergedFile.txt").export@ct(A1)

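SPL provides a JDBC driver so that it can be called from Java. Save the script above as mergeFile.splx and invoke the script file from Java in the same way as a stored procedure: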

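// Requires java.sql.Connection, java.sql.CallableStatement and java.sql.DriverManager.
Class.forName("com.esproc.jdbc.InternalDriver");
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
CallableStatement st = con.prepareCall("call mergeFile()");
st.execute();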

SPL source code: https://github.com/SPLWare/esProc
