Removing Duplicate Data
【Question】
I am looking for some help. I have an application at work that generates a csv with user information on it. I want to use Java and take the data, delete duplicate information, rearrange it, and create a spreadsheet, to make life easier. The csv is generated in the following format, but much larger:
21458952, a1234, Doe, John, technology, support staff, work phone, 555-555-5555
21458952, a1234, Doe, John, technology, support staff, work email, johndoe@whatever.net
21458952, a1234, Doe, John, technology, support staff, work pager, 555-555-5555
99946133, b9854, Paul, Jane, technology, administration, work phone, 444-444-4444
99946133, b9854, Paul, Jane, technology, administration, work email, janepaul@whatever.net
99946133, b9854, Paul, Jane, technology, administration, work pager, 444-444-4444
99946133, b9854, Paul, Jane, technology, administration, cell phone, 444-444-4444
I want to delete the duplicates and arrange the data in appropriate columns.
ID | PIN | Lname | Fname | Dept | team | work px | work email
I have been trying to build arrays with a BufferedReader to store the data, but I am running into difficulties dealing with duplicates and manipulating the data into a table.
This is the code I have so far
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Sort {
    public static void main(String[] args) {
        String csvSplitBy = ",";
        // location where the file is retrieved; try-with-resources closes the reader
        try (BufferedReader br = new BufferedReader(new FileReader("C:/Users/Jason/Desktop/test.txt"))) {
            String line;
            while ((line = br.readLine()) != null) { // loops until the end of the file
                String[] id = line.split(csvSplitBy);
                // each row has only 8 fields (id[0]..id[7]); indexing id[8] or id[9] would throw
                String outPut = String.join(",", id); // incomplete...using for test...
                System.out.println(outPut); // displays the contents of the .txt file
            } // ends while statement
        } catch (IOException e) {
            System.out.println("Could not read the file: " + e.getMessage());
        } // ends try/catch
    } // ends main method
} // ends class Sort
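【Answer】
Java has no class library that directly groups the rows of a text file or extracts their unique values, so coding this by hand takes noticeably more work. For this kind of problem, consider using SPL alongside Java; the script is short: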
  | A
1 | =file("D:\\dup.csv").import@c()
2 | =A1.group(_1,_2,_3,_4,_5,_6;~.select@1(_7=="work phone")._8,~.select@1(_7=="work email")._8)
3 | =file("D:\\result.csv").export@c(A2)
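A1: Reads the contents of the file dup.csv.
A2: Groups the rows by the first six fields, which removes the duplicates, and picks out the work phone and work email values for each group.
A3: Exports A2 to the file result.csv.
An SPL script can be embedded in Java (see "How to Call an SPL Script in Java").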
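To make the A2 step concrete, here is a minimal plain-Java sketch of the same grouping idea. It is not part of the original answer: the class name DedupSketch and the method dedupe are hypothetical. It assumes every row has eight comma-separated fields with no quoted commas, and it works on lines already read into memory rather than on the file itself. Groups come out in first-seen order, which is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; names and structure are assumptions, not the SPL implementation.
public class DedupSketch {

    /**
     * Groups rows by their first six fields and turns the first "work phone"
     * and first "work email" value of each group into two extra columns.
     */
    public static List<String> dedupe(List<String> lines) {
        // key: first six fields joined by ","; value: {work phone, work email}
        Map<String, String[]> groups = new LinkedHashMap<>();
        for (String line : lines) {
            // split on commas and drop the whitespace around each field
            String[] f = line.trim().split("\\s*,\\s*", -1);
            if (f.length < 8) {
                continue; // assumption: rows with fewer than 8 fields are skipped
            }
            String key = String.join(",", Arrays.copyOfRange(f, 0, 6));
            String[] contact = groups.computeIfAbsent(key, k -> new String[] {"", ""});
            if (f[6].equals("work phone") && contact[0].isEmpty()) {
                contact[0] = f[7]; // keep only the first work phone seen
            } else if (f[6].equals("work email") && contact[1].isEmpty()) {
                contact[1] = f[7]; // keep only the first work email seen
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String[]> e : groups.entrySet()) {
            out.add(e.getKey() + "," + e.getValue()[0] + "," + e.getValue()[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "21458952, a1234, Doe, John, technology, support staff, work phone, 555-555-5555",
            "21458952, a1234, Doe, John, technology, support staff, work email, johndoe@whatever.net",
            "21458952, a1234, Doe, John, technology, support staff, work pager, 555-555-5555",
            "99946133, b9854, Paul, Jane, technology, administration, work phone, 444-444-4444",
            "99946133, b9854, Paul, Jane, technology, administration, work email, janepaul@whatever.net");
        // prints one line per person: six key fields, then work phone, then work email
        dedupe(sample).forEach(System.out::println);
    }
}
```

Rows such as "work pager" and "cell phone" are simply ignored here, mirroring the two select@1 expressions in A2. Reading dup.csv and writing result.csv could be added around dedupe with java.nio.file.Files.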