Merging field values across records after grouping

【Question】

For example, I have data in a CSV file with the columns "people", "committers", and "repositoryCommitters".

The "people" column holds ids from 1 to 5923, and I want to match ids that share a repository in the "repositoryCommitters" column. For example:

people | repositoryCommitters

1 | x

2 | x

3 | y

People ids 1 and 2 share the repo "x". How do I get these ids and print them in the output like this:

*Edges

1 2

This means 1 and 2 are linked because they share a repository.

So far, the code I have is:

package network;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.PrintStream;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Scanner;

public class Read {
 static String line;
 static BufferedReader br1 = null, br2 =null;
 static ArrayList<String> pList = new ArrayList<String>();
 static ArrayList<String> rList = new ArrayList<String>();
 static File fileName = new File("networkBuilder.txt");

 public static void main(String[] args) throws IOException
 { String fileContent = "*Vertices " ;

System.out.println("Enter your current directory: ");
 Scanner scanner = new Scanner(System.in);
 String directory = scanner.nextLine();

try {
 br1 = new BufferedReader(new FileReader(directory + "//people.csv"));
 br2 = new BufferedReader(new FileReader(directory + "//repo.csv"));

} catch(FileNotFoundException e)
 {
System.out.println(e.getMessage() + " \n file not found re-run and try again");
 System.exit(0);
 }
 int count = 0;
 try {
 while((line = br1.readLine()) != null){ //skip first line
 while((line = br1.readLine()) != null)
 {
 pList.add(line); // add to array list
 count++ ;

 } }

} catch (IOException error) {
 System.out.println(error.getMessage() + "Error reading file");
 }
 /** Vertices **/
System.out.println("\n"); // new line
 System.out.println(fileContent + count); //print out vertices
 //print out each item in the ArrayList
 int size = pList.size();
 for(int i=0; i < size; i++){
 String[] data=(pList.get(i)).split(",");
 System.out.println(data[1]);

} 
// Save the console output in a text file
 try{
 PrintStream myconsole = new PrintStream(new File(directory + "network.txt"));
 System.setOut(myconsole);
 //print out each item in the ArrayList
int sz = pList.size(); System.out.println(fileContent + count); //print out vertices
 for(int i=0; i < sz; i++){
 String[] data=(pList.get(i)).split(",");
 System.out.println(data[1]);
 }
 } catch(Exception er){
 }

 /* try{
 FileWriter fw = new FileWriter(fileName);
 Writer output = new BufferedWriter(fw);
 int size = pList.size();
 for(int j=0; j<size; j++){

 output.write(fileContent + count);
 ((BufferedWriter) output).newLine();
 output.write(pList.get(j) + "\n");
 ((BufferedWriter) output).newLine();
 }
output.close(); 

 } */

 /** Edges**/
 fileContent = "\n*Edges";
 System.out.println(fileContent);
 // peopleCSV();
 // repoCSV();

 } // end of main
}

And the output is:

Enter your current directory:

C:\Users\StudentDoubts\Documents

*Vertices 5923

1

2

3 . . .

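【Answer】

Group the rows by the second column, then merge the first-column values within each group into a single row. Hard-coding this in Java is fairly involved; esProc handles it more conveniently, and the SPL code is short and easy to follow: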



  | A

1 | =file("people.txt").import@t(;,"|")

2 | =A1.group(repositoryCommitters).new(~.(people).concat(" "):*Edges)

3 | =file("D:/result.txt").export@t(A2)

To also include repositoryCommitters on each output line, just change A2 to:

=A1.group(repositoryCommitters).new(~.(people).string(" "):*Edges,repositoryCommitters:repositoryCommitters)

esProc provides a JDBC interface, so it can be used like a database from Java; see "How to call an SPL script from Java".
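
For comparison, the group-then-merge step that A2 performs can also be sketched in plain Java. This is only a minimal sketch, not the approach recommended above: the class name EdgeBuilder and the method buildEdges are hypothetical, and it assumes each data row has already been read as "people|repositoryCommitters" with the header line skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EdgeBuilder {

    /**
     * Groups person ids by repository and joins each group into one line.
     * Assumption: every row looks like "people|repositoryCommitters".
     */
    public static List<String> buildEdges(List<String> rows) {
        // LinkedHashMap keeps repositories in first-seen order
        Map<String, List<String>> byRepo = new LinkedHashMap<>();
        for (String row : rows) {
            String[] cols = row.split("\\|");
            if (cols.length < 2) {
                continue; // skip malformed rows
            }
            String person = cols[0].trim();
            String repo = cols[1].trim();
            byRepo.computeIfAbsent(repo, k -> new ArrayList<>()).add(person);
        }

        // One output line per repository: its person ids separated by spaces
        List<String> edges = new ArrayList<>();
        for (List<String> people : byRepo.values()) {
            edges.add(String.join(" ", people));
        }
        return edges;
    }

    public static void main(String[] args) {
        // Sample rows taken from the question
        List<String> rows = Arrays.asList("1|x", "2|x", "3|y");
        System.out.println("*Edges");
        for (String edge : buildEdges(rows)) {
            System.out.println(edge);
        }
    }
}
```

Like the SPL version, this emits one line per repository, so a repository with a single committer produces a line with just one id. Filter out groups of size 1 if only true links are wanted.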