Merge field values of records after grouping
【Question】
For example, I have data in a CSV file with the columns “people”, “committers” and “repositoryCommitters”.
The “people” column holds ids from 1 to 5923, and I want to match ids that share a repository in the “repositoryCommitters” column, for example:
people | repositoryCommitters
1 | x
2 | x
3 | y
People 1 and 2 share the repository “x”. How do I get these ids and print them in the output like this:
*Edges
1 2
That is, 1 and 2 are linked because they share a repository.
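The expected output can be read as an edge list: group people by repository, then emit one edge for every pair of people in the same group. A minimal Java sketch, assuming the rows are already parsed into `{people, repositoryCommitters}` string pairs, that edges are undirected, and that duplicate edges (two people sharing several repositories) should appear only once. The class `EdgeSketch` and method `buildEdges` are hypothetical names, not part of the asker's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class EdgeSketch {
    // Hypothetical helper: returns undirected edges "a b" (a < b) between people who share a repository.
    public static List<String> buildEdges(List<String[]> rows) {
        // repository -> sorted set of people ids that committed to it
        Map<String, SortedSet<Integer>> byRepo = new TreeMap<>();
        for (String[] row : rows) {
            int person = Integer.parseInt(row[0].trim());
            String repo = row[1].trim();
            byRepo.computeIfAbsent(repo, k -> new TreeSet<>()).add(person);
        }
        // LinkedHashSet drops an edge already produced by another shared repository
        Set<String> edges = new LinkedHashSet<>();
        for (SortedSet<Integer> members : byRepo.values()) {
            List<Integer> ids = new ArrayList<>(members);
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    edges.add(ids.get(i) + " " + ids.get(j));
                }
            }
        }
        return new ArrayList<>(edges);
    }

    public static void main(String[] args) {
        // Sample rows from the question: people 1 and 2 share repository x
        List<String[]> rows = Arrays.asList(
                new String[]{"1", "x"},
                new String[]{"2", "x"},
                new String[]{"3", "y"});
        System.out.println("*Edges");
        for (String edge : buildEdges(rows)) {
            System.out.println(edge);
        }
    }
}
```

With the sample rows this prints `*Edges` followed by `1 2`. For a repository with three or more committers it emits every pair, which is what a Pajek-style edge list usually expects; the SPL answer below instead joins all ids of a repository on a single line.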
The code I have so far is:
package network;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintStream;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Scanner;

public class Read {
    static String line;
    static BufferedReader br1 = null, br2 = null;
    static ArrayList<String> pList = new ArrayList<String>();
    static ArrayList<String> rList = new ArrayList<String>();
    static File fileName = new File("networkBuilder.txt");

    public static void main(String[] args) throws IOException {
        String fileContent = "*Vertices ";
        System.out.println("Enter your current directory: ");
        Scanner scanner = new Scanner(System.in);
        String directory = scanner.nextLine();
        try {
            // new File(dir, name) inserts the path separator for us
            br1 = new BufferedReader(new FileReader(new File(directory, "people.csv")));
            br2 = new BufferedReader(new FileReader(new File(directory, "repo.csv")));
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage() + "\n file not found re-run and try again");
            System.exit(0);
        }
        int count = 0;
        try {
            br1.readLine(); // skip the header line
            while ((line = br1.readLine()) != null) {
                pList.add(line); // add to array list
                count++;
            }
        } catch (IOException error) {
            System.out.println(error.getMessage() + " Error reading file");
        }

        /** Vertices **/
        System.out.println("\n"); // new line
        System.out.println(fileContent + count); // print out vertices
        // print out each item in the ArrayList
        int size = pList.size();
        for (int i = 0; i < size; i++) {
            String[] data = pList.get(i).split(",");
            System.out.println(data[1]);
        }

        // Save the console output in a text file
        try {
            // the original concatenated directory + "network.txt" without a separator
            PrintStream myconsole = new PrintStream(new File(directory, "network.txt"));
            System.setOut(myconsole);
            // print out each item in the ArrayList
            int sz = pList.size();
            System.out.println(fileContent + count); // print out vertices
            for (int i = 0; i < sz; i++) {
                String[] data = pList.get(i).split(",");
                System.out.println(data[1]);
            }
        } catch (Exception er) {
            // System.out may already point at the file, so report on stderr
            System.err.println("Could not write network.txt: " + er.getMessage());
        }

        /* try {
            FileWriter fw = new FileWriter(fileName);
            Writer output = new BufferedWriter(fw);
            int size = pList.size();
            for (int j = 0; j < size; j++) {
                output.write(fileContent + count);
                ((BufferedWriter) output).newLine();
                output.write(pList.get(j) + "\n");
                ((BufferedWriter) output).newLine();
            }
            output.close();
        } */

        /** Edges **/
        fileContent = "\n*Edges";
        System.out.println(fileContent);
        // peopleCSV();
        // repoCSV();

        br1.close();
        br2.close();
    } // end of main
}
The output is:
Enter your current directory:
C:\Users\StudentDoubts\Documents
*Vertices 5923
1
2
3
...
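【Answer】
Group by the second column and, within each group, merge the values of the first column into one line. Hard-coding this algorithm is rather complicated; it is more convenient to do it with esProc, and the SPL code is simple and easy to understand: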
    A
1   =file("people.txt").import@t(;,"|")
2   =A1.group(repositoryCommitters).new(~.(people).concat(" "):*Edges)
3   =file("D:/result.txt").export@t(A2)
To also output repositoryCommitters on each line, just change A2 to:
=A1.group(repositoryCommitters).new(~.(people).string(" "):*Edges,repositoryCommitters:repositoryCommitters)
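For comparison, the grouping done by A1 and A2 can be approximated in plain Java. This is a sketch, not the answerer's method: the class `GroupConcat` and method `groupConcat` are hypothetical; it assumes a header row followed by `people|repositoryCommitters` lines; it orders groups by repository with a `TreeMap` (our assumption about SPL's default `group` ordering); and it writes tab-separated columns with a header row, which we assume approximates `export@t`. The `withRepo` flag corresponds to the modified A2 that also outputs repositoryCommitters.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupConcat {
    // Groups "people|repositoryCommitters" lines (header first) by repository and
    // joins each group's people ids with a space, roughly like A2 in the SPL script.
    public static List<String> groupConcat(List<String> lines, boolean withRepo) {
        Map<String, List<String>> groups = new TreeMap<>(); // sorted by repository (assumption)
        for (int i = 1; i < lines.size(); i++) { // start at 1 to skip the header row
            String[] cols = lines.get(i).split("\\|");
            if (cols.length < 2) {
                continue; // ignore blank or malformed lines
            }
            // ids keep their file order inside each group
            groups.computeIfAbsent(cols[1].trim(), k -> new ArrayList<>()).add(cols[0].trim());
        }
        List<String> out = new ArrayList<>();
        out.add(withRepo ? "*Edges\trepositoryCommitters" : "*Edges"); // header row
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            String joined = String.join(" ", e.getValue());
            out.add(withRepo ? joined + "\t" + e.getKey() : joined);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // File names mirror the SPL example; adjust them to your environment
        List<String> lines = Files.readAllLines(Paths.get("people.txt"), StandardCharsets.UTF_8);
        Files.write(Paths.get("D:/result.txt"), groupConcat(lines, false), StandardCharsets.UTF_8);
    }
}
```

With the sample data from the question, `groupConcat(lines, false)` yields `*Edges`, `1 2`, `3`; passing `true` appends the repository after a tab on each line.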
esProc provides a JDBC interface, so it can be used from Java much like a database; see “How to call an SPL script in Java” for details.