Compare two text files, find the strings that appear in the other file, and output them in a given format
【Question】
I am writing a program to read two files and then compare them word by word and line by line. Basically, I need to check whether the first line of the first text file is a substring of any line in the second file, display the first word of each line in the second file that contains it, and then repeat the process for every other line of the first file. Additionally, I need to do this without using Java methods like contains().
For each line in the first file, I need to compare its first word with each word in a line of the second file until I find a match. Once I find a match, I need to check whether the second word of the line from the first file is the same as the next word in the second file, and so on until the end of the line from the first file. If the entire line from the first file is contained in a line of the second file, the program must print the first word of that line from the second file.
For example
File1.txt
like parks
went out
go out
File2.txt
I like to go out because I like parks
Ben does not go out much
Shelly went out often but does not like parks
Harry does not go out neither does he like parks
Desired Output:
q1. like parks
I
Shelly
Harry
q2. went out
Shelly
q3. go out
I
Ben
Harry
// Import io so we can use file objects
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.*;

public class wordc {
    public static void main(String[] args) {
        try {
            //reads the files
            BufferedReader bf1 = new BufferedReader(new FileReader("File1.txt"));
            BufferedReader bf2 = new BufferedReader(new FileReader("File2.txt"));
            int k =0, l = 0, i = 0, j = 0, count = 0, linecount1 = 0, linecount2 = 0, wordcount1 = 0, wordcount2 = 0;
            String line1, line2;
            //counts the number of lines in File1
            while((line1 = bf1.readLine()) != null)
            {
                linecount1++;
            }
            //counts the number of lines in File2
            while((line2 = bf2.readLine()) != null)
            {
                linecount2++;
            }
            // loop to iterate through File1
            while((line1 = bf1.readLine()) != null && k < linecount1)
            {
                System.out.println("q"+ k++ + "line1");
                //store words in the current line in the File1 in a word array
                String[] word1 = line1.split(" ");
                //number of words in the line
                wordcount1 = word1.length;
                //loop to iterate through File2
                while ((line2 = bf1.readLine()) != null && l < linecount2)
                {
                    //store words in current line in the File2 in a word array
                    String[] word2 = line2.split(" ");
                    // number of words in the line
                    wordcount2 = word2.length;
                    count = 0;
                    while(j < wordcount1)
                    {
                        while(i < wordcount2)
                        {
                            //compare first word in word1 array to first word in word2 array
                            //continue to compare till a match is found
                            //once a match is found increment count
                            // and compare the next word in the word1 array with the next word in the word2 array
                            //and so on
                            if (word1[j].equals(word2[i]))
                            {
                                i++;
                                j++;
                                count++;
                            }
                            //if the current word in word1 does not match the word in word2
                            //check the current word in word1 with the next word in word2
                            else
                            {
                                i++;
                                break;
                            }
                        }
                    }
                    //if the number of words in a line in File1 matched a portion of a line in File2
                    //print the first word of that line
                    if(count == wordcount1)
                        System.out.println(line2[l]);
                    l++;
                }
                k++;
            }
            bf1.close();
            bf2.close();
        }
        catch (IOException e) {
            System.out.println("IO Error Occurred:" + e.toString());
        }
    }
}
Thanks in advance for all the help! :)
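【Answer】
The problem itself is not complicated: a nested loop combined with string operations (searching, splitting, joining, locating) is enough. Writing all of that from scratch is tedious, however. SPL can help here, and the task takes only three lines of code: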
| | A |
|---|---|
| 1 | =file("D:\\file1.txt").read@n() |
| 2 | =file("D:\\file2.txt").read@n() |
| 3 | =A1.conj(A2.select(pos(~,A1.~)).(~.words()(1))) |
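A1: Reads file1.txt line by line.
A2: Reads file2.txt line by line.
A3: The conj function loops over the members of a set, evaluates an expression for each one, and then concatenates the resulting subsets. The select function filters the members of a set, pos tests whether one string contains another, and words splits a string into words.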
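For readers more familiar with Java, the shape of the A3 expression can be approximated with streams. This is only a rough analogy, not how SPL is implemented: `flatMap` stands in for conj, `filter` for select, `String.indexOf` for pos (a library search of the kind the question rules out, used here purely for illustration), and splitting on whitespace for words. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SplAnalogy {

    // Rough analogue of: A1.conj(A2.select(pos(~,A1.~)).(~.words()(1)))
    static List<String> firstWordsOfMatches(List<String> a1, List<String> a2) {
        return a1.stream()
                // conj: evaluate a sub-list per member of a1, then concatenate them
                .flatMap(query -> a2.stream()
                        // select + pos: keep lines of a2 that contain the query text
                        .filter(line -> line.indexOf(query) >= 0)
                        // words()(1): take the first whitespace-separated word
                        .map(line -> line.trim().split("\\s+")[0]))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data copied from the question; real code would read the two files.
        List<String> a1 = Arrays.asList("like parks", "went out", "go out");
        List<String> a2 = Arrays.asList(
                "I like to go out because I like parks",
                "Ben does not go out much",
                "Shelly went out often but does not like parks",
                "Harry does not go out neither does he like parks");
        // prints [I, Shelly, Harry, Shelly, I, Ben, Harry]
        System.out.println(firstWordsOfMatches(a1, a2));
    }
}
```

Like the SPL expression, this returns one flat list of first words and matches at the character level, so it does not print the `q1.`-style headers shown in the desired output.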
conj, select, and pos are all loop functions. They greatly reduce the need for explicit loop statements and make the code more concise.
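For comparison, the explicit-loop version that the question describes can be sketched in plain Java as follows. It matches word by word and avoids contains() and similar search methods, as the question requires. This is a minimal sketch under assumptions: the class and method names are hypothetical, words are taken to be whitespace-separated, and the sample lines are inlined instead of being read from File1.txt and File2.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WordLineMatcher {

    // True if all words of query appear consecutively in words, starting at index start.
    static boolean matchesAt(String[] words, int start, String[] query) {
        if (start + query.length > words.length) {
            return false;
        }
        for (int j = 0; j < query.length; j++) {
            if (!words[start + j].equals(query[j])) {
                return false;
            }
        }
        return true;
    }

    // True if query occurs as a run of consecutive words anywhere in words.
    static boolean containsWords(String[] words, String[] query) {
        for (int start = 0; start + query.length <= words.length; start++) {
            if (matchesAt(words, start, query)) {
                return true;
            }
        }
        return false;
    }

    // For one line of the first file, collect the first word of every
    // line of the second file that contains it.
    static List<String> firstWords(String queryLine, List<String> textLines) {
        String[] query = queryLine.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (String line : textLines) {
            String[] words = line.trim().split("\\s+");
            if (containsWords(words, query)) {
                result.add(words[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data copied from the question.
        List<String> file1 = Arrays.asList("like parks", "went out", "go out");
        List<String> file2 = Arrays.asList(
                "I like to go out because I like parks",
                "Ben does not go out much",
                "Shelly went out often but does not like parks",
                "Harry does not go out neither does he like parks");
        int q = 1;
        for (String queryLine : file1) {
            System.out.println("q" + q++ + ". " + queryLine);
            for (String first : firstWords(queryLine, file2)) {
                System.out.println(first);
            }
        }
    }
}
```

Run on the sample data, this prints the desired output from the question. Restarting the comparison at every word position of the second file's line is what the original attempt was missing: its indices `i` and `j` were never reset between lines.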
To call the finished script from an application, see "How to Call an SPL Script in Java".