多文本键值关联

【问题】
I want to match column 1 of file 1 with column 1 of file 2 and then column 2 of file 1 with column 1 of file 3 and then print the matches. The columns in the files are separated by tabs. For example:

file 1:

fji01dde AIDJFMGKG

dlp02sle VMCFIJGM

cmr03lsp CKEIFJ

file 2:

fji01dde 25  30

dlp02sle 40  50

cmr03lsp 60  70

file 3:

AIDJFMGKG

CKEIFJ

output needs to be:

fji01dde AIDJFMGKG 25  30

cmr03lsp CKEIFJ 60  70

I only want lines that are common in all three files.

The below code works well for the first two files, but I need to incorporate the third file. Any ideas?

#!/usr/bin/env perl

use strict;

my (%file1,%file2);

## Open the 1st file

open(A,"file1");

while(<A>){

 chomp;

 ## Split the current line on tabs into the @F array.

 my  @F=split(/\t/);

 push @{$file1{$F[0]}},@F[1..$#F];

}

## Open the 2nd file

open(B,"file2");

while(<B>){

 chomp;

 ## Split the current line on tabs into the @F array.

 my  @F=split(/\t/);

 if (defined($file1{$F[0]})) {

 foreach  my $col (@{$file1{$F[0]}}) {

 print  "$F[0]\t$col\t@F[1..$#F]\n";

 }

 }

}

What would the output be if file3 did not have CKEIFJ

It would be just fji01dde AIDJFMGKG 25 30

别人给出的解答:

use strict;

use warnings;

use  Text::CSV_XS qw(csv);

my  @csv_files = @ARGV;

# Parse all the CSV files into arrays of arrays.

my $data1 = csv( in => $csv_files[0], sep_char => "\t" );

# Parse the other CSV files into hashes of rows keyed on the columns we're going to search on.

my $data2 = csv( in => $csv_files[1],

sep_char => "\t",

 headers => ["code", "num1", "num2"\],

key => "code"

 );

my $data3 = csv( in => $csv_files[2],

sep_char => "\t",

 headers => \["CODE"],

key => "CODE"

 );

for  my $row1 (@$data1) {

 my $row2 = $data2->{$row1->[0]};

 my $row3 = $data3->{$row1->[1]};

 if( $row2 && $row3 ) {

 print join "\t", $row1->[0], $row1->[1], $row2->{num1}, $row2->{num2};

 print  "\n";

 }

}

【回答】
这是典型的结构化计算,file1 的 1 号字段和 file2 的 1 号字段存在左关联,file3 的 1 号字段和 file1 的 2 号字段存在左关联,用支持结构化算法的语言很容易描述并解答问题,Perl 缺乏相关函数,虽能实现但难以理解。

可以选用集算器 SPL 实现上述算法,代码简单易懂:



A

1

=file("D:/file1.txt").import()

2

=file("D:/file2.txt").import()

3

=file("D:/file3.txt").import()

4

=A1.switch(#1,A2:#1)

5

=A3.switch(#1,A1:#2)

6

=A5.new(#1.#1.#1, #1.#2, #1.#1.#2, #1.#1.#3)

A4-A5:建立表间关联

A6:按关联路径取数,其中 #1.#1.#1 表示 file3 的 1 号字段,指向 file1 记录里的 1 号字段,再指向 file2 里的 1 号字段。

1png