按标记分组再结构化

【问题】

using the below awk and Perl solutions: I need to understand how to add additional field that don't repeat their output much like"name"in these examples. ie devtwr1 only occurs once in the sample data, If i need to add additional fields that also only occur once, how is this done? Sorry Tried extensively but can't work it out...

here is the source:

have a file: Ntab.txt which is a sample file of two hosts, there is numerous hosts in the real file one after each other.

Each host has multiple "displayName"s (addresses) with corresponding figures for each 'displayName'.

>cat Ntab.txt

name    devtwr1
displayName     00:67:BB
capacityInKB    104,857,600
consumedCapacityInKB    4,042,752
dpPoolID        20
displayName     00:7B:FD
capacityInKB    52,428,800
consumedCapacityInKB    14,880,768
dpPoolID        10
displayName     00:7C:28
capacityInKB    34,179,712
consumedCapacityInKB    29,804,544
dpPoolID        20
displayName     00:7C:29
capacityInKB    34,179,712
consumedCapacityInKB    5,462,016
dpPoolID        20
name    devtwr2
displayName     00:67:BB
capacityInKB    104,857,600
consumedCapacityInKB    4,042,752
dpPoolID        20
displayName     00:7B:FD
capacityInKB    52,428,800
consumedCapacityInKB    14,880,768
dpPoolID        10
displayName     00:7C:28
capacityInKB    34,179,712
consumedCapacityInKB    29,804,544
dpPoolID        20
displayName     00:7C:29
capacityInKB    34,179,712
consumedCapacityInKB    5,462,016
dpPoolID        20

I need to be able to produce the $2 data from the rows into columns after each 'name' (host) and in csv type format, headings optional. I cant use (,:) as separators as the data includes them so either (tab or ;).

Like:

name;displayName;capacityInKB;consumedCapacityInKB;dpPoolID        
devtwr1;00:67:BB;104,857,600;4,042,752;20
devtwr1;00:7B:FD;52,428,800;14,880,768;10
devtwr1;00:7C:28;34,179,712;29,804,544;20
devtwr1;00:7C:29;34,179,712;5,462,016;20
devtwr2;00:67:BB;104,857,600;4,042,752;20
devtwr1;00:7B:FD;52,428,800;14,880,768;10
devtwr1;00:7C:28;34,179,712;29,804,544;20
devtwr1;00:7C:29;34,179,712;5,462,016;20

Here is a Awk solution but cant work out how to add additional field like "name" that dosn't have data in all fields.

$ awk '$1=="name"{name=$2}
    $1 ~/^(displayName|capacityInKB|consumedCapacityInKB)$/{out=out";"$2} 
    $1=="dpPoolID"{print name out";"$2; out=""}' tmp2.txt
devtwr1;00:67:BB;104,857,600;4,042,752;20
devtwr1;00:7B:FD;52,428,800;14,880,768;10
devtwr1;00:7C:28;34,179,712;29,804,544;20
devtwr1;00:7C:29;34,179,712;5,462,016;20
devtwr2;00:67:BB;104,857,600;4,042,752;20
devtwr2;00:7B:FD;52,428,800;14,880,768;10
devtwr2;00:7C:28;34,179,712;29,804,544;20
devtwr2;00:7C:29;34,179,712;5,462,016;20

And here is a Perl solution but once again couldn't work out how to add additional fields like the"name"field that dosn't always have data.

perl -lane '
BEGIN {
  @fields = qw(name displayName capacityInKB consumedCapacityInKB dpPoolID);
  print join ";", @fields;
}
if (/^(name|displayName)/ && $data{displayName}) {
  print join ";", @data{@fields};
  %data = ( name => $data{name} );
}
$data{$F[0]} = $F[1];
END {
  print join ";",@data{@fields};
}' tmp2.txt > Report.csvv

【回答】

按照第1列是否等于"name"将数据分为2N组(Nname的数量),然后从奇数组的单条记录取出name,从偶数组的M条记录中拼出M/4条记录。上述算法比较直观,但涉及有序计算,用perl实现很麻烦,建议用SPL来实现,简单很多:


A

B

C

1

=file("/Ntab.txt").cursor()

>file("/Result.txt").write("name;displayName;capacityInKB;consumedCapacityInKB;dpPoolID"

2

for   A1;#1=="name"

if   A2.#1=="name"

=A2.#2

3


=A2.(#2).to(2,).group((#-1)\4).new(C2,~(1),~(2),~(3),~(4))


4


>file("/Result.txt").export@a(B3;";")


A1:以游标方式读入文件,默认分隔符是tab。文件大小不受限制。

B1:在第一行写表头。

A2:循环A1,按照第1列是否等于"name"分组读入数据,奇数次循环读入的是

"name   devtwr1""name devtwr2",偶数次循环读入的是剩余数据。

B2-C2:如果循环变量的第1个字段等于name,则将第2个字段临时存入C2(比如devtwr1)待用,然后执行B3-B4

B3:取出A2的第2列(此时为多值),按序号分组,并将每组拼成name相同的记录。A2.(#2)表示A2的第2列,group中的#表示序号

B4:B3追加写入文件。