"【问题】 using the below awk and Perl solutions: I need to understand how to add additional field th .."

swan
离开的是风景，留下的是人生
633 浏览 • 4 年前

按标记分组再结构化

Excel处理

【问题】

using the below awk and Perl solutions: I need to understand how to add additional field that don't repeat their output much like"name"in these examples. ie devtwr1 only occurs once in the sample data, If i need to add additional fields that also only occur once, how is this done? Sorry Tried extensively but can't work it out...

here is the source:

have a file: Ntab.txt which is a sample file of two hosts, there is numerous hosts in the real file one after each other.

Each host has multiple "displayName"s (addresses) with corresponding figures for each 'displayName'.

>cat Ntab.txt

name    devtwr1
displayName     00:67:BB
capacityInKB    104,857,600
consumedCapacityInKB    4,042,752
dpPoolID        20
displayName     00:7B:FD
capacityInKB    52,428,800
consumedCapacityInKB    14,880,768
dpPoolID        10
displayName     00:7C:28
capacityInKB    34,179,712
consumedCapacityInKB    29,804,544
dpPoolID        20
displayName     00:7C:29
capacityInKB    34,179,712
consumedCapacityInKB    5,462,016
dpPoolID        20
name    devtwr2
displayName     00:67:BB
capacityInKB    104,857,600
consumedCapacityInKB    4,042,752
dpPoolID        20
displayName     00:7B:FD
capacityInKB    52,428,800
consumedCapacityInKB    14,880,768
dpPoolID        10
displayName     00:7C:28
capacityInKB    34,179,712
consumedCapacityInKB    29,804,544
dpPoolID        20
displayName     00:7C:29
capacityInKB    34,179,712
consumedCapacityInKB    5,462,016
dpPoolID        20

I need to be able to produce the $2 data from the rows into columns after each 'name' (host) and in csv type format, headings optional. I cant use (,:) as separators as the data includes them so either (tab or ;).

Like:

name;displayName;capacityInKB;consumedCapacityInKB;dpPoolID        
devtwr1;00:67:BB;104,857,600;4,042,752;20
devtwr1;00:7B:FD;52,428,800;14,880,768;10
devtwr1;00:7C:28;34,179,712;29,804,544;20
devtwr1;00:7C:29;34,179,712;5,462,016;20
devtwr2;00:67:BB;104,857,600;4,042,752;20
devtwr1;00:7B:FD;52,428,800;14,880,768;10
devtwr1;00:7C:28;34,179,712;29,804,544;20
devtwr1;00:7C:29;34,179,712;5,462,016;20

Here is a Awk solution but cant work out how to add additional field like "name" that dosn't have data in all fields.

$ awk '$1=="name"{name=$2}
    $1 ~/^(displayName|capacityInKB|consumedCapacityInKB)$/{out=out";"$2} 
    $1=="dpPoolID"{print name out";"$2; out=""}' tmp2.txt
devtwr1;00:67:BB;104,857,600;4,042,752;20
devtwr1;00:7B:FD;52,428,800;14,880,768;10
devtwr1;00:7C:28;34,179,712;29,804,544;20
devtwr1;00:7C:29;34,179,712;5,462,016;20
devtwr2;00:67:BB;104,857,600;4,042,752;20
devtwr2;00:7B:FD;52,428,800;14,880,768;10
devtwr2;00:7C:28;34,179,712;29,804,544;20
devtwr2;00:7C:29;34,179,712;5,462,016;20

And here is a Perl solution but once again couldn't work out how to add additional fields like the"name"field that dosn't always have data.

perl -lane '
BEGIN {
  @fields = qw(name displayName capacityInKB consumedCapacityInKB dpPoolID);
  print join ";", @fields;
}
if (/^(name|displayName)/ && $data{displayName}) {
  print join ";", @data{@fields};
  %data = ( name => $data{name} );
}
$data{$F[0]} = $F[1];
END {
  print join ";",@data{@fields};
}' tmp2.txt > Report.csvv

【回答】

按照第1列是否等于"name"将数据分为2N组（N是name的数量），然后从奇数组的单条记录取出name，从偶数组的M条记录中拼出M/4条记录。上述算法比较直观，但涉及有序计算，用perl实现很麻烦，建议用SPL来实现，简单很多：

	A	B	C
1	=file("/Ntab.txt").cursor()	>file("/Result.txt").write("name;displayName;capacityInKB;consumedCapacityInKB;dpPoolID"
2	for A1;#1=="name"	if A2.#1=="name"	=A2.#2
3		=A2.(#2).to(2,).group((#-1)\4).new(C2,~(1),~(2),~(3),~(4))
4		>file("/Result.txt").export@a(B3;";")

A1:以游标方式读入文件，默认分隔符是tab。文件大小不受限制。

B1:在第一行写表头。

A2:循环A1，按照第1列是否等于"name"分组读入数据，奇数次循环读入的是

"name devtwr1"和"name devtwr2",偶数次循环读入的是剩余数据。

B2-C2:如果循环变量的第1个字段等于name，则将第2个字段临时存入C2（比如devtwr1）待用，然后执行B3-B4。

B3:取出A2的第2列（此时为多值），按序号分组，并将每组拼成name相同的记录。A2.(#2)表示A2的第2列，group中的#表示序号

B4:将B3追加写入文件。

文本(31) 大数据(5) 有序计算(29)

按标记分组再结构化

【问题】

【回答】

目录