从复杂格式的文本文件中提取数据
例题描述和简单分析
有大量的txt文件,每个txt约几千行,其中包含部分数据如下所示:
...
23:50:55.818> setconfig,FF:FF:FF:FF:FF:FF,116080121072980303073,Tue Aug 03 23:50:55 2021,,,,,,,,,-999,-999,-999,-999,-999,-999,00.515,0,0,0,Pass
...
23:51:00.625> WL_DUT0 manfinfo output: OTP_LOT_NUM: 0x1082
WAFER_NUM: 0xd
WAFER_X: 0x16
WAFER_Y: 0x25
...
数据提取要求(以上数据为例):
1. 提取第一次出现setconfig, (文本中多次出现), 后面的116080121072980303073
2. 提取第一次出现WL_DUT0 manfinfo output: OTP_LOT_NUM:后面的0x1082, 及下面三行后的0xd,0x16,0x25
3. 输出excel为116080121072980303073,0x1082,0xd,0x16,0x25, 5列显示,结果如下所示:
A |
B |
C |
D |
E |
|
1 |
SN |
OTP_LOT_NUM |
WAFER_NUM |
WAFER_X |
WAFER_Y |
2 |
21072980303051 |
0x1082 |
0xd |
0x17 |
0x1c |
3 |
72980303054 |
0x1082 |
0xd |
0x13 |
0x1d |
4 |
116080121072980303073 |
0x1082 |
0xd |
0x16 |
0x25 |
… |
… |
… |
… |
… |
… |
解法及简要说明
在集算器中编写脚本p1.dfx,如下所示:
A |
B |
|
1 |
=create(SN,OTP_LOT_NUM,WAFER_NUM,WAFER_X,WAFER_Y) |
|
2 |
=directory@p("*.txt").(file(~)) |
|
3 |
for A2 |
=A3.import@i() |
4 |
=B3.select@1(pos(~,"setconfig,")).split@c()(3) |
|
5 |
=B3.pselect@1(pos(~,"WL_DUT0 manfinfo output:")) |
|
6 |
=B3.m(B5:B5+3).(~.split(":").m(-1)) |
|
7 |
=A1.record(B4|B6) |
|
8 |
=file("result.xlsx").xlsexport@t(A1) |
简要说明:
A1 建空序表(SN,OTP_LOT_NUM,WAFER_NUM,WAFER_X,WAFER_Y)
A2 列出满足通配符路径的文件名
A3 循环A2
B3 txt返回成序列
B4 计算满足要求1的SN值
B5 计算要求2的位置
B6 取要求2的四行数据,计算要求2的各值
B7 序表A1追加记录
A8 结果导出至result.xlsx
https://club.excelhome.net/thread-1594992-1-1.html