Joining a Text File with JSON
【Question】
I have a tab-delimited textfile A (representing a BLAST output)
Name1 BBBBBBBBBBBB 99.40 166 1 0 1 166 334 499 3e-82 302
Name2 DDDDDDDDDDDD 98.80 167 2 0 1 167 346 512 4e-81 298
and a textfile B (representing a phylogenetic dendrogram) looking like
{
"member": {
"Cluster A": "BBBBBBBBBBBB This is Animal A"
},
"name": "Cluster A"
},
{
"member": {
"Cluster B": "DDDDDDDDDDDD This is Animal B"
},
"name": "cluster B"
}
I want to take the string found in the 2nd tab of textfile A (DDDDDDDDDDDD for example) and look it up in text file B. The script should then add the info found in textfile B into a new tab of textfile A :
Name1 BBBBBBBBBBBB 99.40 166 1 0 1 166 334 499 3e-82 302 Cluster A This is Animal A
Name2 DDDDDDDDDDDD 98.80 167 2 0 1 167 346 512 4e-81 298 Cluster B This is Animal B
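【Answer】
If you treat the two sources as tables, this is a job for a SQL-style join. Perl and shell offer no such operation out of the box, and coding it by hand is fairly complicated. esProc can simplify the task; the SPL code is as follows: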
|   | A |
| 1 | =json(file("json.txt").read()) |
| 2 | =A1.new(#1.name:name,#1.(#1):cluster,(firstblank=pos(cluster," "),left(cluster,firstblank-1)):key,right(cluster,len(cluster)-firstblank):value) |
| 3 | =file("file.txt").import() |
| 4 | =join(A3,_2;A2,key).new(_1._1,_1._2,_1._3,_1._4,_1._5,_1._6,_1._7,_1._8,_2.name,_2.value) |
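A1: Read the JSON text and parse it.
A2: Split each record from A1 into a new table sequence: the text before the first blank becomes key and the rest becomes value.
A3: Read file.txt.
A4: Join A3 with A2, matching the second column of A3 against key in A2 (an equi-join, not a cross product), and select the output columns to get the final result.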
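For readers who want to compare, the same lookup-and-join idea can be sketched in Python. This is only an illustration, not a translation of the SPL above. It assumes that file B is a valid JSON array (the fragment in the question would need enclosing brackets), that file A is tab-delimited with the ID in its second column, and that the key inside "member" serves as the cluster label. The function names build_lookup and join_rows are made up for this example.

```python
import csv
import io
import json


def build_lookup(clusters):
    """Map each sequence ID to (cluster label, description)."""
    lookup = {}
    for entry in clusters:
        for label, text in entry["member"].items():
            # Split at the first blank: ID on the left, description on the right.
            seq_id, _, description = text.partition(" ")
            lookup[seq_id] = (label, description)
    return lookup


def join_rows(rows, lookup, key_index=1):
    """Inner join: keep rows whose key is in lookup and append the match."""
    joined = []
    for row in rows:
        match = lookup.get(row[key_index])
        if match is not None:
            joined.append(row + list(match))
    return joined


# Inline sample data modeled on the question (assumed layout, see above).
blast_text = (
    "Name1\tBBBBBBBBBBBB\t99.40\t166\t1\t0\t1\t166\t334\t499\t3e-82\t302\n"
    "Name2\tDDDDDDDDDDDD\t98.80\t167\t2\t0\t1\t167\t346\t512\t4e-81\t298\n"
)
clusters_text = """[
  {"member": {"Cluster A": "BBBBBBBBBBBB This is Animal A"}, "name": "Cluster A"},
  {"member": {"Cluster B": "DDDDDDDDDDDD This is Animal B"}, "name": "cluster B"}
]"""

rows = list(csv.reader(io.StringIO(blast_text), delimiter="\t"))
result = join_rows(rows, build_lookup(json.loads(clusters_text)))
for row in result:
    print("\t".join(row))
```

Building a dictionary first keeps the join to a single pass over file A. Rows without a matching ID are dropped here; returning the row unchanged instead would turn this into a left join.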