Extracting data from a multi-level XML file
Problem description and brief analysis
Given an XML file, invoice.xml, part of whose data is shown below:
…
<LIST_G_AMOUNT_DUE>
<G_AMOUNT_DUE>
<SORT_TRX_SEQUENCE>1</SORT_TRX_SEQUENCE>
<SORT_TRX_DATE>28-JUL-14</SORT_TRX_DATE>
<SORT_INVOICE_NUMBER>111820</SORT_INVOICE_NUMBER>
<SORT_DUE_DATE>27-AUG-14</SORT_DUE_DATE>
<PS_SEQUENCE>1493092</PS_SEQUENCE>
<TRX_SEQUENCE>1712368</TRX_SEQUENCE>
<RECEIPT_CURRENCY_CODE></RECEIPT_CURRENCY_CODE>
<AMOUNT_APPLIED_FROM></AMOUNT_APPLIED_FROM>
<LIST_G_LINE_CLUSTER>
<G_LINE_CLUSTER>
<INVOICE_NUMBER>111820</INVOICE_NUMBER>
<TRX_DATE>28-JUL-14</TRX_DATE>
<TRANSACTION>Invoice</TRANSACTION>
<DUE_DATE>27-AUG-14</DUE_DATE>
<REFERENCE>SAMPLE SALE </REFERENCE>
<BILL_TO_LOCATION>WASHINGTON</BILL_TO_LOCATION>
<LINE_CUSTOMER_ID>4382</LINE_CUSTOMER_ID>
<GENERAL_SEQUENCE>9648082</GENERAL_SEQUENCE>
<AMOUNT_DUE></AMOUNT_DUE>
<TRX_AMOUNT>64.4</TRX_AMOUNT>
<CD_TRX_AMOUNT> 64.40 </CD_TRX_AMOUNT>
<DUMMY_REFERENCE>SAMPLE SALE </DUMMY_REFERENCE>
<C_BILL_TO_LOC>0</C_BILL_TO_LOC>
</G_LINE_CLUSTER>
<G_LINE_CLUSTER>
<INVOICE_NUMBER>111820</INVOICE_NUMBER>
<TRX_DATE>19-OCT-15</TRX_DATE>
…
The task is to extract data from this multi-level XML. Part of the expected result is shown in the following table:
| SORT_INVOICE_NUMBER | TRANSACTION | SORT_DUE_DATE | TRX_DATE | TRX_AMOUNT |
|---|---|---|---|---|
| 111820 | Invoice | 27-AUG-14 | 28-JUL-14 | 64.4 |
| 111820 | Payment | 27-AUG-14 | 19-OCT-15 | -64.4 |
| 1100585 | Invoice | 30-JUL-15 | 30-JUN-15 | 69.4 |
| 1100585 | Payment | 30-JUL-15 | 05-AUG-15 | -16.73 |
| 1100585 | Payment | 30-JUL-15 | 09-SEP-15 | -52.2 |
| 1100585 | Payment | 30-JUL-15 | 19-OCT-15 | -0.47 |
| 1101491 | Invoice | 05-AUG-15 | 06-JUL-15 | 69.4 |
| 1101491 | Payment | 05-AUG-15 | 19-OCT-15 | -69.4 |
| … | … | … | … | … |
Solution and brief explanation
Write the script p1.dfx in esProc, as shown below:
| | A |
|---|---|
| 1 | =file("invoice.xml").read() |
| 2 | =xml(A1,"G_STATEMENT/LIST_G_AMOUNT_DUE/G_AMOUNT_DUE") |
| 3 | =A2.news(if(ifa(LIST_G_LINE_CLUSTER.G_LINE_CLUSTER),LIST_G_LINE_CLUSTER.G_LINE_CLUSTER,[LIST_G_LINE_CLUSTER.G_LINE_CLUSTER]);SORT_INVOICE_NUMBER,~.TRANSACTION,SORT_DUE_DATE,~.TRX_DATE,~.TRX_AMOUNT) |
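Brief explanation:
A1 Reads the XML file as a string.
A2 Parses the XML string in A1, takes the content at the G_STATEMENT/LIST_G_AMOUNT_DUE/G_AMOUNT_DUE level, and returns a table sequence.
A3 Expands the multi-level table sequence by the specified columns to get the result. Note that LIST_G_LINE_CLUSTER.G_LINE_CLUSTER is not always a sequence (presumably when a G_AMOUNT_DUE holds only one G_LINE_CLUSTER), so ifa() checks it first and a non-sequence value is wrapped as a one-member sequence, [LIST_G_LINE_CLUSTER.G_LINE_CLUSTER], before news() expands it.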
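To make the three steps easier to check by hand, here is a minimal Python sketch of the same flattening, using the standard library's xml.etree.ElementTree. It is not the esProc implementation, only an illustration of the logic: for each G_AMOUNT_DUE, every G_LINE_CLUSTER below it yields one output row that combines parent-level and child-level fields. The SAMPLE string and the function name extract_rows are hypothetical. The sample is a trimmed stand-in for invoice.xml that assumes G_STATEMENT is the root element, and its values are taken from the first two rows of the result table.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for invoice.xml (assumes G_STATEMENT is the root).
# Values mirror the first two rows of the article's result table.
SAMPLE = """
<G_STATEMENT>
  <LIST_G_AMOUNT_DUE>
    <G_AMOUNT_DUE>
      <SORT_INVOICE_NUMBER>111820</SORT_INVOICE_NUMBER>
      <SORT_DUE_DATE>27-AUG-14</SORT_DUE_DATE>
      <LIST_G_LINE_CLUSTER>
        <G_LINE_CLUSTER>
          <TRANSACTION>Invoice</TRANSACTION>
          <TRX_DATE>28-JUL-14</TRX_DATE>
          <TRX_AMOUNT>64.4</TRX_AMOUNT>
        </G_LINE_CLUSTER>
        <G_LINE_CLUSTER>
          <TRANSACTION>Payment</TRANSACTION>
          <TRX_DATE>19-OCT-15</TRX_DATE>
          <TRX_AMOUNT>-64.4</TRX_AMOUNT>
        </G_LINE_CLUSTER>
      </LIST_G_LINE_CLUSTER>
    </G_AMOUNT_DUE>
  </LIST_G_AMOUNT_DUE>
</G_STATEMENT>
"""


def extract_rows(xml_text):
    """Return one tuple per G_LINE_CLUSTER, joined with its parent's fields."""
    root = ET.fromstring(xml_text)  # root element is G_STATEMENT in the sample
    rows = []
    # Outer loop: the G_AMOUNT_DUE level (what A2 selects in the script).
    for due in root.findall("./LIST_G_AMOUNT_DUE/G_AMOUNT_DUE"):
        # Inner loop: the nested G_LINE_CLUSTER records (what A3 expands).
        for line in due.findall("./LIST_G_LINE_CLUSTER/G_LINE_CLUSTER"):
            rows.append((
                due.findtext("SORT_INVOICE_NUMBER"),   # parent-level field
                line.findtext("TRANSACTION"),          # child-level field
                due.findtext("SORT_DUE_DATE"),         # parent-level field
                line.findtext("TRX_DATE"),             # child-level field
                line.findtext("TRX_AMOUNT"),           # child-level field
            ))
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(" | ".join(row))
```

In this sketch findall() always returns a list, even when there is a single G_LINE_CLUSTER, so no counterpart to the ifa() check is needed. All values stay as strings; converting TRX_AMOUNT to a number or parsing the dates is left out.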
For how to integrate this code into BIRT, see "How to Call an SPL Script in BIRT".
https://www.eclipse.org/forums/index.php/t/1076017/