Extracting data from a multi-level XML file

Problem description and brief analysis

Part of the XML file invoice.xml is shown below:

    <LIST_G_AMOUNT_DUE>
        <G_AMOUNT_DUE>
            <SORT_TRX_SEQUENCE>1</SORT_TRX_SEQUENCE>
            <SORT_TRX_DATE>28-JUL-14</SORT_TRX_DATE>
            <SORT_INVOICE_NUMBER>111820</SORT_INVOICE_NUMBER>
            <SORT_DUE_DATE>27-AUG-14</SORT_DUE_DATE>
            <PS_SEQUENCE>1493092</PS_SEQUENCE>
            <TRX_SEQUENCE>1712368</TRX_SEQUENCE>
            <RECEIPT_CURRENCY_CODE></RECEIPT_CURRENCY_CODE>
            <AMOUNT_APPLIED_FROM></AMOUNT_APPLIED_FROM>
            <LIST_G_LINE_CLUSTER>
                <G_LINE_CLUSTER>
                    <INVOICE_NUMBER>111820</INVOICE_NUMBER>
                    <TRX_DATE>28-JUL-14</TRX_DATE>
                    <TRANSACTION>Invoice</TRANSACTION>
                    <DUE_DATE>27-AUG-14</DUE_DATE>
                    <REFERENCE>SAMPLE SALE </REFERENCE>
                    <BILL_TO_LOCATION>WASHINGTON</BILL_TO_LOCATION>
                    <LINE_CUSTOMER_ID>4382</LINE_CUSTOMER_ID>
                    <GENERAL_SEQUENCE>9648082</GENERAL_SEQUENCE>
                    <AMOUNT_DUE></AMOUNT_DUE>
                    <TRX_AMOUNT>64.4</TRX_AMOUNT>
                    <CD_TRX_AMOUNT>        64.40 </CD_TRX_AMOUNT>
                    <DUMMY_REFERENCE>SAMPLE SALE </DUMMY_REFERENCE>
                    <C_BILL_TO_LOC>0</C_BILL_TO_LOC>
                </G_LINE_CLUSTER>
                <G_LINE_CLUSTER>
                    <INVOICE_NUMBER>111820</INVOICE_NUMBER>
                    <TRX_DATE>19-OCT-15</TRX_DATE>

The task is to extract data from this multi-level XML. Part of the expected result is shown in the table below:

    SORT_INVOICE_NUMBER  TRANSACTION  SORT_DUE_DATE  TRX_DATE   TRX_AMOUNT
    111820               Invoice      27-AUG-14      28-JUL-14  64.4
    111820               Payment      27-AUG-14      19-OCT-15  -64.4
    1100585              Invoice      30-JUL-15      30-JUN-15  69.4
    1100585              Payment      30-JUL-15      05-AUG-15  -16.73
    1100585              Payment      30-JUL-15      09-SEP-15  -52.2
    1100585              Payment      30-JUL-15      19-OCT-15  -0.47
    1101491              Invoice      05-AUG-15      06-JUL-15  69.4
    1101491              Payment      05-AUG-15      19-OCT-15  -69.4

Solution and brief explanation

In esProc, write the script p1.dfx as follows:


        A
    1   =file("invoice.xml").read()
    2   =xml(A1,"G_STATEMENT/LIST_G_AMOUNT_DUE/G_AMOUNT_DUE")
    3   =A2.news(if(ifa(LIST_G_LINE_CLUSTER.G_LINE_CLUSTER),LIST_G_LINE_CLUSTER.G_LINE_CLUSTER,[LIST_G_LINE_CLUSTER.G_LINE_CLUSTER]);SORT_INVOICE_NUMBER,~.TRANSACTION,SORT_DUE_DATE,~.TRX_DATE,~.TRX_AMOUNT)

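Brief explanation:

A1   Reads the XML file into a string.

A2   Parses the XML string in A1 and takes the content at the G_STATEMENT/LIST_G_AMOUNT_DUE/G_AMOUNT_DUE level, returning a table sequence.

A3   Expands the multi-level table sequence on the specified columns to produce the result. Note that LIST_G_LINE_CLUSTER.G_LINE_CLUSTER is not always a sequence, so the script tests it with ifa() and, when the test fails, wraps the value as a one-member sequence [LIST_G_LINE_CLUSTER.G_LINE_CLUSTER] before expanding it.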
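For readers who want to cross-check the result outside esProc, the same parent-child expansion can be sketched in Python with the standard library. This is not the SPL method above, only a rough equivalent under stated assumptions: the sample XML is trimmed to the five output fields, the root element name G_STATEMENT is taken from the path used in A2, the first invoice reuses values from the result table, and the second invoice (number 999999) is a hypothetical record added to show a parent with a single G_LINE_CLUSTER child. The function name flatten is also made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed sample. Invoice 111820 uses values from the result table;
# invoice 999999 is hypothetical and exists only to show the single-child case.
SAMPLE = """
<G_STATEMENT>
  <LIST_G_AMOUNT_DUE>
    <G_AMOUNT_DUE>
      <SORT_INVOICE_NUMBER>111820</SORT_INVOICE_NUMBER>
      <SORT_DUE_DATE>27-AUG-14</SORT_DUE_DATE>
      <LIST_G_LINE_CLUSTER>
        <G_LINE_CLUSTER>
          <TRX_DATE>28-JUL-14</TRX_DATE>
          <TRANSACTION>Invoice</TRANSACTION>
          <TRX_AMOUNT>64.4</TRX_AMOUNT>
        </G_LINE_CLUSTER>
        <G_LINE_CLUSTER>
          <TRX_DATE>19-OCT-15</TRX_DATE>
          <TRANSACTION>Payment</TRANSACTION>
          <TRX_AMOUNT>-64.4</TRX_AMOUNT>
        </G_LINE_CLUSTER>
      </LIST_G_LINE_CLUSTER>
    </G_AMOUNT_DUE>
    <G_AMOUNT_DUE>
      <SORT_INVOICE_NUMBER>999999</SORT_INVOICE_NUMBER>
      <SORT_DUE_DATE>01-JAN-16</SORT_DUE_DATE>
      <LIST_G_LINE_CLUSTER>
        <G_LINE_CLUSTER>
          <TRX_DATE>02-DEC-15</TRX_DATE>
          <TRANSACTION>Invoice</TRANSACTION>
          <TRX_AMOUNT>10.0</TRX_AMOUNT>
        </G_LINE_CLUSTER>
      </LIST_G_LINE_CLUSTER>
    </G_AMOUNT_DUE>
  </LIST_G_AMOUNT_DUE>
</G_STATEMENT>
"""


def flatten(xml_text):
    """Return one row per G_LINE_CLUSTER, combined with its parent's fields."""
    root = ET.fromstring(xml_text.strip())  # root is G_STATEMENT
    rows = []
    for due in root.findall("LIST_G_AMOUNT_DUE/G_AMOUNT_DUE"):
        # findall() always returns a list, even for a single child,
        # so no single-vs-sequence check is needed here.
        for line in due.findall("LIST_G_LINE_CLUSTER/G_LINE_CLUSTER"):
            rows.append({
                "SORT_INVOICE_NUMBER": due.findtext("SORT_INVOICE_NUMBER"),
                "TRANSACTION": line.findtext("TRANSACTION"),
                "SORT_DUE_DATE": due.findtext("SORT_DUE_DATE"),
                "TRX_DATE": line.findtext("TRX_DATE"),
                "TRX_AMOUNT": float(line.findtext("TRX_AMOUNT")),
            })
    return rows


rows = flatten(SAMPLE)
for row in rows:
    print(row)
```

The nested loop plays the role of news() in A3: parent fields are repeated on every child row. Because findall() returns a list in all cases, the Python version sidesteps the ifa() check that the SPL script needs.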

For how to integrate this script into BIRT, see the article "How BIRT Calls SPL Scripts" (《BIRT 如何调用 SPL 脚本》).

Source question

https://www.eclipse.org/forums/index.php/t/1076017/