用 TPCH 练习性能优化 Q12

一、 SQL及分析

查询SQL语句如下:

select
    l_shipmode,
    sum(case
        when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1
        else 0 end) as high_line_count,
    sum(case
        when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1
        else 0 end) as low_line_count
from
    orders,
    lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('TRUCK', 'MAIL')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1995-01-01'
    and l_receiptdate < date '1995-01-01' + interval '1' year
group by
    l_shipmode
order by
    l_shipmode;

这是主表orders与子表lineitem关联过滤后的分组聚合计算。

二、 SPL实现

这里的orderslineitem主子表关联优化原理与Q9中类似。

注意到SQLhigh_line_countlow_line_count的条件正好相反,其中一个可以被另一个用更简单的计算式计算出来。


A
1 =now()
2 1995-1-1
3 =elapse@y(A2, 1)
4 =["MAIL", "TRUCK"]
5 =file("lineitem.ctx").open().cursor@m(L_ORDERKEY,L_SHIPMODE;L_RECEIPTDATE>=A2 && L_RECEIPTDATE<A3 && A4.contain(L_SHIPMODE) && L_COMMITDATE<L_RECEIPTDATE && L_SHIPDATE<L_COMMITDATE)
6 =file("orders.ctx").open().new@r(A5,L_SHIPMODE,O_ORDERPRIORITY)
7 =A6.run(O_ORDERPRIORITY=if(O_ORDERPRIORITY=="1-URGENT" || O_ORDERPRIORITY=="2-HIGH",1,0))
8 =A7.groups(L_SHIPMODE;sum(O_ORDERPRIORITY):high_line_count, sum(1-O_ORDERPRIORITY):low_line_count)
9 =interval@ms(A1,now())

三、 进一步优化

1. 优化方法

本例中要使用Q1中介绍的日期整数化方法,lineitem中的L_SHIPDATE L_COMMITDATEL_RECEIPTDATE都已经在前面例子中转换过了。还要使用 Q1中介绍的字符串整数化方法,orders中的O_ORDERPRIORITY已经在前面例子中转换过了,本例还要对lineitem中的L_SHIPMODE进行转换

2. 数据转换代码

2.1 orders转换

复制 orders_10.ctx重命名为orders_12.ctx

2.2 lineitem转换


A
1 =file("lineitem.ctx").open().cursor(L_SHIPMODE)
2 =A1.id(L_SHIPMODE).sort()
3 =file("l_shipmode.txt").export(A2)
4 =file("lineitem_10.ctx").open().cursor()
5 =A4.run(L_SHIPMODE=A2.pos@b(L_SHIPMODE))
6 =file("lineitem_12.ctx").create@p(#L_ORDERKEY,#L_LINENUMBER,L_PARTKEY, L_SUPPKEY, L_QUANTITY, L_EXTENDEDPRICE,L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE,L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT)
7 >A6.append(A5)

3. 数据转换后的计算代码

计算代码:


A
1 >l_shipmode=file("l_shipmode.txt").import@si()
2 >o_orderpriority=file("o_orderpriority.txt").import@si()
3 =now()
4 1995-1-1
5 =days@o(elapse@y(A4, 1))
6 =days@o(A4)
7 =l_shipmode.(["MAIL", "TRUCK"].contain(~))
8 =o_orderpriority.(["1-URGENT","2-HIGH"].contain(~))
9 =file("lineitem_12.ctx").open().cursor@m(L_ORDERKEY,L_SHIPMODE;L_RECEIPTDATE>=A6 && L_RECEIPTDATE<A5 && A7(L_SHIPMODE) && L_COMMITDATE<L_RECEIPTDATE && L_SHIPDATE<L_COMMITDATE)
10 =file("orders_12.ctx").open().new@r(A9,L_SHIPMODE,O_ORDERPRIORITY)
11 =A10.run(O_ORDERPRIORITY=if(A8(O_ORDERPRIORITY),1,0))
12 =A11.groups(L_SHIPMODE;sum(O_ORDERPRIORITY):high_line_count, sum(1-O_ORDERPRIORITY):low_line_count)
13 =A12.run(L_SHIPMODE=l_shipmode(L_SHIPMODE))
14 =interval@ms(A3,now())

A13中将整数化的L_SHIPMODE还原为原字符串。

四、 使用企业版列式计算

1. 原始数据


A
1 =now()
2 1995-1-1
3 =elapse@y(A2, 1)
4 =["MAIL", "TRUCK"]
5 =file("lineitem.ctx").open().cursor@mv(L_ORDERKEY,L_SHIPMODE;L_RECEIPTDATE>=A2 && L_RECEIPTDATE<A3 && A4.contain(L_SHIPMODE) && L_COMMITDATE<L_RECEIPTDATE && L_SHIPDATE<L_COMMITDATE)
6 =file("orders.ctx").open().new@r(A5,L_SHIPMODE,O_ORDERPRIORITY)
7 =A6.derive@o(if(O_ORDERPRIORITY=="1-URGENT" || O_ORDERPRIORITY=="2-HIGH",1,0):flag)
8 =A7.groups(L_SHIPMODE;sum(flag):high_line_count, sum(1-flag):low_line_count)
9 =interval@ms(A1,now())

2. 优化后数据


A
1 >l_shipmode=file("l_shipmode.txt").import@si()
2 >o_orderpriority=file("o_orderpriority.txt").import@si()
3 =now()
4 1995-1-1
5 =days@o(elapse@y(A4, 1))
6 =days@o(A4)
7 =l_shipmode.(["MAIL", "TRUCK"].contain(~))
8 =o_orderpriority.(["1-URGENT","2-HIGH"].contain(~))
9 =file("lineitem_12.ctx").open().cursor@mv(L_ORDERKEY,L_SHIPMODE;L_RECEIPTDATE>=A6 && L_RECEIPTDATE<A5 && A7(L_SHIPMODE) && L_COMMITDATE<L_RECEIPTDATE && L_SHIPDATE<L_COMMITDATE)
10 =file("orders_12.ctx").open().new@r(A9,L_SHIPMODE,O_ORDERPRIORITY)
11 =A10.derive@o(if(A8(O_ORDERPRIORITY),1,0):flag)
12 =A11.groups(L_SHIPMODE;sum(flag):high_line_count,sum(1-flag):low_line_count)
13 =A12.new(l_shipmode(L_SHIPMODE):L_SHIPMODE,high_line_count,low_line_count)
14 =interval@ms(A3,now())

五、 测试结果

单位:秒


常规 列式
优化前 9.9 7.4
优化后 6.4 3.5