从 TPCH 测试学习性能优化技巧之 Q19

一、     查询要求

Q19语句查询得到对一些空运或人工运输零件三个不同种类的所有订单的总折扣收入。零件的选择考虑特定品牌、包装和尺寸范围。

Q19语句的特点是:带有聚集、IN子查询操作并存的三表连接操作。

 

二、     Oracle执行

Oracle编写的查询SQL语句如下:

select  /*+ parallel(n) */

         sum(l_extendedprice * (1 - l_discount)) as revenue

from

         lineitem,part

where

         (

                   p_partkey = l_partkey

                   and p_brand = 'Brand#32'

                   and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')

                   and l_quantity >= 7 and l_quantity <= 7 + 10

                   and p_size between 1 and 5

                   and l_shipmode in ('AIR', 'AIR REG')

                   and l_shipinstruct = 'DELIVER IN PERSON'

         )

         or

         (

                   p_partkey = l_partkey

                   and p_brand = 'Brand#23'

                   and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')

                   and l_quantity >= 18 and l_quantity <= 18 + 10

                   and p_size between 1 and 10

                   and l_shipmode in ('AIR', 'AIR REG')

                   and l_shipinstruct = 'DELIVER IN PERSON'

         )

         or

         (

                   p_partkey = l_partkey

                   and p_brand = 'Brand#45'

                   and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')

                   and l_quantity >= 22 and l_quantity <= 22 + 10

                   and p_size between 1 and 15

                   and l_shipmode in ('AIR', 'AIR REG')

                   and l_shipinstruct = 'DELIVER IN PERSON'

         );

其中/*+ parallel(n) */ Oracle的并行查询语法,n是并行数。

脚本执行时间,单位:秒

并行数

1

2

4

8

12

Oracle

395

277

204

176

154

 

三、     SPL优化

本查询是个两表连接的过滤查询,结构上比较简单,但是过滤条件比较复杂。我们注意到,条件表达式中有不少项只与较小的表part相关,而且这些项的计算复杂度还较高(有in运算),如果先针对part表把这些项计算出来,则计算量将比针对两表连接结果集再计算要少得多,因为连续结果集是和lineitem表规模相同,要大得多。

 

SPL脚本如下:


A

1

=bx

2

=now()

3

>brand1="Brand#32"

4

>brand2="Brand#23"

5

>brand3="Brand#45"

6

>quantity1=7

7

>quantity2=18

8

>quantity3=22

9

=["SM   CASE", "SM BOX", "SM PACK", "SM PKG"]

10

=["MED   BAG", "MED BOX", "MED PKG", "MED PACK"]

11

=["LG   CASE", "LG BOX", "LG PACK", "LG PKG"]

12

=file(path+"part.ctx").create().cursor@m(P_PARTKEY,P_BRAND,P_SIZE,P_CONTAINER;P_SIZE>=1;A1)

13

=A12.derive(if(P_BRAND==brand1   && A9.contain(P_CONTAINER) && P_SIZE<=5:1,   P_BRAND==brand2 &&   A10.contain(P_CONTAINER) && P_SIZE<=10:2,    P_BRAND==brand3 && A11.contain(P_CONTAINER)   &&   P_SIZE<=15:3;0):flag).select(flag>0).fetch().derive@o().keys@i(P_PARTKEY)  

14

=file(path+"lineitem.ctx").create().cursor@m(L_PARTKEY,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT;   (L_SHIPMODE=="AIR" || L_SHIPMODE=="AIR REG") &&   L_SHIPINSTRUCT=="DELIVER IN PERSON" && L_QUANTITY >=   quantity1 && L_QUANTITY <= quantity3+10,L_PARTKEY:A13;A1)

15

=A14.select(case(L_PARTKEY.flag,1:L_QUANTITY<=quantity1+10,2:L_QUANTITY>=quantity2   && L_QUANTITY<=quantity2+10;L_QUANTITY>=quantity3))

16

=A15.total(sum(L_EXTENDEDPRICE*   (1 - L_DISCOUNT)))

17

=now()

18

=interval@s(A2,A17)

A13part表中新产生一个flag字段,用于将part相关的复杂条件项先计算出来。当记录满足SQL中的or条件123时分别赋值为123,都不满足则赋值0,然后用flag值是否大于0来过滤。在A15中再根据flag的值来选择用哪个quantity值对L_QUANTITY进行过滤。flag相关的复杂计算式只需要计算part表的行数次。

 

脚本执行时间,单位:秒

并行数

1

2

4

8

12

Oracle

395

277

204

176

154

SPL组表

158

80

40

21

15