从 TPCH 测试学习性能优化技巧之 Q20

一、     查询要求

Q20语句查询确定在某一年内,找出指定国家的能对某一零件商品提供更有竞争力价格的供货商。所谓更有竞争力的供货商,是指那些零件有过剩的供货商,超过供货商在某一年中货运给定国的某一零件的50%则为过剩。

Q20语句的特点是:带有排序、聚集、IN子查询、普通子查询操作并存的两表连接操作。

 

二、     Oracle执行

Oracle编写的查询SQL语句如下:

select  /*+ parallel(n) */

         s_name,s_address

from

         supplier,nation

where

         s_suppkey in (

                   select

                            ps_suppkey

                   from

                            partsupp

                   where

                            ps_partkey in (

                                     select

                                               p_partkey

                                     from

                                               part

                                     where

                                               p_name like 'bisque%'

                            )

                            and ps_availqty > (

                                     select

                                               0.5 * sum(l_quantity)

                                     from

                                               lineitem

                                     where

                                               l_partkey = ps_partkey

                                               and l_suppkey = ps_suppkey

                                               and l_shipdate >= date '1995-01-01'

                                               and l_shipdate < date '1995-01-01' + interval '1' year

                            )

         )

         and s_nationkey = n_nationkey

         and n_name = 'CHINA'

order by

         s_name;

其中/*+ parallel(n) */ Oracle的并行查询语法,n是并行数。

脚本执行时间,单位:秒

并行数

1

2

4

8

12

Oracle

437

307

242

201

175

 

三、     SPL优化

这个查询看起来很复杂,嵌套了很多层,这主要是因为SQL不提倡分步运算导致的。如果把它分成几步来看,情况就比较清楚:

1.       按条件过滤nation

2.       按条件过滤part

3.       1的结果当作外键表去匹配过滤supplier

4.       23的结果当作外键表去匹配过滤partsupp

5.       4的结果当作外键表去与lineitem表连接,并将PS_AVAILQTY字段选出,然后按L_PARTKEYL_SUPPKEY分组聚合,前面说过,用等值条件与主表关联的子查询都可以改写成分组后再与主表连接的语句。分组聚合后选出满足条件的L_SUPPKEY

6.       再用5的结果作为外键表去过滤supplier

整个过程就是不断地生成中间外键表去做匹配过滤。

 

SPL脚本如下:


A

1

=1

2

=now()

3

>date=date("1995-01-01")

4

>partname="bisque*"

5

>nationname="CHINA"

6

=elapse@y(date,1)

7

=file(path+"nation.ctx").create().cursor(N_NATIONKEY;N_NAME   == nationname).fetch().keys@i(N_NATIONKEY)

8

=file(path+"part.ctx").create().cursor@m(P_PARTKEY;like(P_NAME,partname);A1).fetch().keys@i(P_PARTKEY)

9

=file(path+"supplier.ctx").create().cursor@m(S_SUPPKEY,S_NAME,S_ADDRESS;A7.find(S_NATIONKEY);A1)

10

=A9.fetch().keys@i(S_SUPPKEY)

11

=file(path+"partsupp.ctx").create().cursor@m(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY;PS_PARTKEY:A8,PS_SUPPKEY:A10;A1)

12

=A11.fetch()

13

=file(path+"lineitem.ctx").create().cursor@m(L_PARTKEY,L_SUPPKEY,L_QUANTITY;L_SHIPDATE   >= date && L_SHIPDATE < A6;A1)

14

=A13.join@i(L_PARTKEY:L_SUPPKEY,A12:PS_PARTKEY.P_PARTKEY:PS_SUPPKEY.S_SUPPKEY,PS_AVAILQTY)

15

=A14.groups@u(L_PARTKEY,L_SUPPKEY,PS_AVAILQTY;sum(L_QUANTITY):quantity)

16

=A15.select(PS_AVAILQTY   > 0.5 * quantity).id(L_SUPPKEY)

17

=A10.switch@i(S_SUPPKEY,A16)

18

=A17.new(S_NAME,S_ADDRESS).sort@o(S_NAME)

19

=now()

20

=interval@s(A2,A19)

 

脚本执行时间,单位:秒

并行数

1

2

4

8

12

Oracle

437

307

242

201

175

SPL组表

128

65

36

21

15