从 TPCH 测试学习性能优化技巧之 Q22

一、     查询要求

Q22 语句查询可能会购买的消费者的地理分布。本查询统计在指定国家编码范围内,各国有多少消费者近 7 年没有下过订单,但帐户余额大于所有正余额帐户的平均值。列出国家编号、消费者数量、帐户总余额。

Q22语句的特点是:带有分组、排序、聚集、子查询、NOT EXISTS子查询操作并存的两表连接操作。

 

二、     Oracle执行

Oracle编写的查询SQL语句如下:

select  /*+ parallel(n) */

         cntrycode,

         count(*) as numcust,

         sum(c_acctbal) as totacctbal

from

         (

                   select

                            substr(c_phone,1,2) as cntrycode,

                            c_acctbal

                   from

                            customer

                   where

                            substr(c_phone,1,2) in

                                     ('11', '14', '15', '19', '20', '21', '23')

                            and c_acctbal > (

                                     select

                                               avg(c_acctbal)

                                     from

                                               customer

                                     where

                                               c_acctbal > 0.00

                                               and substr(c_phone,1,2) in

                                                        ('11', '14', '15', '19', '20', '21', '23')

                            )

                            and not exists (

                                     select

                                               *

                                     from

                                               orders

                                     where

                                               o_custkey = c_custkey

                            )

         ) custsale

group by

         cntrycode

order by

         cntrycode;

其中/*+ parallel(n) */ Oracle的并行查询语法,n是并行数。

脚本执行时间,单位:秒

并行数

1

2

4

8

12

Oracle

128

86

70

53

48

 

三、     SPL优化

这个查询的外层是个常规的分组聚合,其主要复杂度在于内层的两个用于条件的子查询。这两个子查询能对应一些满足条件的customer记录,而后一个子查询是用了not exist修饰,最后结果应当是这两个子查询对应的customer记录集合的差集。

 

SPL脚本如下:


A

1

=now()

2

=["11","14","15","19","20","21","23"]

3

=file("customer.ctx").open().cursor@m(C_CUSTKEY,C_PHONE,C_ACCTBAL;C_ACCTBAL>0.0   && A2.contain(left(C_PHONE,2))).fetch()

4

=A3.avg(C_ACCTBAL)

5

=A3.select(C_ACCTBAL>A4).derive@o().keys@i(C_CUSTKEY)

6

=file("orders.ctx").open().cursor@m(O_CUSTKEY;A5.find(O_CUSTKEY))

7

=A6.groups(O_CUSTKEY:C_CUSTKEY)

8

=[A5,A7].merge@d(C_CUSTKEY)

9

=A8.groups(left(C_PHONE,2):cntrycode;count(1):numcust,   sum(C_ACCTBAL):totacctbal)

10

return interval@ms(A1,now())

SPL中分组计算结果是有序的,即A7会对C_CUSTKEY有序,而A5本身来自customer表,也对C_CUSTKEY有序,两个有序集可以使用归并算法高速计算差集(A8中)。

 

脚本执行时间,单位:秒

并行数

1

2

4

8

12

Oracle

128

86

70

53

48

SPL组表

102

53

29

24

19