性能优化案例课程 TPCH-Q12
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1
else 0 end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1
else 0 end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('TRUCK', 'MAIL')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1995-01-01'
and l_receiptdate < date '1995-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode;
这是主表 orders 与子表 lineitem 关联过滤后的分组聚合计算。
1. 数据存储
大表 orders、lineitem 按照主键 orderkey 有序存储,连接的时候可以有序归并。
继续使用题目 Q3 中的 orders.ctx、lineitem.ctx。
将这些表复制到本题的主目录中。
2. 一般实现
A |
|
1 |
=now() |
2 |
1995-1-1 |
3 |
=elapse@y(A2, 1) |
4 |
=["MAIL", "TRUCK"] |
5 |
=file("lineitem.ctx").open().cursor@m(L_ORDERKEY,L_SHIPMODE;L_RECEIPTDATE>=A2 && L_RECEIPTDATE<A3 && A4.contain(L_SHIPMODE) && L_COMMITDATE<L_RECEIPTDATE && L_SHIPDATE<L_COMMITDATE) |
6 |
=file("orders.ctx").open().new@r(A5,L_SHIPMODE,O_ORDERPRIORITY) |
7 |
=A6.run(O_ORDERPRIORITY=if(O_ORDERPRIORITY=="1-URGENT" || O_ORDERPRIORITY=="2-HIGH",1,0)) |
8 |
=A7.groups(L_SHIPMODE;sum(O_ORDERPRIORITY):high_line_count, sum(1-O_ORDERPRIORITY):low_line_count) |
9 |
=interval@ms(A1,now()) |
代码中用到了前面讲到过的游标前过滤、多线程并行等优化方法。
high_line_count 和 low_line_count 的条件正好相反,在没有求和的时候,前者是 1 时后者是 0,前者是 0 时后者是 1,所以 A7、A8 中用 O_ORDERPRIORITY 和 1- O_ORDERPRIORITY 来计算求和之前的这两个值。
测试结果:
测试项目 |
执行时间(秒) |
一般实现 |
6 |
3. 数据变换
数据变换要采用前面题目提到的日期整数化、枚举型字符串字段数字化手段。
可以直接使用以前的题目 Q4 转换的 orders_4.ctx。
将这些表复制到本题的主目录中。
Lineitem 要在 Q3 的基础上,对枚举型字符串字段 l_shipmod 完成数字化。
数据转换代码:
A |
|
1 |
=file("lineitem.ctx").open() |
2 |
=A1.cursor(L_SHIPMODE) |
3 |
=A2.id(L_SHIPMODE).sort() |
4 |
=file("l_shipmode.btx").export@b(A3) |
5 |
=A1.cursor() |
6 |
=A5.run(L_SHIPMODE=A3.pos@b(L_SHIPMODE),L_RECEIPTDATE=days@o(L_RECEIPTDATE),L_COMMITDATE=days@o(L_COMMITDATE),L_SHIPDATE=days@o(L_SHIPDATE)) |
7 |
=file("lineitem_3.ctx").create@py(#L_ORDERKEY,#L_LINENUMBER,L_PARTKEY, L_SUPPKEY, L_QUANTITY, L_EXTENDEDPRICE,L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE,L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT) |
8 |
>A7.append(A6) |
计算代码:
A |
|
1 |
=now() |
2 |
>l_shipmode=file("l_shipmode.btx").import@b().(_1) |
3 |
>o_orderpriority=file("o_orderpriority.btx").import@b().(_1) |
4 |
1995-1-1 |
5 |
=days@o(elapse@y(A4, 1)) |
6 |
=days@o(A4) |
7 |
=l_shipmode.(["MAIL", "TRUCK"].contain(~)) |
8 |
=o_orderpriority.(["1-URGENT","2-HIGH"].contain(~)) |
9 |
=file("lineitem_3.ctx").open().cursor@m(L_ORDERKEY,L_SHIPMODE;L_RECEIPTDATE>=A6 && L_RECEIPTDATE<A5 && A7(L_SHIPMODE) && L_COMMITDATE<L_RECEIPTDATE && L_SHIPDATE<L_COMMITDATE) |
10 |
=file("orders_4.ctx").open().new@r(A9,L_SHIPMODE,O_ORDERPRIORITY) |
11 |
=A10.run(O_ORDERPRIORITY=A8(O_ORDERPRIORITY)) |
12 |
=A11.groups(L_SHIPMODE;count(O_ORDERPRIORITY):high_line_count, count(!O_ORDERPRIORITY):low_line_count) |
13 |
=A12.run(L_SHIPMODE=l_shipmode(L_SHIPMODE)) |
14 |
=interval@ms(A1,now()) |
A7、A8 是对位序列。
A11 中的 A8(O_ORDERPRIORITY) 返回的是 true 或者 false,对应原来的 1、0,A12 中的 count 和原来的 sum 结果相同。这就省去了 if 判断,可以提高性能。
测试结果:
测试项目 |
执行时间(秒) |
一般实现 |
6 |
数据变换 |
4 |
4. 列式计算
A |
|
1 |
=now() |
2 |
>l_shipmode=file("l_shipmode.btx").import@b().(_1) |
3 |
>o_orderpriority=file("o_orderpriority.btx").import@b().(_1) |
4 |
1995-1-1 |
5 |
=days@o(elapse@y(A4, 1)) |
6 |
=days@o(A4) |
7 |
=l_shipmode.(["MAIL", "TRUCK"].contain(~)) |
8 |
=o_orderpriority.(["1-URGENT","2-HIGH"].contain(~)) |
9 |
=file("lineitem_3.ctx").open().cursor@mv(L_ORDERKEY,L_SHIPMODE;L_RECEIPTDATE>=A6 && L_RECEIPTDATE<A5 && A7(L_SHIPMODE) && L_COMMITDATE<L_RECEIPTDATE && L_SHIPDATE<L_COMMITDATE) |
10 |
=file("orders_4.ctx").open().new@r(A9,L_SHIPMODE,O_ORDERPRIORITY) |
11 |
=A10.derive@o(A8(O_ORDERPRIORITY):flag) |
12 |
=A11.groups(L_SHIPMODE;count(flag):high_line_count,count(!flag):low_line_count) |
13 |
=A12.new(l_shipmode(L_SHIPMODE):L_SHIPMODE,high_line_count,low_line_count) |
14 |
=interval@ms(A1,now()) |
列式计算不能给字段赋值,所以 A11 中新定义了一个字段 flag。
测试结果:
测试项目 |
执行时间(秒) |
一般实现 |
6 |
数据变换 |
4 |
列式计算 |
2 |