如何对两个 Excel 表实现各种类型的 JOIN
某Excel中的sheet分为3类,其中sheet A是基础表,部分数据如下:
A |
B |
C |
|
1 |
interval1 |
interval2 |
interval3 |
2 |
1 hour |
1 day |
1 week |
3 |
2 hours |
2 days |
2 weeks |
4 |
3 hours |
3 days |
3 weeks |
5 |
4 hours |
4 days |
4 weeks |
Sheet B1\B2…Bn是关联表,它们的格式都一样,且与A有相同的列interval1、interval2、interval3。其中一个B的部分数据如下:
A |
B |
C |
D |
E |
F |
G |
|
1 |
interval1 |
interval2 |
interval3 |
Type |
value1 |
value2 |
value3 |
2 |
2 hours |
1 day |
7 week |
circle |
37 |
108.1 |
4.1 |
3 |
3 hours |
3 days |
7 weeks |
Line |
39 |
117.5 |
4.2 |
4 |
4 hours |
4 days |
7 weeks |
Line |
35 |
127 |
4.3 |
Sheet C用来描述A与B1\B2..Bn的Join类型,共3种,其中cross Join表示笛卡尔积;leftJoinBig表示左关联,关联列是interval1;leftJoinSmall也是左关联,关联列是interval1、interval2。部分数据如下:
A |
B |
|
1 |
table |
joinType |
2 |
B1 |
crossJoin |
3 |
B2 |
leftJoinBig |
4 |
B3 |
leftJoinSmall |
计算目标:按照sheet C中的join类型将sheet A和B1\B2..Bn关联起来,从A中取interva1列,从B取其他列,最后形成n个Excel文件。
以上面的sheet B为例(实际上每个B应当不同),如果joinType==crossJoin,则关联结果应该是:
A |
B |
C |
D |
E |
F |
G |
|
1 |
interval1 |
interval2 |
interval3 |
Type |
value1 |
value2 |
value3 |
2 |
1 hour |
1 day |
7 week |
Circle |
37 |
108.1 |
4.1 |
3 |
1 hour |
3 days |
7 weeks |
Line |
39 |
117.5 |
4.2 |
4 |
1 hour |
4 days |
7 weeks |
Line |
35 |
127 |
4.3 |
5 |
2 hours |
1 day |
7 week |
circle |
37 |
108.1 |
4.1 |
6 |
2 hours |
3 days |
7 weeks |
Line |
39 |
117.5 |
4.2 |
7 |
2 hours |
4 days |
7 weeks |
Line |
35 |
127 |
4.3 |
8 |
3 hours |
1 day |
7 week |
circle |
37 |
108.1 |
4.1 |
9 |
3 hours |
3 days |
7 weeks |
Line |
39 |
117.5 |
4.2 |
10 |
3 hours |
4 days |
7 weeks |
Line |
35 |
127 |
4.3 |
11 |
4 hours |
1 day |
7 week |
circle |
37 |
108.1 |
4.1 |
12 |
4 hours |
3 days |
7 weeks |
Line |
39 |
117.5 |
4.2 |
13 |
4 hours |
4 days |
7 weeks |
Line |
35 |
127 |
4.3 |
如果joinType==leftJoinBig,则关联结果应该是:
A |
B |
C |
D |
E |
F |
G |
|
1 |
interval1 |
interval2 |
interval3 |
Type |
value1 |
value2 |
value3 |
2 |
1 hour |
||||||
3 |
2 hours |
1 day |
7 week |
circle |
37 |
108.1 |
4.1 |
4 |
3 hours |
3 days |
7 weeks |
line |
39 |
117.5 |
4.2 |
5 |
4 hours |
4 days |
7 weeks |
line |
35 |
127 |
4.3 |
如果joinType==leftJoinSmall,则关联结果应该是:
A |
B |
C |
D |
E |
F |
G |
|
1 |
interval1 |
interval2 |
interval3 |
type |
value1 |
value2 |
value3 |
2 |
1 hour |
||||||
3 |
2 hours |
||||||
4 |
3 hours |
3 days |
7 weeks |
line |
39 |
117.5 |
4.2 |
5 |
4 hours |
4 days |
7 weeks |
line |
35 |
127 |
4.3 |
这个计算需要循环遍历sheet C,因此只能用脚本而不是公式来实现。Join属于结构化计算,VBA缺乏直接可用的函数,代码会非常繁琐。
实现步骤:
1. 运行集算器(可以到润乾官网下载,用职场版,首次运行时会提示加载授权,下载个免费的就够了)
2. 编写脚本并执行
A |
B |
|
1 |
=file("data.xlsx").xlsopen() |
|
2 |
=A1.xlsimport@t(;"C") |
|
3 |
=tableA=A1.xlsimport@t(;"A") |
|
4 |
for A2 |
=tableB=A1.xlsimport@t(;A4.table) |
5 |
=case(A4.joinType, |
|
6 |
=B5.new(A.interval1,B.interval2,B.interval3,B.type,B.value1,B.value2,B.value3) |
|
7 |
=file(A4.table+A4.joinType+".xlsx").xlsexport@t(B6) |
脚本函数 case 可对 Join 类型做分支判断,xjoin 算出笛卡尔积,@1 表示左关联。