7.4 Hierarchical kmeans clustering
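Industrial experience suggests that the more important output materials play a bigger role in distinguishing production routes. To reflect this, we tried hierarchical kmeans clustering, that is, running kmeans layer by layer in order of the importance of the output materials. The procedure is as follows: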
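1. Initialize the centroids, cluster all output data X into 2 clusters with kmeans, and record the centroids.
2. The data in cluster 1 no longer take part in clustering. Cluster the data in cluster 2 into 2 clusters by the same method, and record the centroids.
3. Repeat step 2 until the number of clusters reaches k, or until only one member is left and no further clustering is possible.
4. For prediction, compute the distance from each prediction point Yi to the first-layer centroids. If Yi belongs to cluster 1, record that class. If it belongs to cluster 2, compute its distances to the next layer's centroids to decide whether it belongs to a class at that layer or must move on to the following layer. Repeat until the class of Yi is determined.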
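Steps 1 to 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the SPL routine: plain Lloyd iterations stand in for SPL's k_means, and seeding each split with the two points farthest apart (the hypothetical helper `farthest_pair`) replaces the centroid-initialization method of the earlier sections. The names `two_means`, `farthest_pair` and `hierarchical_kmeans` are illustrative only. The cluster seeded by the earlier row of the farthest pair plays the role of "cluster 1", which is set aside at each layer.

```python
import math


def two_means(points, init, max_iter=300):
    """Lloyd's kmeans with k=2, started from the given initial centroids.

    Returns (centers, labels) where labels[i] is 0 or 1.
    """
    centers = [list(c) for c in init]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid; ties go to cluster 0.
        new_labels = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def farthest_pair(points):
    """ASSUMPTION: seed each split with the two points farthest apart.

    Stands in for the centroid-initialization method of the earlier sections.
    """
    best, best_d = (0, 1), -1.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d > best_d:
                best, best_d = (i, j), d
    return [points[best[0]], points[best[1]]]


def hierarchical_kmeans(X, k):
    """Returns (layers, labels): one centroid pair per layer, 1-based class per row of X."""
    if k < 2 or len(X) < 2:
        raise ValueError("need k >= 2 and at least 2 rows")
    labels = [0] * len(X)
    layers = []
    remaining = list(range(len(X)))  # row indices still being clustered
    layer = 1
    while True:
        pts = [X[i] for i in remaining]
        centers, lab = two_means(pts, farthest_pair(pts))
        layers.append(centers)
        first = [i for i, c in zip(remaining, lab) if c == 0]
        second = [i for i, c in zip(remaining, lab) if c == 1]
        for i in first:  # step 2: cluster 1 of this layer is set aside for good
            labels[i] = layer
        # Step 3: stop when k classes exist or at most one member is left to split.
        if len(second) <= 1 or layer == k - 1:
            for i in second:
                labels[i] = layer + 1
            return layers, labels
        remaining = second
        layer += 1
```

On the toy data `[[0], [1], [20], [21], [30], [31]]` with k=3, the first layer sets aside {0, 1} as class 1, the second layer sets aside {20, 21} as class 2, and {30, 31} becomes class 3.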
SPL routine
 | A | B | C | D
1 | [[0.113,0.345,0.316], [0.118,0.314,0.322], [0.125,0.334,0.314], [0.139,0.254,0.371], [0.111,0.361,0.306], [0.179,0.257,0.332]] | /X | |
2 | [[0.116,0.371,0.307], [0.143,0.324,0.303]] | /Y | |
3 | =mk=3 | | |
4 | =center_seq=[] | | |
5 | =xc=A1.(0) | | |
6 | =idx_seq=to(A1.len()) | | |
7 | =col_seq=to(A1.~.len()) | | |
8 | =mstd@s(A1,1).~ | | |
9 | for | =A1(idx_seq) | /remaining data |
10 | | =transpose(B9)(col_seq) | |
11 | | =A8(col_seq).psort@z() | /Sidx |
12 | | =B10(B11) | |
13 | | =B12.(~.ranks()) | /transpose of RK |
14 | | =as=to(B13.len()),av_idx=to(B9.len()),B13.((oidx=as\#,pma=(~--msum(B13(oidx),1).~)(av_idx).pmax(),res=av_idx(pma),av_idx.delete(pma),res)) | /Cb indices |
15 | | =B9(B14.select(~)) | /Cb’ |
16 | | =B15.to(2) | /initial centroids C |
17 | | =k_means(B9,2,300,B16) | |
18 | | =B17(1) | |
19 | | =B17(2) | |
20 | | =B19.run(~=~+A9-1) | |
21 | | =B20.group@p(~) | |
22 | | =B21(1) | |
23 | | =B21(2) | |
24 | | =idx_seq(B22) | |
25 | | =idx_seq(B23) | |
26 | | =xc(B24)=B20(B22) | |
27 | | =idx_seq=B25 | |
28 | | =col_seq(B11.~) | |
29 | | =col_seq=col_seq\B28 | |
30 | | =center_seq.insert(0,[B18]) | |
31 | | if idx_seq.len()==1||A9==mk-1 | =xc(B25)=B20(B23) |
32 | | | break |
33 | =center_seq | /set of centroids | |
34 | =yc=[] | | |
35 | =A33.len() | | |
36 | for A2 | for A33 | =B36.pmin(dis(~,A36)) |
37 | | | =#B36 |
38 | | | =C36+C37-1 |
39 | | | if C36==1||C37==A35 | =A34.insert(0,C38)
40 | | | | next A36
41 | return [center_seq,xc,yc] | | |
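The prediction part of the routine (cells A33 to D40) walks the recorded layers for each row of Y: C36 finds which of the layer's two centroids is nearer, C37 is the layer number, and C38 = C36+C37-1 turns the pair into a class number. The Python sketch below mirrors that arithmetic under two assumptions: SPL's dis() is taken to be Euclidean distance, and ties go to the first centroid. The function name `predict_class` and the centroid values are hypothetical, chosen only for illustration.

```python
import math


def predict_class(y, layers):
    """Assign one point y to a class by walking the layered centroid pairs.

    layers[n] holds the two centroids recorded at layer n+1 (cluster 1 first).
    Returns the 1-based class number.
    """
    for layer_no, (c1, c2) in enumerate(layers, start=1):
        # Position of the nearer centroid, 1 or 2 (like C36); ties go to 1 here.
        pos = 1 if math.dist(y, c1) <= math.dist(y, c2) else 2
        # Stop at cluster 1 of any layer, or at either cluster of the last layer.
        if pos == 1 or layer_no == len(layers):
            return pos + layer_no - 1  # same arithmetic as C38 = C36 + C37 - 1
    raise ValueError("layers must not be empty")


# Hypothetical centroid pairs for two layers (one-dimensional for readability).
layers = [([0.5], [25.5]), ([20.5], [30.5])]
```

With these pairs, the point [2] stops at layer 1 in class 1, [22] moves on and stops at layer 2 in class 2, and [29] reaches the last layer and lands in class 3.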
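Sample results:
Yield data X:
Number of clusters k=3.
Prediction data Y:
First-layer centroids C1:
Second-layer centroids C2:
Class of each member of X, Xc:
Class of Y, Yc: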
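The routine uses very little data. After the second layer of clustering, only one member is left to cluster, so no further layer can be formed. Therefore, even if the number of clusters k is changed to 4 or more, the result still contains only 3 classes. This follows from how hierarchical kmeans works, so when a fixed number of classes is required, the centroid-initialized kmeans algorithm introduced earlier should be used instead.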
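This cap on the number of classes can be reproduced on a toy one-dimensional data set. The sketch below is a simplified 1-D Python version of the procedure, not the SPL routine; seeding each split with the smallest and largest remaining values is an assumption that replaces the routine's initialization, and `two_means_1d` and `count_classes` are hypothetical names. With five points, only one point is left after the second split, so asking for k=4 or k=10 still yields 3 classes.

```python
def two_means_1d(values, c1, c2, max_iter=300):
    """Lloyd's 2-means on scalars seeded with c1, c2; returns 0/1 labels."""
    labels = None
    for _ in range(max_iter):
        new = [0 if abs(v - c1) <= abs(v - c2) else 1 for v in values]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        g1 = [v for v, lab in zip(values, labels) if lab == 0]
        g2 = [v for v, lab in zip(values, labels) if lab == 1]
        if g1:
            c1 = sum(g1) / len(g1)
        if g2:
            c2 = sum(g2) / len(g2)
    return labels


def count_classes(values, k):
    """How many classes layered 2-means yields for target k (seeds: min and max, an assumption)."""
    remaining, layers = list(values), 0
    while True:
        labels = two_means_1d(remaining, min(remaining), max(remaining))
        layers += 1  # cluster 1 of this layer becomes one final class
        second = [v for v, lab in zip(remaining, labels) if lab == 1]
        # Stop when at most one member is left to split or k classes are reached.
        if len(second) <= 1 or layers == k - 1:
            return layers + (1 if second else 0)  # cluster 2 of the last layer is a class too
        remaining = second


for k in (3, 4, 10):
    print(k, count_classes([0, 1, 20, 21, 30], k))
```

Each printed line shows a class count of 3: the first layer sets aside {0, 1}, the second sets aside {20, 21}, and the single remaining point 30 cannot be split again, whatever k is.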