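This article uses a worked example to show how basic K-means, a clustering algorithm, can be implemented in Python. It is shared here for reference; the details are as follows: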
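Basic K-means: choose K initial centroids, where K is a user-specified parameter giving the desired number of clusters. In each iteration, every point is assigned to its nearest centroid, and the set of points assigned to the same centroid forms a cluster. The centroid of each cluster is then updated from the points assigned to that cluster. The assignment and update steps are repeated until the centroids no longer change noticeably.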
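The assign-and-update loop described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `kmeans` and the parameters `tol` and `max_iter` are assumptions introduced here, and "no noticeable change" is interpreted as "no centroid moved by more than `tol`". It relies on `math.dist`, which requires Python 3.8 or later.

```python
import math

def kmeans(points, centroids, tol=1e-6, max_iter=100):
    """Basic K-means on a list of (x, y) tuples.

    `tol` and `max_iter` are illustrative parameters chosen for this sketch.
    Returns the final centroids and the list of clusters.
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = []
        for old, cluster in zip(centroids, clusters):
            if cluster:
                n = len(cluster)
                new_centroids.append((sum(x for x, _ in cluster) / n,
                                      sum(y for _, y in cluster) / n))
            else:
                # Empty cluster: keep the old centroid (one possible choice).
                new_centroids.append(old)

        # Stop once no centroid has moved by more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, clusters

# Small made-up data set with two obvious groups.
pts = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centers, groups = kmeans(pts, [(0, 0), (10, 10)])
print(centers)
```

Keeping the old centroid for an empty cluster is just one way to avoid dividing by zero; re-seeding the centroid from a random point is another common option.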
The complete script, which uses three fixed initial centroids and runs 50 iterations, is as follows:

    # coding=utf-8
    import pylab as pl

    # Load the points; each line of the "points" file has the form "x#y"
    with open("points", "r") as f:
        points = [[int(eachpoint.split("#")[0]), int(eachpoint.split("#")[1])] for eachpoint in f]

    # Specify three initial centroids
    currentCenter1 = [20, 190]; currentCenter2 = [120, 90]; currentCenter3 = [170, 140]

    pl.plot([currentCenter1[0]], [currentCenter1[1]], 'ok')
    pl.plot([currentCenter2[0]], [currentCenter2[1]], 'ok')
    pl.plot([currentCenter3[0]], [currentCenter3[1]], 'ok')

    # Record how each cluster's centroid moves after every iteration
    center1 = [currentCenter1]; center2 = [currentCenter2]; center3 = [currentCenter3]

    # The three clusters
    group1 = []; group2 = []; group3 = []

    for runtime in range(50):
        group1 = []; group2 = []; group3 = []
        for eachpoint in points:
            # Squared distance from the point to each of the three centroids
            distance1 = (eachpoint[0] - currentCenter1[0]) ** 2 + (eachpoint[1] - currentCenter1[1]) ** 2
            distance2 = (eachpoint[0] - currentCenter2[0]) ** 2 + (eachpoint[1] - currentCenter2[1]) ** 2
            distance3 = (eachpoint[0] - currentCenter3[0]) ** 2 + (eachpoint[1] - currentCenter3[1]) ** 2

            # Assign the point to the cluster of its nearest centroid
            mindis = min(distance1, distance2, distance3)
            if mindis == distance1:
                group1.append(eachpoint)
            elif mindis == distance2:
                group2.append(eachpoint)
            else:
                group3.append(eachpoint)

        # After all points are assigned, update each cluster's centroid
        # (a cluster that received no points keeps its previous centroid,
        # which avoids a division by zero)
        if group1:
            currentCenter1 = [sum(p[0] for p in group1) / len(group1), sum(p[1] for p in group1) / len(group1)]
        if group2:
            currentCenter2 = [sum(p[0] for p in group2) / len(group2), sum(p[1] for p in group2) / len(group2)]
        if group3:
            currentCenter3 = [sum(p[0] for p in group3) / len(group3), sum(p[1] for p in group3) / len(group3)]

        # Record this update of the centroids
        center1.append(currentCenter1)
        center2.append(currentCenter2)
        center3.append(currentCenter3)

    # Plot all points, using color to mark the cluster each point belongs to
    pl.plot([eachpoint[0] for eachpoint in group1], [eachpoint[1] for eachpoint in group1], 'or')
    pl.plot([eachpoint[0] for eachpoint in group2], [eachpoint[1] for eachpoint in group2], 'oy')
    pl.plot([eachpoint[0] for eachpoint in group3], [eachpoint[1] for eachpoint in group3], 'og')

    # Plot the update trajectory of each cluster's centroid
    for center in [center1, center2, center3]:
        pl.plot([eachcenter[0] for eachcenter in center], [eachcenter[1] for eachcenter in center], 'k')

    pl.show()
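The script reads its input from a file named `points`, splitting each line on `#` and converting both parts to integers. The article does not include that file, so the sketch below shows one way to generate a stand-in for experimenting. Everything in it is an assumption made for illustration: the helper name `write_sample_points`, the blob centers, the spread, and the number of points are all made up and are not the author's data.

```python
import random

def write_sample_points(path="points", per_cluster=30, seed=0):
    """Write synthetic integer points, one "x#y" pair per line.

    Hypothetical helper: the blob centers below are invented and merely
    placed near the script's three initial centroids.
    """
    rng = random.Random(seed)
    blob_centers = [(30, 180), (120, 80), (180, 150)]
    with open(path, "w") as f:
        for cx, cy in blob_centers:
            for _ in range(per_cluster):
                # Gaussian scatter around the blob center, truncated to int
                x = int(rng.gauss(cx, 10))
                y = int(rng.gauss(cy, 10))
                f.write("{}#{}\n".format(x, y))
    return per_cluster * len(blob_centers)
```

Calling `write_sample_points()` once in the script's working directory creates a `points` file that the parsing line in the script can read.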
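Screenshot of the result: the points of the three clusters are plotted in red, yellow, and green, and the black lines trace how each centroid moved across iterations.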
I hope this article is helpful for your Python programming.