Problems with the k-means clustering algorithm; neural-network clustering algorithms

1. Image clustering with k-means

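Idea: each pixel is a 1×3 vector (its RGB values). Cluster these pixel vectors, then replace every pixel with the center of the cluster it belongs to. Even with only 60 clusters, the result does not differ much from the original image.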

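import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


def restore_image(centers, labels, shape):
    # Replace every pixel by the center of the cluster it was assigned to.
    row, col, n = shape
    return centers[labels].reshape(row, col, n)


def test_picture_cluster(im, n_clusters=60):
    # `im` is an image already loaded by the caller (for example a PIL image);
    # the source does not show how it is opened.
    image = np.array(im).astype(float) / 255
    # plt.imshow(image)
    # plt.show()
    # plt.close()
    image = image[:, :, :3]            # keep the RGB channels only
    image_v = image.reshape(-1, 3)     # flatten the image to N x 3
    n = image_v.shape[0]               # number of pixels
    clf = KMeans(n_clusters=n_clusters, init='k-means++')
    idx = np.random.randint(0, n, size=1000)   # train on 1000 sampled pixels
    image_sample = image_v[idx]
    clf.fit(image_sample)
    image_predict = clf.predict(image_v)
    result = restore_image(clf.cluster_centers_, image_predict, image.shape)

    fig = plt.figure()
    plt.subplot(121)
    plt.imshow(image)
    plt.title('original')
    # Second panel: the image rebuilt from the cluster centers.
    plt.subplot(122)
    plt.imshow(result)
    plt.title('{} clusters'.format(n_clusters))
    plt.show()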

2. AP (Affinity Propagation) algorithm

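preference: how strongly each point is favored as a cluster center (exemplar).

affinity: how the similarity is computed. It is the negative squared Euclidean distance, so when the median is used as the preference, it is the negative median as well.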

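Damping factor (damping): how much of the previous message values is kept at each update, which avoids numerical oscillations; scikit-learn requires a value in [0.5, 1).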

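To select suitable cluster centers, AP keeps collecting two kinds of evidence from the data points: the responsibility r(i, k), which measures how well suited the candidate center x(k) is to serve as the exemplar of data point x(i), and the availability a(i, k), which measures how appropriate it is for data point x(i) to choose x(k) as its exemplar.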
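The exchange of these two messages can be sketched in NumPy as follows. This is a minimal illustration rather than scikit-learn's implementation: the function name `affinity_propagation_sketch`, the fixed iteration count and the default damping of 0.5 are assumptions made for the example, and `S` is assumed to be a float similarity matrix whose diagonal already holds the preference.

```python
import numpy as np


def affinity_propagation_sketch(S, damping=0.5, n_iter=200):
    """Hypothetical helper: run AP message passing on a similarity matrix S."""
    n = S.shape[0]
    R = np.zeros((n, n))   # responsibilities r(i, k)
    A = np.zeros((n, n))   # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(n_iter):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        # a(k, k) = sum of positive r(i', k), i' != k
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new

    # A point is an exemplar when r(k, k) + a(k, k) > 0.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels
```

The number of clusters is never passed in: it falls out of how many points end up with a positive r(k, k) + a(k, k), which in turn depends on the preference placed on the diagonal of `S`.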

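Overall, AP condenses the points toward centers, and the number of centers does not have to be specified. Drawback: the computational cost is high, since messages are exchanged between every pair of points.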

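import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as ds
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import euclidean_distances

N = 400
centers = [[1, 2], [-1, -1], [1, -1], [-1, 1]]
data, y = ds.make_blobs(N, n_features=2, centers=centers)
m = euclidean_distances(data, squared=True)   # squared distance between every pair of points
perference = -np.median(m)                    # median over the whole matrix (no axis given), negated

fig = plt.figure(figsize=(15, 9), facecolor='w')
# The endpoints of the multiplier range are lost in the source;
# 1 to 4 is an assumed placeholder.
for index, mul in enumerate(np.linspace(1, 4, 9)):
    p = perference * mul
    clf = AffinityPropagation(affinity='euclidean', preference=p)
    y_predict = clf.fit_predict(data)
    plt.subplot(3, 3, index + 1)
    plt.scatter(data[:, 0], data[:, 1], c=y_predict)
    plt.title('preference is {}'.format(p))
plt.show()
plt.close()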

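3. Mean shift
Mean shift: as I understand it, each point keeps moving toward regions of higher density until the shift no longer changes significantly.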
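To make the "move toward higher density" idea concrete, here is a minimal sketch of the shift applied to one starting point. It assumes a flat kernel (every neighbor inside the bandwidth counts equally); the function name `mean_shift_mode` and the `tol` and `max_iter` parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def mean_shift_mode(x, data, bandwidth, tol=1e-4, max_iter=300):
    """Hypothetical helper: follow one point uphill to a density mode."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        # Neighbors of the current position inside the window.
        within = np.linalg.norm(data - x, axis=1) <= bandwidth
        if not within.any():
            return x
        # Shift the position to the mean of those neighbors.
        new_x = data[within].mean(axis=0)
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

Running this from every data point and merging the end positions that coincide gives the cluster centers; the bandwidth controls how many distinct modes survive.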

from sklearn.cluster import MeanShift

# `data` and `perference` come from the AP example above.
fig2 = plt.figure(figsize=(15, 9))
for index, mul in enumerate(np.linspace(0.1, 0.4, 9)):
    band_width = -perference * mul   # perference is negative, so this is positive
    clf = MeanShift(bin_seeding=True, bandwidth=band_width)
    y_predict = clf.fit_predict(data)
    print('number of centers:', len(clf.cluster_centers_))
    plt.subplot(3, 3, index + 1)
    plt.scatter(data[:, 0], data[:, 1], c=y_predict)
    plt.title('MeanShift bandwidth is {}'.format(band_width))
plt.show()
plt.close()


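4. Hierarchical clustering
Pseudocode
Suppose there are N samples to cluster. Hierarchical clustering proceeds as follows:
1. (Initialization) Put each sample in its own cluster and compute the distance between every pair of clusters, i.e. the similarity between samples.
2. Find the two closest clusters and merge them into one (the number of clusters decreases by one).
3. Recompute the similarity between the newly formed cluster and each of the old clusters.
4. Repeat steps 2 and 3 until all samples belong to a single cluster, then stop.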
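The steps above can be sketched naively as follows. This is an illustration only, not how scikit-learn implements it: the function name `agglomerative_sketch` is hypothetical, the `linkage` argument is assumed to be Python's `min` (single linkage) or `max` (complete linkage), and instead of updating only the distances of the new cluster (step 3) it simply recomputes every cluster-to-cluster distance on each pass.

```python
import numpy as np


def agglomerative_sketch(X, n_clusters=1, linkage=min):
    """Hypothetical helper: merge the closest clusters until n_clusters remain."""
    X = np.asarray(X, dtype=float)
    # Step 1: every sample starts as its own cluster.
    clusters = [[i] for i in range(len(X))]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        # Step 2: find the pair of clusters with the smallest linkage distance.
        best_d, best_a, best_b = np.inf, None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist[i, j] for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        # Merge them; distances to the new cluster are recomputed on the next pass.
        clusters[best_a] += clusters.pop(best_b)
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels
```

Stopping at `n_clusters` instead of continuing down to a single cluster corresponds to cutting the merge tree at the desired level.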

import warnings
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as ds
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph

warnings.filterwarnings(action='ignore', category=UserWarning)
np.set_printoptions(suppress=True)
np.random.seed(0)
n_clusters = 4
N = 400

# Four Gaussian blobs, plus a copy with 10% uniform noise points (label 4)
data1, y1 = ds.make_blobs(n_samples=N, n_features=2,
                          centers=((-1, 1), (1, 1), (1, -1), (-1, -1)),
                          cluster_std=(0.1, 0.2, 0.3, 0.4), random_state=0)
data1 = np.array(data1)
n_noise = int(0.1 * N)
r = np.random.rand(n_noise, 2)
data_min1, data_min2 = np.min(data1, axis=0)
data_max1, data_max2 = np.max(data1, axis=0)
r[:, 0] = r[:, 0] * (data_max1 - data_min1) + data_min1
r[:, 1] = r[:, 1] * (data_max2 - data_min2) + data_min2
data1_noise = np.concatenate((data1, r), axis=0)
y1_noise = np.concatenate((y1, [4] * n_noise))

# Two moons, plus a copy with 10% uniform noise points (label 3)
data2, y2 = ds.make_moons(n_samples=N, noise=.05)
data2 = np.array(data2)
n_noise = int(0.1 * N)
r = np.random.rand(n_noise, 2)
data_min1, data_min2 = np.min(data2, axis=0)
data_max1, data_max2 = np.max(data2, axis=0)
r[:, 0] = r[:, 0] * (data_max1 - data_min1) + data_min1
r[:, 1] = r[:, 1] * (data_max2 - data_min2) + data_min2
data2_noise = np.concatenate((data2, r), axis=0)
y2_noise = np.concatenate((y2, [3] * n_noise))

linkages = ["ward", "complete", "average", "single"]
i = 1   # index into linkages: "complete"
fig = plt.figure(figsize=(15, 9))
for index, (n_cluster, data, y_label) in enumerate([(4, data1, y1), (4, data1_noise, y1_noise),
                                                    (2, data2, y2), (2, data2_noise, y2_noise)]):
    # Left column: the data with its true labels
    plt.subplot(4, 2, 2 * index + 1)
    plt.scatter(data[:, 0], data[:, 1], c=y_label)
    plt.grid(True, ls=':')
    # Symmetrized 7-nearest-neighbor graph used as the connectivity constraint
    connectivity = kneighbors_graph(data, n_neighbors=7, mode='distance',
                                    metric='minkowski', p=2, include_self=True)
    connectivity = 0.5 * (connectivity + connectivity.T)
    # Euclidean distance is the default, so it is not passed explicitly
    model = AgglomerativeClustering(n_clusters=n_cluster, connectivity=connectivity,
                                    linkage=linkages[i])
    y_predict = model.fit_predict(data)
    # Right column: the predicted clusters
    plt.subplot(4, 2, 2 * index + 2)
    plt.grid(True, ls=':')
    plt.scatter(data[:, 0], data[:, 1], c=y_predict)
plt.show()
plt.close()

5. Spectral clustering


import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from sklearn.cluster import spectral_clustering
from sklearn.metrics.pairwise import euclidean_distances

matplotlib.rcParams['font.sans-serif'] = [u'SimHei']
matplotlib.rcParams['axes.unicode_minus'] = False

# Three concentric circles of radius 1, 2 and 3
t = np.arange(0, 2 * np.pi, 0.1)
data1 = np.vstack((np.cos(t), np.sin(t))).T
data2 = np.vstack((2 * np.cos(t), 2 * np.sin(t))).T
data3 = np.vstack((3 * np.cos(t), 3 * np.sin(t))).T
data = np.vstack((data1, data2, data3))

edu = euclidean_distances(data, squared=True)   # squared distances |xi - xj|^2
n_clusters = 3

fig = plt.figure(figsize=(15, 9), facecolor='w')
plt.suptitle('Spectral clustering', fontsize=20)
for index, s in enumerate(np.logspace(-2, 0, 4)):   # tune by trying several values of sigma
    # Edge weights: exp turns distances into a Gaussian-style similarity.
    # edu already holds squared distances, so edu**2 is the fourth power of the distance.
    af = np.exp(-edu**2 / (2 * (s**2)))
    y_hat = spectral_clustering(af, n_clusters=n_clusters, assign_labels='kmeans', random_state=1)
    plt.subplot(4, 1, index + 1)
    plt.scatter(data[:, 0], data[:, 1], c=y_hat)
    plt.title('sigma is {}'.format(s))
plt.show()
plt.close()
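The `spectral_clustering` call above takes the affinity matrix and returns labels, which hides the intermediate steps. The sketch below shows one common recipe (symmetric normalized Laplacian, then row-normalized eigenvectors); it is an assumption made for illustration and not necessarily what scikit-learn does internally. The function name `spectral_embedding_sketch` is hypothetical.

```python
import numpy as np


def spectral_embedding_sketch(W, n_clusters):
    """Hypothetical helper: embed the points using a normalized graph Laplacian."""
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: L = I - D^(-1/2) W D^(-1/2)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the smallest n_clusters.
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :n_clusters]
    # Normalize each row to unit length before clustering.
    return U / np.linalg.norm(U, axis=1, keepdims=True)
```

The rows of the returned matrix would then be clustered with k-means, which is what `assign_labels='kmeans'` refers to. Points that are strongly connected in the affinity graph land on nearly the same row vector, which is why shapes such as concentric circles can be separated.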