R Language Notes

Hierarchical k-means clustering and HCPC

2020-06-02  leoxiaobei

Hierarchical k-means clustering

The procedure is as follows:

  1. Compute hierarchical clustering.
  2. Cut the tree into k clusters.
  3. Compute the center (i.e., the mean) of each cluster.
  4. Run k-means using the set of cluster centers (defined in step 3) as the initial cluster centers, to optimize the clustering.

This means that the final optimized partition obtained at step 4 may differ from the initial partition obtained at step 2. Consider mainly the result displayed by fviz_cluster().
library(factoextra)
library(FactoMineR)

df <- scale(USArrests)
# Compute hierarchical k-means clustering
res.hk <- hkmeans(df, 4)
hkmeans_tree(res.hk)

# Elements returned by hkmeans()
names(res.hk)
# Print the results
res.hk

# Visualize the tree
fviz_dend(res.hk, 
          cex = 0.6, 
          palette = "jco", 
          rect_border = "jco", 
          rect = TRUE, 
          rect_fill = TRUE)
# Visualize the hkmeans final clusters
fviz_cluster(res.hk, 
             palette = "jco", 
             ellipse = TRUE,
             ellipse.type = "euclid",
             # ellipse.type = "convex",     # convex polygon
             # ellipse.type = "confidence", # confidence ellipses
             # ellipse.type = "t",          # ellipses from a multivariate t-distribution
             # ellipse.type = "norm",       # ellipses from a multivariate normal distribution
             star.plot = TRUE,
             repel = TRUE,
             ggtheme = theme_classic())
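For readers outside R, the four-step procedure above can also be sketched in Python with scipy and scikit-learn. The function name `hk_means` and the toy data below are illustrative assumptions, not part of factoextra; this is a minimal sketch of the idea, not a port of `hkmeans()`.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

def hk_means(X, k):
    # 1. Compute hierarchical clustering (Ward linkage, Euclidean distance)
    Z = linkage(X, method="ward")
    # 2. Cut the tree into k clusters (labels are 1..k)
    labels = fcluster(Z, t=k, criterion="maxclust")
    # 3. Compute the center (mean) of each cluster
    centers = np.vstack([X[labels == g].mean(axis=0) for g in range(1, k + 1)])
    # 4. Run k-means with those centers as the initial cluster centers
    return KMeans(n_clusters=k, init=centers, n_init=1).fit(X)

# Toy data: two well-separated 2-D blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
print(hk_means(X, 2).labels_)
```

Because k-means starts from the hierarchical centers instead of random points, the result is deterministic and typically needs only one initialization (`n_init=1`).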

HCPC

The HCPC (Hierarchical Clustering on Principal Components) approach allows us to combine the three standard methods used in multivariate data analyses (Husson, Josse, and Pagès 2010):
1. Principal component methods (PCA, CA, MCA, FAMD, MFA),
2. Hierarchical clustering, and
3. Partitioning clustering, particularly the k-means method.

The algorithm of the HCPC method, as implemented in the FactoMineR package, can be summarized as follows:
1. Compute principal component methods: PCA, (M)CA, or MFA, depending on the types of variables in the data set and the structure of the data. At this step you can choose the number of dimensions (principal components) to retain in the output by specifying the argument ncp; the default is 5.
2. Compute hierarchical clustering: hierarchical clustering is performed on the selected principal components using Ward's criterion. Ward's criterion is used because, like principal component analysis, it is based on multidimensional variance.
3. Choose the number of clusters based on the hierarchical tree: an initial partition is obtained by cutting the hierarchical tree.
4. Perform k-means clustering to improve the initial partition obtained from hierarchical clustering.
The final partition obtained after the k-means consolidation can differ slightly from the one obtained from the hierarchical clustering.
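The four HCPC steps above can be sketched in Python as well. The function name `hcpc_sketch` and the toy data are illustrative assumptions; this mimics the pipeline (PCA, then Ward clustering on the component scores, then k-means consolidation), not FactoMineR's actual implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def hcpc_sketch(X, n_clusters, ncp=3):
    # 1. Principal component method: keep the first ncp components
    scores = PCA(n_components=ncp).fit_transform(StandardScaler().fit_transform(X))
    # 2. Hierarchical clustering with Ward's criterion on the PC scores
    Z = linkage(scores, method="ward")
    # 3. Cut the tree to obtain an initial partition (labels 1..n_clusters)
    init_labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    centers = np.vstack([scores[init_labels == g].mean(axis=0)
                         for g in range(1, n_clusters + 1)])
    # 4. k-means consolidation, starting from the hierarchical cluster centers
    return KMeans(n_clusters=n_clusters, init=centers, n_init=1).fit(scores)

# Toy data: three separated groups in 4 dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.2, (15, 4)) for m in (0.0, 2.0, 4.0)])
print(hcpc_sketch(X, 3).labels_)
```

Clustering on the retained components rather than the raw variables acts as a denoising step, which is why the blog's USArrests example works better after PCA.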

library(factoextra)
library(FactoMineR)
# HCPC(USArrests, nb.clust = 0, min = 3, max = NULL, graph = TRUE) # interactive: click on the tree to choose the cut
HCPC(USArrests, nb.clust = -1, min = 3, max = NULL, graph = FALSE) # cuts automatically (three clusters here); poor on raw data, much better after PCA

# Compute PCA with ncp = 3
res.pca <- PCA(USArrests, ncp = 3, graph = FALSE) # keep only the first three principal components
# Compute hierarchical clustering on principal components
res.hcpc <- HCPC(res.pca, nb.clust = -1, graph = FALSE)

# Visualize
fviz_dend(res.hcpc, 
          cex = 0.6,                     # Label size
          palette = "jco",               # Color palette see ?ggpubr::ggpar
          rect = TRUE, 
          rect_fill = TRUE, # Add rectangle around groups
          rect_border = "jco",           # Rectangle color
          labels_track_height = 0.3)      # Augment the room for labels
fviz_cluster(res.hcpc,
             repel = TRUE,            # Avoid label overlapping
             show.clust.cent = TRUE, # Show cluster centers
             palette = "jco",         # Color palette see ?ggpubr::ggpar
             ggtheme = theme_minimal(),
             main = "Factor map")
# Principal components + tree
plot(res.hcpc, choice = "3D.map")