Hierarchical k-means clustering and HCPC
Hierarchical k-means clustering
The procedure is as follows:
1. Compute hierarchical clustering.
2. Cut the tree into k clusters.
3. Compute the center (i.e., the mean) of each cluster.
4. Run k-means using the cluster centers defined in step 3 as the initial cluster centers, and optimize the clustering.
This means that the final optimized partition obtained at step 4 might differ from the initial partition obtained at step 2.
For the final partition, consider mainly the result displayed by fviz_cluster().
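Before turning to the factoextra wrapper, the four steps above can be sketched with base R only (hclust, cutree, tapply, kmeans); this is a minimal illustration of the idea, not the exact internals of hkmeans():

```r
# Hierarchical k-means sketch on the standardized USArrests data
df <- scale(USArrests)

# Steps 1-2: hierarchical clustering, then cut the tree into k = 4 clusters
res.hc <- hclust(dist(df), method = "ward.D2")
grp <- cutree(res.hc, k = 4)

# Step 3: the center (mean) of each cluster, one row per cluster
centers <- apply(df, 2, function(v) tapply(v, grp, mean))

# Step 4: k-means initialized with those centers optimizes the partition
res.km <- kmeans(df, centers = centers)

# Compare the initial (tree-based) and final (k-means) partitions
table(initial = grp, final = res.km$cluster)
```

Because the initial centers are fixed, this k-means run is deterministic; the cross-table shows which observations, if any, changed cluster during the k-means consolidation.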
library(factoextra)
library(FactoMineR)
# Standardize the data
df <- scale(USArrests)
# Compute hierarchical k-means clustering
res.hk <- hkmeans(df, 4)
hkmeans_tree(res.hk)
# Elements returned by hkmeans()
names(res.hk)
# Print the results
res.hk
# Visualize the tree
fviz_dend(res.hk,
          cex = 0.6,
          palette = "jco",
          rect = TRUE,
          rect_border = "jco",
          rect_fill = TRUE)
# Visualize the hkmeans final clusters
fviz_cluster(res.hk,
             palette = "jco",
             ellipse = TRUE,
             ellipse.type = "euclid",
             # ellipse.type = "convex",     # convex hull (polygon)
             # ellipse.type = "confidence", # confidence ellipse around the group mean
             # ellipse.type = "t",          # ellipse based on the multivariate t-distribution
             # ellipse.type = "norm",       # ellipse based on the multivariate normal distribution
             star.plot = TRUE,
             repel = TRUE,
             ggtheme = theme_classic())
HCPC
The HCPC (Hierarchical Clustering on Principal Components) approach allows us to combine the three standard methods used in multivariate data analysis (Husson, Josse, and Pagès 2010):
1. Principal component methods (PCA, CA, MCA, FAMD, MFA),
2. Hierarchical clustering, and
3. Partitioning clustering, particularly the k-means method.
The algorithm of the HCPC method, as implemented in the FactoMineR package, can be summarized as follows:
1. Compute principal component methods: PCA, (M)CA, or MFA, depending on the types of variables in the data set and the structure of the data. At this step, you can choose the number of dimensions (principal components) to retain in the output by specifying the argument ncp; the default value is 5.
2. Compute hierarchical clustering: hierarchical clustering is performed on the selected principal components using Ward's criterion. Ward's criterion is used because, like principal component analysis, it is based on multidimensional variance.
3. Choose the number of clusters based on the hierarchical tree: an initial partition is obtained by cutting the hierarchical tree.
4. Perform k-means clustering to improve the initial partition obtained from hierarchical clustering.
The final partition obtained after the k-means consolidation may differ slightly from the one obtained from the hierarchical clustering.
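The four HCPC steps can likewise be sketched with base R (prcomp, hclust, cutree, kmeans); this is an illustrative approximation of the pipeline, not the exact internals of HCPC(), and k = 4 is chosen here only for illustration:

```r
# Step 1: principal component method (PCA on standardized data), keep ncp = 3 components
pc <- prcomp(USArrests, scale. = TRUE)
scores <- pc$x[, 1:3]

# Step 2: Ward hierarchical clustering on the component scores
hc <- hclust(dist(scores), method = "ward.D2")

# Step 3: cut the tree to get an initial partition (k = 4 for illustration)
grp <- cutree(hc, k = 4)

# Step 4: k-means consolidation, initialized at the hierarchical cluster centers
centers <- apply(scores, 2, function(v) tapply(v, grp, mean))
final <- kmeans(scores, centers = centers)
final$cluster
```

Clustering on the component scores rather than on the raw variables filters out the noise carried by the discarded dimensions, which is why the HCPC result on a PCA object is often cleaner than clustering the raw data directly.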
library(factoextra)
library(FactoMineR)
# HCPC(USArrests, nb.clust = 0, min = 3, max = NULL, graph = TRUE) # interactive: click on the tree to choose the cut level
HCPC(USArrests, nb.clust = -1, min = 3, max = NULL, graph = FALSE) # defaults to three clusters; poor on the raw data, much better after PCA
# Compute PCA with ncp = 3
res.pca <- PCA(USArrests, ncp = 3, graph = FALSE) # keep only the first three principal components
# Compute hierarchical clustering on principal components
res.hcpc <- HCPC(res.pca, nb.clust = -1, graph = FALSE)
# Visualize
fviz_dend(res.hcpc,
          cex = 0.6,                  # Label size
          palette = "jco",            # Color palette, see ?ggpubr::ggpar
          rect = TRUE,
          rect_fill = TRUE,           # Add rectangle around groups
          rect_border = "jco",        # Rectangle color
          labels_track_height = 0.3)  # Augment the room for labels
fviz_cluster(res.hcpc,
             repel = TRUE,            # Avoid label overlapping
             show.clust.cent = TRUE,  # Show cluster centers
             palette = "jco",         # Color palette, see ?ggpubr::ggpar
             ggtheme = theme_minimal(),
             main = "Factor map")
# Principal components + tree
plot(res.hcpc, choice = "3D.map")