Using PCA results for hierarchical clustering
2020-06-06
caokai001
References:
Clustering with selected Principal Components
R: principal component analysis (PCA) with prcomp
https://github.com/vallotlab/scChIPseq/blob/master/R_scChIP_seq_analysis.R
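A bit of R each day: a hands-on hierarchical clustering example with dist, hclust, heatmap, etc.
Motivation:
When running hierarchical clustering after dimensionality reduction, the usual choice is to compute distances between samples in the reduced space with dist. The scChIPseq script linked above uses 1-cor() in place of a distance, so I am writing it down here.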
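1. Simulated data
# set the random seed for reproducibility
set.seed(1995)
# random non-negative integers, 10 rows x 10 columns
data=matrix(abs(round(rnorm(100, mean=1000, sd=500))), 10, 10)
# column names: variables
colnames(data)=paste("variable", 1:10, sep=".")
# row names: samples
rownames(data)=paste("sample", 1:10, sep=".")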
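2. Standardization
R function: scale(data, center=T/F, scale=T/F)
center (centering): subtract each column's mean from the data
scale (scaling): divide the centered data by each column's standard deviation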
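As a quick check of the two definitions above, here is a minimal sketch that compares scale() with a manual computation. The matrix `m` and its values are made up for illustration and are not part of the original data.

```r
# Small illustrative matrix (hypothetical values), filled column by column
m <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 50), nrow = 4, ncol = 2)

# Manual standardization: subtract each column's mean, then divide by its sd
manual <- sweep(m, 2, colMeans(m), "-")
manual <- sweep(manual, 2, apply(m, 2, sd), "/")

# scale() with its defaults center = TRUE, scale = TRUE
scaled <- scale(m)

# Compare the values only (scale() also attaches centering/scaling attributes)
same <- all.equal(manual, scaled, check.attributes = FALSE)
same
```

If the two results agree, `same` is TRUE, which confirms that scale() works column by column.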
# standardize the data with scale()
data2=scale(data)  ## defaults: center=TRUE, scale=TRUE
# plot() on a matrix draws only two dimensions (the first two columns)
plot(data2, main="scaled data")
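3. PCA
# run PCA; data2 is already centered and scaled, so both options are turned off
data2.pca <- stats::prcomp(data2, center=FALSE, scale.=FALSE)
# inspect the PCA result
data2.pca
# plot the samples on the first two principal components
plot(data2.pca$x)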
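Before clustering on a subset of components, it can help to see how much variance each component carries. The sketch below rebuilds the same simulated data and computes the cumulative proportion of variance. The 80% cutoff and the names `var_explained`, `cum_explained`, and `n_pc` are assumptions for illustration, not something the original post prescribes.

```r
# Rebuild the simulated data so this block runs on its own
set.seed(1995)
data <- matrix(abs(round(rnorm(100, mean = 1000, sd = 500))), 10, 10)
pca <- prcomp(scale(data), center = FALSE, scale. = FALSE)

# The variance of each component is its squared standard deviation
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)
round(cum_explained, 3)

# Smallest number of components reaching an (arbitrary) 80% threshold
n_pc <- which(cum_explained >= 0.8)[1]
n_pc
```

summary(pca) reports the same proportions in its "Cumulative Proportion" row.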
4. Hierarchical clustering: cluster the samples on their PCA coordinates
Distances computed with dist
x = data2.pca$x[,1]
y = data2.pca$x[,2]
z = data2.pca$x[,3]
#*****************************************************************
# Create clusters
#******************************************************************
# create and plot clusters based on the first and second principal components
hc = hclust(dist(cbind(x,y)), method = 'ward.D2')
plot(hc, axes=F,xlab='', ylab='',sub ='', main='Comp 1/2')
rect.hclust(hc, k=3, border='red')
# create and plot clusters based on the first, second, and third principal components
hc = hclust(dist(cbind(x,y,z)), method = 'ward.D2')
plot(hc, axes=F,xlab='', ylab='',sub ='', main='Comp 1/2/3')
rect.hclust(hc, k=3, border='red')
Using correlation in place of distance
# create and plot clusters based on the correlation among samples (computed across all 10 PCs)
mati <- as.matrix(t(data2.pca$x[,1:10]))
hc = hclust(as.dist(1-cor(mati)), method = 'ward.D2')
plot(hc, axes=F,xlab='', ylab='',sub ='', main='Correlation')
rect.hclust(hc, k=3, border='red')
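One way to relate the two dissimilarities: when each sample's vector is standardized to mean 0 and standard deviation 1, the squared Euclidean distance between two samples equals 2(n-1)(1-r), where n is the vector length and r is their Pearson correlation. The sketch below checks this numerically on random data. The matrix `m` is made up for illustration. The identity applies to per-sample standardized vectors, not to raw PCA scores, so the dist-based and correlation-based trees above need not agree.

```r
set.seed(1)
# 5 hypothetical samples (columns) measured on 8 features (rows)
m <- matrix(rnorm(40), nrow = 8, ncol = 5)

# Standardize each sample (column) to mean 0 and sd 1
z <- scale(m)
n <- nrow(z)

# Squared Euclidean distances between samples
d2 <- as.matrix(dist(t(z)))^2

# Correlation-based dissimilarity, rescaled by 2 * (n - 1)
d_cor <- 2 * (n - 1) * (1 - cor(m))

# The two matrices should match up to floating-point error
identical_up_to_tol <- all.equal(d2, d_cor, check.attributes = FALSE)
identical_up_to_tol
```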
Thoughts:
I am not sure which approach is better. Comments and discussion are welcome~😂
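One possible starting point for comparing the two approaches is to cut both trees into the same number of groups and cross-tabulate the labels. The sketch below does this on the same simulated data. The names `hc_dist`, `hc_cor`, `grp_dist`, and `grp_cor` are introduced here for illustration. The cophenetic correlation at the end is one common diagnostic, not a definitive criterion, and each value measures a tree against its own input dissimilarity only.

```r
# Rebuild the simulated data and PCA scores so this block runs on its own
set.seed(1995)
data <- matrix(abs(round(rnorm(100, mean = 1000, sd = 500))), 10, 10)
rownames(data) <- paste("sample", 1:10, sep = ".")
scores <- prcomp(scale(data), center = FALSE, scale. = FALSE)$x

# Tree 1: Euclidean distance on the first three PCs
d_euc <- dist(scores[, 1:3])
hc_dist <- hclust(d_euc, method = "ward.D2")

# Tree 2: 1 - Pearson correlation between samples across all PCs
d_cor <- as.dist(1 - cor(t(scores)))
hc_cor <- hclust(d_cor, method = "ward.D2")

# Cut both trees into k = 3 groups and cross-tabulate the labels
grp_dist <- cutree(hc_dist, k = 3)
grp_cor <- cutree(hc_cor, k = 3)
table(dist = grp_dist, cor = grp_cor)

# Cophenetic correlation: how well each tree preserves its own input
cor(d_euc, cophenetic(hc_dist))
cor(d_cor, cophenetic(hc_cor))
```

With purely random data like this there is no true grouping, so disagreement between the two tables is expected. The comparison is more informative on real data with known labels.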