A Brief Guide to Using Common Models

2019-10-18  Bio_Learner

XGBoost

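The second half of "Getting Started with XGBoost in Practice (Theory)" covers the parameters to watch and the basic usage.
A Simple Introduction to XGBoost and Its Use
A Step-by-Step Guide to Writing a Practical XGBoost Program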
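
As a rough illustration of the basic usage these articles describe, the sketch below trains a small binary classifier with the R xgboost package on the agaricus demo data that ships with it. The parameter values are assumptions chosen for illustration, not recommendations taken from the linked articles.

```r
library(xgboost)

# Demo data bundled with the xgboost package (sparse features, 0/1 labels)
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest  <- xgb.DMatrix(agaricus.test$data,  label = agaricus.test$label)

# Illustrative settings (assumed values): these are the knobs usually tuned first
params <- list(
  objective        = "binary:logistic", # binary classification, outputs probabilities
  eta              = 0.1,               # learning rate
  max_depth        = 3,                 # maximum depth of each tree
  subsample        = 0.8,               # fraction of rows sampled per tree
  colsample_bytree = 0.8                # fraction of columns sampled per tree
)

bst  <- xgb.train(params = params, data = dtrain, nrounds = 20)
pred <- predict(bst, dtest)             # predicted probabilities of class 1

# Test-set error rate at a 0.5 cutoff
err <- mean((pred > 0.5) != agaricus.test$label)
print(err)
```

Lower eta usually needs a larger nrounds, so these two are normally tuned together.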

NMF

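NMF (Non-negative Matrix Factorization)
The cophenetic correlation coefficient in NMF can be used to choose the optimal number of clusters. This follows a sentence from the literature: "We select values of k where the magnitude of the cophenetic correlation coefficient begins to fall."
Examples:
R NMF: How to Automatically Determine the Optimal Rank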
Cophenetic correlation: Wikis
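
To make the criterion more concrete, here is a minimal base-R sketch that computes a cophenetic correlation coefficient from a consensus matrix, and then applies one possible reading of "begins to fall". The helper names cophenetic_corr and pick_rank, the toy matrices, the coefficient values, and the tol threshold are all assumptions made up for illustration.

```r
# Cophenetic correlation of a consensus matrix (values in [0, 1]):
# correlation between the distances 1 - consensus and the cophenetic
# distances of their average-linkage hierarchical clustering.
cophenetic_corr <- function(consensus) {
  d  <- as.dist(1 - consensus)
  hc <- hclust(d, method = "average")
  cor(as.vector(d), as.vector(cophenetic(hc)))
}

# Toy consensus matrix: two perfectly stable clusters of 3 samples each
clean <- matrix(0, 6, 6)
clean[1:3, 1:3] <- 1
clean[4:6, 4:6] <- 1

# The same structure blurred with symmetric random noise
set.seed(1)
noise <- matrix(runif(36), 6)
noise <- (noise + t(noise)) / 2
noisy <- 0.5 * clean + 0.5 * noise
diag(noisy) <- 1

cc_clean <- cophenetic_corr(clean)
cc_noisy <- cophenetic_corr(noisy)

# One possible reading of "begins to fall": the last rank before the
# first drop larger than tol (tol is an assumed threshold).
pick_rank <- function(ranks, coph, tol = 0.02) {
  drops <- which(diff(coph) < -tol)
  if (length(drops) == 0) return(ranks[length(ranks)])
  ranks[drops[1]]
}

# Made-up coefficients for ranks 2..6, for illustration only
best <- pick_rank(2:6, c(0.995, 0.993, 0.991, 0.90, 0.85))
```

A stable clustering gives a coefficient close to 1, and the coefficient decreases as sample assignments become less consistent across runs.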

Consensus Clustering

Consensus Clustering
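The link above explains the method well. For how to determine the optimal K, this link has a discussion:
how to choose optimal K in Consensus clustering
There, one answer says the most commonly used method is: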

PAC has been shown to outperform other K-estimating methods (e.g., ) in this paper and this paper
Dr. Yasin Şenbabaoğlu has kindly provided the R implementation of PAC. You can use the results in ConsensusClusterPlus as input to get optimal K based on minimum PAC. The code is from here.

######################################################## 
library(ConsensusClusterPlus) # Bioconductor package that provides ConsensusClusterPlus()
seed = 11111
set.seed(seed) # make the simulated data reproducible
d = matrix(rnorm(200000,0,1),ncol=200) # 200 samples in columns, 1000 genes in rows
colnames(d) = paste("Samp",1:200,sep="")
rownames(d) = paste("Gene",1:1000,sep="")
d = sweep(d,1, apply(d,1,median,na.rm=TRUE)) # median-center each gene (row)
maxK = 6 # maximum number of clusters to try
results = ConsensusClusterPlus(d,maxK=maxK,reps=50,pItem=0.8,pFeature=1,title="test_run",
                               innerLinkage="complete",seed=seed,plot="pdf")

# Note that we implement consensus clustering with innerLinkage="complete". 
# We advise against using innerLinkage="average" which is the default value in this package as average linkage is not robust to outliers.

############## PAC implementation ##############
Kvec = 2:maxK
x1 = 0.1; x2 = 0.9 # threshold defining the intermediate sub-interval
PAC = rep(NA,length(Kvec)) 
names(PAC) = paste("K=",Kvec,sep="") # from 2 to maxK
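for(i in Kvec){
  M = results[[i]]$consensusMatrix # consensus matrix for K = i
  Fn = ecdf(M[lower.tri(M)]) # empirical CDF of the pairwise consensus values
  PAC[i-1] = Fn(x2) - Fn(x1) # fraction of pairs with consensus in (x1, x2]
}#end for i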
# The optimal K
optK = Kvec[which.min(PAC)]
########################################################
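
To see what the PAC value measures, the following base-R sketch applies the same formula to two toy consensus matrices. The helper name pac_score and both matrices are assumptions for illustration, and the thresholds mirror the x1 and x2 used above.

```r
# PAC for a single consensus matrix: the fraction of sample pairs whose
# consensus value falls in the intermediate interval (x1, x2].
pac_score <- function(consensus, x1 = 0.1, x2 = 0.9) {
  Fn <- ecdf(consensus[lower.tri(consensus)])
  Fn(x2) - Fn(x1)
}

# Toy case 1: two perfectly stable clusters, every pair has consensus 0 or 1
clean <- matrix(0, 6, 6)
clean[1:3, 1:3] <- 1
clean[4:6, 4:6] <- 1

# Toy case 2: every pair co-clusters in half of the resampling runs
ambiguous <- matrix(0.5, 6, 6)
diag(ambiguous) <- 1

pac_clean     <- pac_score(clean)
pac_ambiguous <- pac_score(ambiguous)
print(c(clean = pac_clean, ambiguous = pac_ambiguous))
```

A stable clustering pushes pairwise consensus values toward 0 or 1, so few pairs land in the intermediate interval and PAC is low. This is why the K with the minimum PAC is taken as optimal.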

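Further reading:
An Algorithm in R for Identifying the Number of Clusters and Their Members
R ConsensusClusterPlus: Determining the Optimal K
Consensus Clustering with ConsensusClusterPlus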
