生信科研生度好文GEO数据挖掘

一个不咋成功的3d

2019-07-28  本文已影响71人  Juan_NF

从芯片文件读取,RMA处理后,主要绘制两个图,pca3d和热图。

文章来源:

《Integrated Analysis of Copy Number Variation and Genome-Wide Expression Profiling in Colorectal Cancer Tissues》

图片:
image.png
CEL数据读取
  • 本应该是很简单的事情,但我耽误了一天的时间用来装包。。。
  • 第一次进行芯片数据的处理,纠结了在oligo和affy中究竟选择哪个包;最终基于文章中的probe_id和读取后结果中的id的%in%结果,选择了oligo
  • 本希望直接读取chp文件,无奈没找到相应的解决方案;
  • 步骤为:
    下载数据(源于ArrayExpress,使用相应的R包下载即可)
    获取expression data和phenotype data
    RMA对expression data处理
#########################数据下载#######################
library(ArrayExpress)
raw_data_dir <- tempdir()
if (!dir.exists(raw_data_dir)) {
  dir.create(raw_data_dir)
}
anno_AE <- getAE("E-MEXP-3980", path = raw_data_dir, type = "raw")
sdrf_location <- file.path(raw_data_dir, "E-MEXP-3980.sdrf.txt")
SDRF <- read.delim(sdrf_location)
rownames(SDRF) <- SDRF$Array.Data.File
SDRF <- AnnotatedDataFrame(SDRF)
raw_data <- oligo::read.celfiles(filenames = file.path(raw_data_dir, 
                                                       SDRF$Array.Data.File),
                                 verbose = FALSE, phenoData = SDRF)
stopifnot(validObject(raw_data))
######################exprssion和phenodata#########################
pd<- Biobase::pData(raw_data)
exp_raw<- oligo::rma(raw_data, target = "core")
exp_rma<- Biobase::exprs(exp_raw)
save(raw_data,file='raw.Rdata')
save(pd,file='pd.Rdata')
save(exp_rma,file='rma.Rdata')
PCA|heatmap
load('rma.Rdata')
load('pd.Rdata')
colnames(exp_rma)<- gsub('_.+','',rownames(matr))
boxplot(exp_rma,outliers=F,las=2)
library(FactoMineR)
PCA_raw <- PCA(t(exp_rma),graph = F)
####热图
##################choose gene#################
library(openxlsx)
a<- read.xlsx('./cel/pone.0092553.s002.xlsx',sheet=1,startRow=3)
a<- a[,c(1,15,17)]
colnames(a) <- c('probe_id','p_value','fold_change')
exp_rma<- exp_rma[rownames(exp_rma)%in%a$probe_id,]
matr<- scale(t(exp_rma))
matr[matr< -2] <- -2
matr[matr>2] <- 2
library(gplots)
####
rownames(matr)<- gsub('_.+','',rownames(matr))
pd$group<- gsub('_.+','',pd$Array.Data.File)
group<- as.character(pd$Characteristics.disease.[match(rownames(matr),pd$group)])
gr <-ifelse(group=='Normal','red','blue')
####
library('scatterplot3d')
png(file='3d.png')
scatterplot3d(PCA_raw$ind$coord[,c(3,1,2)],
              main=paste0('PCA mapping','(',round(sum(PCA_raw$eig[,2][1:3]),2),')'),
              pch = 20, 
              color=gr,
              cex.symbols = 1.5,
              xlab = 'PC3',
              ylab = 'PC1',
              zlab = 'PC2')
par(lend = 1)           # square line ends for the color legend
legend(-2.5,6.5,
       title = 'sample',
       legend=c('Normal', 'Tumor'),
       col=c('red','blue'),
       pch=15)
dev.off()
####
png(file="myplot.png")
heatmap.2(matr,
          lmat=rbind( c(0,0,4), c(3,1,2),c(0,5,5) ), lhei=c(1,4,1.2),
          lwid=c(1.5,0.2,4.0),
          key = TRUE,
          key.xlab = '',
          key.ylab = '',
          keysize = 1.5,
          RowSideColors = gr,
          Colv=T,
          dendrogram = 'both',
          key.title = '',
          trace = 'none',
          scale = 'none', 
          srtCol = 30, offsetCol = -0.5,
          col=redgreen,
          labRow = '',
          labCol = '') 
par(lend = 1)           # square line ends for the color legend
legend(0.3,0.1,
       legend=c('Normal', 'Tumor'),
       col=c('red','blue'),
       horiz=T,
       pch=15)
dev.off()
box
heatmap.2
PCA3d
[参考资料]
1.An end to end workflow for differential gene expression using Affymetrix microarrays
2.Amazing interactive 3D scatter plots - R software and data visualization
上一篇下一篇

猜你喜欢

热点阅读