
Downloading and Preprocessing TCGA Data

2019-05-19  落寞的橙子

This post shows how to download TCGA expression data and reshape it into a table whose row names are gene names.

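# Data source: download the archive, extract it, and save the expression table as HNSC_RSEM_genes_normalized.txt
# (the link below ends in .md5, i.e. the checksum file; the data archive appears to be the same URL without that suffix)
# http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/HNSC/20160128/gdac.broadinstitute.org_HNSC.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2016012800.0.0.tar.gz.md5
# Only base R functions are used below
hnsc <- read.table("your_dir/HNSC_RSEM_genes_normalized.txt", header = TRUE, check.names = FALSE, sep = "\t", stringsAsFactors = FALSE)
# The first row after the header is a second header line (gene_id, normalized_count, ...), so drop it
hnsc <- hnsc[-1, ]
# Gene IDs have the form "symbol|EntrezID"; keep only the part before "|"
row_name <- as.character(hnsc[, 1])
row_name <- vapply(strsplit(row_name, split = "|", fixed = TRUE), function(x) x[1], character(1))
hnsc[, 1] <- row_name
# Keep the first row for each gene symbol, then move the symbols into the row names
hnsc <- hnsc[!duplicated(hnsc[, 1]), ]
row.names(hnsc) <- hnsc[, 1]
hnsc <- hnsc[, -1]
# The extra header row forced every column to be read as text; convert the values back to numeric
hnsc[] <- lapply(hnsc, as.numeric)
# Shorten the TCGA barcodes in the column names to their first 16 characters
colnames(hnsc) <- substr(colnames(hnsc), 1, 16)
write.csv(hnsc, "your_dir/hnsc_clean_data.csv")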
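
To see what each cleaning step does without downloading anything, the same logic can be tried on a tiny made-up table. This is only a sketch: the helper name `clean_firehose_matrix`, the barcodes, and all expression values are invented for illustration, and the layout (a second header row, IDs of the form "symbol|EntrezID") is assumed to match the real file.

```r
# Toy stand-in for the downloaded table; every barcode and value here is made up.
raw <- data.frame(
  "Hybridization REF" = c("gene_id", "?|100130426", "?|100133144",
                          "A1BG|1", "SLC35E2|728661", "SLC35E2|9906"),
  "TCGA-XX-0001-01A-11R-1111-07" = c("normalized_count", "0.0000", "3.5",
                                     "120.25", "400", "210.5"),
  "TCGA-XX-0002-11A-11R-1111-07" = c("normalized_count", "1.0", "0",
                                     "98.5", "350.75", "190"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper that bundles the steps of the script above.
clean_firehose_matrix <- function(raw) {
  body <- raw[-1, , drop = FALSE]                  # drop the second header row
  symbol <- sub("\\|.*$", "", body[[1]])           # keep the text before the first "|"
  keep <- !duplicated(symbol)                      # first occurrence of each symbol wins
  expr <- body[keep, -1, drop = FALSE]             # remove the gene ID column
  expr[] <- lapply(expr, as.numeric)               # text columns back to numbers
  rownames(expr) <- symbol[keep]
  colnames(expr) <- substr(colnames(expr), 1, 16)  # first 16 characters of each barcode
  expr
}

clean <- clean_firehose_matrix(raw)
print(clean)
```

Note that the duplicate filter keeps only the first row for each symbol, so all genes labelled "?" collapse into a single row. To load the saved CSV again with gene names as row names, `read.csv("your_dir/hnsc_clean_data.csv", row.names = 1, check.names = FALSE)` should work.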