
[Databases >> CGGA] The CGGA Database

2021-04-28  高大石头

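CGGA (Chinese Glioma Genome Atlas) is a project that has released functional genomic data for 2000 Chinese glioma samples. The script below processes two of its mRNA-seq datasets (693 and 325 samples): each is log2-transformed and quantile-normalized, the two are merged on their shared genes, and the batch effect between them is removed with ComBat.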

rm(list = ls())
library(tidyverse)
library(limma)
library(sva)
library(preprocessCore)
setwd("E:/glioma")

# Prepare the sequencing data for the 693-sample cohort
rt1 <- data.table::fread("CGGA.mRNAseq_693.RSEM-genes.20200506.txt", data.table = FALSE) %>% 
  column_to_rownames("Gene_Name")
data1 <- log2(rt1 + 1)
data1a <- normalize.quantiles(as.matrix(data1))
# Restore the row and column names, which normalize.quantiles does not keep
rownames(data1a) <- rownames(data1)
colnames(data1a) <- colnames(data1)

# Prepare the sequencing data for the 325-sample cohort
rt2 <- data.table::fread("CGGA.mRNAseq_325.RSEM-genes.20200506.txt", data.table = FALSE) %>% 
  column_to_rownames("Gene_Name")
data2 <- log2(rt2+1)
data2a <-  normalize.quantiles(as.matrix(data2))
rownames(data2a) <- rownames(data2)
colnames(data2a) <- colnames(data2)
# Keep only the genes present in both cohorts
samegene <- intersect(row.names(data1),row.names(data2))
data <- cbind(data1a[samegene,],data2a[samegene,])

# Remove the batch effect between the two cohorts
batchtype <- c(rep(1, ncol(data1a)), rep(2, ncol(data2a)))
outTab <- ComBat(dat = data, batch = batchtype, par.prior = TRUE)
# Move the gene names into an "id" column so the output keeps a complete header row
outTab1 <- outTab %>% 
  as.data.frame() %>% 
  rownames_to_column("id")
write.table(outTab1, file = "CGGA-normalize.txt", sep = "\t", quote = FALSE, row.names = FALSE)
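For readers who have not met the quantile normalization step used in the script above, here is a minimal base-R sketch of the idea on a toy matrix. The helper `quantile_normalize` and the toy data are hypothetical and for illustration only; this is not the preprocessCore implementation, and it handles ties naively (by order of appearance), so results can differ slightly from `normalize.quantiles` when a column has tied values.

```r
# Toy illustration of quantile normalization using base R only.
# quantile_normalize is a hypothetical helper, not part of preprocessCore.
quantile_normalize <- function(mat) {
  # Rank of each value within its own column (ties broken by order of appearance)
  ranks <- apply(unname(mat), 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest value across all columns
  ref <- rowMeans(apply(unname(mat), 2, sort))
  # Replace every value with the reference value at its rank
  out <- apply(ranks, 2, function(r) ref[r])
  # Put the gene and sample names back
  dimnames(out) <- dimnames(mat)
  out
}

# Made-up expression values: 4 genes (rows) x 3 samples (columns)
toy <- matrix(c(5, 2, 3, 4,
                4, 1, 5, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

norm <- quantile_normalize(toy)
print(norm)

# After normalization every sample has the same set of values,
# while the ordering of genes within each sample is unchanged
print(apply(unname(norm), 2, sort))
```

In the script above this normalization is applied to each cohort separately, so samples become comparable within a cohort; the remaining difference between the two cohorts is what the ComBat step is meant to handle.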

Reference:
Chinese Glioma Genome Atlas (CGGA): A Comprehensive Resource with Functional Genomic Data for Chinese Glioma Patients
