target数据库初探

2020-09-21 本文已影响0人刘伟健_5997

在简书上看了很多文章，有很多收获，但自己从来没写过。

生信有很久没碰了，因为需要，又一次捡了起来。

之前在TCGA数据库一直没有找到骨肉瘤的临床数据，不能把基因表达和临床预后结合起来分析。

今天偶然看到了TARGET数据库，发现有骨肉瘤的数据，下载下来整理。

给自己留一个记录，方便复习。

点击TARGET Data Matrix，获取数据

setwd("E:/scRNA seq/osteosarcoma/target/gene copy")

files <- list.files()

for (f in files){

newname<-sub('-01A-01R.gene.quantification','',f) #将原文件中的字符""，替换为字符""

file.rename(f,newname)

} # 批量更改文件名，方便后面批量读取和匹配

clinical <- read.csv(file="./TARGET OS.csv",

head = T)

colnames(clinical)[1] <- "patients"

write.csv(clinical, file = "clinical.csv", row.names = F)

patients <- clinical$patients

ls <- list.files()[-1]

ls <- sub(".txt","",ls)

head(ls)

included <- clinical[match(ls, clinical$patients),]

aa <- na.omit(included$patients)

head(aa)

included <- included[match(aa,included$patients),]

match(aa,ls)

先把数据down下来，把gene表达信息和临床数据的患者编号匹配上

setwd("E:/scRNA seq/osteosarcoma/target/gene copy")

filenames <- list.files()

for (file in filenames) {

if (!exists("all.data")) {

all.data <- read.table(file, sep = "\t",

header = TRUE)[,4]

}

if (exists("all.data")) {

new.data <- read.table(file, sep = "\t",

header = TRUE)[,4]

all.data <- cbind(all.data, new.data)

}

}

dim(all.data)

counts <- all.data[,-1]

dim(counts)

ls <- sub(".txt","",filenames)

colnames(counts) <- ls

rownames(counts) <- genes

然后循环批量读取表达量数据，这次选取读TPM数。

下次再进行分析。

target数据库初探

猜你喜欢

热点阅读