Study notes on TCGAbiolinks: from downloading TCGA data to downstream analysis

2018-11-20, by 547可是贼帅的547

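Preface

I had been using the RTCGA package to download data, but its data snapshot never gets updated, which always bothered me. So I decided to learn a better-maintained package: TCGAbiolinks. It calls the GDC API, so it should retrieve the latest data.
Main reference: TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data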

Downloading the data

Straight to the code:

# if (!requireNamespace("BiocManager", quietly=TRUE))
#   install.packages("BiocManager")
# BiocManager::install("TCGAbiolinks")
 

library(TCGAbiolinks)
library(dplyr)
library(DT)
library(SummarizedExperiment)


# Cancer types (TCGA project codes) to download
request_cancer=c("PRAD","BLCA","KICH","KIRC","KIRP")
for (i in request_cancer) {
  cancer_type=paste("TCGA",i,sep="-")
  print(cancer_type)
  # Download clinical data
  clinical <- GDCquery_clinic(project = cancer_type, type = "clinical")
  write.csv(clinical,file = paste(cancer_type,"clinical.csv",sep = "-"))
  
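  # Download RNA-seq count data
  # (newer GDC data releases appear to use "STAR - Counts" instead of
  #  "HTSeq - Counts"; adjust workflow.type if this query returns nothing)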
  query <- GDCquery(project = cancer_type, 
                    data.category = "Transcriptome Profiling", 
                    data.type = "Gene Expression Quantification", 
                    workflow.type = "HTSeq - Counts")
  
  GDCdownload(query, method = "api", files.per.chunk = 100)
  expdat <- GDCprepare(query = query)
  count_matrix=assay(expdat)
  write.csv(count_matrix,file = paste(cancer_type,"Counts.csv",sep = "-"))
  
  # Download miRNA expression data
  query <- GDCquery(project = cancer_type, 
                    data.category = "Transcriptome Profiling", 
                    data.type = "miRNA Expression Quantification", 
                    workflow.type = "BCGSC miRNA Profiling")
  
  GDCdownload(query, method = "api", files.per.chunk = 50)
  # For miRNA, GDCprepare() should return a plain data frame rather than a
  # SummarizedExperiment, so it is written out directly without assay()
  expdat <- GDCprepare(query = query)
  write.csv(expdat,file = paste(cancer_type,"miRNA.csv",sep = "-"))
  
  # Download Copy Number Variation data
  query <- GDCquery(project = cancer_type, 
                    data.category = "Copy Number Variation", 
                    data.type = "Copy Number Segment")
  
  GDCdownload(query, method = "api", files.per.chunk = 50)
  # Copy number segments should also come back as a data frame (one row per
  # segment), so assay() does not apply here either
  expdat <- GDCprepare(query = query)
  write.csv(expdat,file = paste(cancer_type,"Copy-Number-Variation.csv",sep = "-"))
  
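  # Download DNA methylation data
  # (legacy = TRUE targets the GDC legacy archive; recent TCGAbiolinks
  #  releases may no longer accept this argument)
  query.met <- GDCquery(project =cancer_type,
                        legacy = TRUE,
                        data.category = "DNA methylation")
  GDCdownload(query.met, method = "api", files.per.chunk = 300)
  # Prepare query.met here, not the leftover CNV query
  expdat <- GDCprepare(query = query.met)
  count_matrix=assay(expdat)
  write.csv(count_matrix,file = paste(cancer_type,"methylation.csv",sep = "-"))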
}
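
One caveat with the loop above: if any single step throws an error (for example, a project has no files of a given type), the whole loop stops and the remaining cancer types are never downloaded. Below is a minimal sketch of one way around that using base R's tryCatch(). Note that safe_step is a hypothetical helper name of my own, not part of TCGAbiolinks, and the failing step is simulated with stop() so the snippet runs without network access.

```r
# Hypothetical helper (not part of TCGAbiolinks): run one step, report any
# error as a message, and return TRUE/FALSE instead of aborting the caller.
safe_step <- function(label, step) {
  tryCatch(
    {
      step()
      TRUE
    },
    error = function(e) {
      message("Skipping ", label, ": ", conditionMessage(e))
      FALSE
    }
  )
}

# Simulated steps: one fails, one succeeds.
ok_fail <- safe_step("TCGA-KICH miRNA", function() stop("no files found"))
ok_pass <- safe_step("TCGA-PRAD clinical", function() invisible(NULL))
```

In the real loop, each query/download/prepare/write group could presumably be wrapped in a function and passed to such a helper, so that a failure in one data type only skips that data type.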

Most of the commonly used data types are now downloaded and saved as CSV files in the current working directory.
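
For downstream analysis the CSV files have to be read back in, and there is a small trap: by default read.csv() rewrites column names into syntactically valid R names, which turns the hyphens in TCGA barcodes into dots. A minimal sketch of a safe round trip, using a toy matrix in place of a real count matrix (the gene IDs, barcodes, and file name below are made up for illustration):

```r
# Toy stand-in for a count matrix: genes in rows, TCGA-style barcodes in columns.
toy <- matrix(1:4, nrow = 2,
              dimnames = list(c("ENSG00000000003", "ENSG00000000005"),
                              c("TCGA-AA-0001-01A", "TCGA-AA-0002-11A")))
csv_path <- file.path(tempdir(), "TCGA-TOY-Counts.csv")
write.csv(toy, file = csv_path)

# row.names = 1 restores the gene IDs written in the first column;
# check.names = FALSE keeps the hyphens in the barcodes.
counts <- as.matrix(read.csv(csv_path, row.names = 1, check.names = FALSE))

# Characters 14-15 of a TCGA barcode hold the sample type code
# (01-09 are tumor samples, 10-19 are normal samples).
sample_type <- substr(colnames(counts), 14, 15)
is_tumor <- as.integer(sample_type) < 10
```

With the barcodes intact, is_tumor can be used to split the columns into tumor and normal groups before any differential analysis.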
