Continuing with TCGA: downloading data with the R package TCGAbiolinks
2020-01-06
程凉皮儿
Reference material: again Teacher Xiaojie's (小洁老师) tutorial "TCGA-3. Downloading data with the R package TCGAbiolinks"
Installing the package
> BiocManager::install("TCGAbiolinks")
Bioconductor version 3.10 (BiocManager 1.30.9), R 3.6.1 (2019-07-05)
Installing package(s) 'TCGAbiolinks'
trying URL 'https://mirrors.ustc.edu.cn/bioc/packages/3.10/bioc/bin/macosx/el-capitan/contrib/3.6/TCGAbiolinks_2.14.0.tgz'
Content type 'application/octet-stream' length 61624559 bytes (58.8 MB)
==================================================
downloaded 58.8 MB
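The package is fairly large, so installation takes a while.
Once you know the package name, you already know where to start: its vignettes cover the workflow, so I won't repeat them here.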
#BiocManager::install("TCGAbiolinks")
library(TCGAbiolinks)
browseVignettes("TCGAbiolinks")
The available vignettes are:
Vignettes in package TCGAbiolinks
- "1. Introduction" - HTML source R code
- "10. TCGAbiolinks_Extension" - HTML source R code
- "2. Searching GDC database" - HTML source R code
- "3. Downloading and preparing files for analysis" - HTML source R code
- "4. Clinical data" - HTML source R code
- "5. Mutation data" - HTML source R code
- "9. Graphical User Interface (GUI)" - HTML source R code
- 6. Compilation of TCGA molecular subtypes - HTML source R code
- 7. Analyzing and visualizing TCGA data - HTML source R code
- 8. Case Studies - HTML source R code
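Let's repeat the original example and download some data to see how it goes:
# List the abbreviations (project IDs) of all supported projects
TCGAbiolinks:::getGDCprojects()$project_id
# Again use the earlier example: mesothelioma (MESO)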
cancer_type="TCGA-MESO"
Downloading clinical data
> clinical <- GDCquery_clinic(project = cancer_type, type = "clinical")
> clinical[1:4,1:4]
  submitter_id year_of_diagnosis classification_of_tumor last_known_disease_status
1 TCGA-3H-AB3K              2008            not reported              not reported
2 TCGA-3H-AB3L              2008            not reported              not reported
3 TCGA-3H-AB3M              2008            not reported              not reported
4 TCGA-3H-AB3O              2008            not reported              not reported
> dim(clinical)
[1] 87 73
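A single function downloads the clinical information already merged into one table: 87 cases × 73 variables.
The data were last updated in August 2019, so they are reasonably current.
Downloading miRNA data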
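# Build a query for the miRNA-seq quantification files of the chosen project
query <- GDCquery(project = cancer_type,
                  data.category = "Transcriptome Profiling",
                  data.type = "miRNA Expression Quantification",
                  workflow.type = "BCGSC miRNA Profiling")
# Download through the GDC API, 50 files per request
GDCdownload(query, method = "api", files.per.chunk = 50)
# Read the downloaded files into a single R object
expdat <- GDCprepare(query = query)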
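Again, a single function takes care of the download, but the result still has to be reshaped into an expression matrix.
#install.packages("tibble")
library(tibble)
# Drop the existing row names: column_to_rownames() stops with an error if the data frame already has them
rownames(expdat) <- NULL
# Use the miRNA IDs as row names
expdat <- column_to_rownames(expdat, var = "miRNA_ID")
# Each sample contributes three columns (read_count, reads_per_million_miRNA_mapped, cross-mapped);
# keep every third column, i.e. the read counts, and transpose to samples x miRNAs
exp <- t(expdat[, seq(1, ncol(expdat), 3)])
exp[1:4, 1:4]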
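The every-third-column step above is easy to get wrong, so here is a small offline sketch on a hypothetical toy table that needs no GDC access. It assumes, without verifying it here, that GDCprepare() returns a miRNA_ID column followed by three columns per sample named read_count_<barcode>, reads_per_million_miRNA_mapped_<barcode> and cross-mapped_<barcode>; the sample names S1/S2 and all values are invented. It uses base R in place of tibble, and also shows selecting the read counts by column name, which does not depend on column order.

```r
# Hypothetical toy stand-in for the GDCprepare() output (assumed layout):
# a miRNA_ID column, then three columns per sample. S1/S2 and all values are made up.
expdat <- data.frame(
  miRNA_ID = c("hsa-let-7a-1", "hsa-let-7a-2"),
  read_count_S1 = c(10L, 20L),
  reads_per_million_miRNA_mapped_S1 = c(1.5, 3.0),
  `cross-mapped_S1` = c("N", "N"),
  read_count_S2 = c(30L, 40L),
  reads_per_million_miRNA_mapped_S2 = c(4.5, 6.0),
  `cross-mapped_S2` = c("N", "Y"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Base-R equivalent of tibble::column_to_rownames(expdat, var = "miRNA_ID")
rownames(expdat) <- expdat$miRNA_ID
expdat$miRNA_ID <- NULL

# Positional selection as in the post: columns 1, 4, 7, ... are the read counts
exp_pos <- t(expdat[, seq(1, ncol(expdat), 3), drop = FALSE])

# Name-based selection: keep only columns whose name starts with "read_count_"
exp_name <- t(expdat[, grepl("^read_count_", colnames(expdat)), drop = FALSE])

# On this layout both approaches pick exactly the same columns
stopifnot(identical(exp_pos, exp_name))

# Strip the prefix so the row names are just the sample identifiers
rownames(exp_pos) <- sub("^read_count_", "", rownames(exp_pos))
print(exp_pos)
```

Selecting by name is slightly more robust: if the column layout ever changed, seq(1, ncol, 3) would silently pick the wrong columns, whereas the grepl() version would simply match nothing.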
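So starting from this package is convenient and saves a great deal of preprocessing time.
You can also learn from the video course on Bilibili.
Course link: https://www.bilibili.com/video/av49363776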