Analyzing RNA-seq data with DESe

2020-10-16 本文已影响0人 BINBINCC

前面铺垫了那么多终于要开始了

Analyzing RNA-seq data with DESeq2（一）
Analyzing RNA-seq data with DESeq2（二）
Analyzing RNA-seq data with DESeq2（三）
Analyzing RNA-seq data with DESeq2（四）
Analyzing RNA-seq data with DESeq2（五）

Differential expression analysis

The standard differential expression analysis steps are wrapped into a single function, DESeq.
Results tables are generated using the function results, which extracts a results table with log2 fold changes, p values and adjusted p values.

dds <- DESeq(dds)
res <- results(dds)
res

## log2 fold change (MLE): condition treated vs untreated 
## Wald test p-value: condition treated vs untreated 
## DataFrame with 9921 rows and 6 columns
##               baseMean log2FoldChange     lfcSE       stat    pvalue      padj
##              <numeric>      <numeric> <numeric>  <numeric> <numeric> <numeric>
## FBgn0000008   95.14429     0.00227644  0.223729   0.010175 0.9918817  0.997211
## FBgn0000014    1.05652    -0.49512039  2.143186  -0.231021 0.8172987        NA
## FBgn0000017 4352.55357    -0.23991894  0.126337  -1.899041 0.0575591  0.288002
## FBgn0000018  418.61048    -0.10467391  0.148489  -0.704927 0.4808558  0.826834
## FBgn0000024    6.40620     0.21084779  0.689588   0.305759 0.7597879  0.943501
## ...                ...            ...       ...        ...       ...       ...
## FBgn0261570 3208.38861      0.2955329  0.127350  2.3206264  0.020307  0.144240
## FBgn0261572    6.19719     -0.9588230  0.775315 -1.2366888  0.216203  0.607848
## FBgn0261573 2240.97951      0.0127194  0.113300  0.1122634  0.910615  0.982657
## FBgn0261574 4857.68037      0.0153924  0.192567  0.0799327  0.936291  0.988179
## FBgn0261575   10.68252      0.1635705  0.930911  0.1757102  0.860522  0.967928

Note that we could have specified the coefficient or contrast we want to build a results table for, using either of the following equivalent commands:
如果需要明确如何处理对象则可以使用以下方法:

res <- results(dds, name="condition_treated_vs_untreated")
##The text, condition treated vs untreated, tells you that the estimates are of the logarithmic fold change log2(treated/untreated).
通过对treated和untreated顺序的更改来达到明确处理对象的问题
res2 <- results(dds, contrast=c("condition","untreated","treated"))
res2

log2 fold change (MLE): condition untreated vs treated 
Wald test p-value: condition untreated vs treated 
DataFrame with 9921 rows and 6 columns
                    baseMean       log2FoldChange             lfcSE                 stat             pvalue              padj
                   <numeric>            <numeric>         <numeric>            <numeric>          <numeric>         <numeric>
FBgn0000008 95.1440789963134 -0.00215437939131803 0.223778503681452 -0.00962728481903151  0.992318656737681 0.997034169677036
FBgn0000014 1.05657219346166    0.495299273223956  2.14327479881991    0.231094619083222  0.817241295341332                NA
FBgn0000017  4352.5928987935    0.240025555383081 0.126342087768198     1.89980678349608 0.0574584804033397 0.287669966702742
FBgn0000018 418.614930505122    0.104799219010569 0.148536767352526    0.705543959778297  0.480471785048082 0.826695551321695
FBgn0000024 6.40628919476961   -0.210746819818719 0.689970836726196   -0.305443083389843  0.760028712717949 0.943790945818601
...                      ...                  ...               ...                  ...                ...               ...
FBgn0261570 3208.38445969106   -0.295433108560876 0.127325952346867    -2.32028980043316 0.0203252055046957 0.144360633334623
FBgn0261572 6.19713652050888     0.95895731601274 0.775588232046352     1.23642582028685  0.216300323350011 0.608242035446242
FBgn0261573 2240.98398636611  -0.0126194294797913 0.113296402610125   -0.111384202755468  0.911311686482631 0.982594010602204
FBgn0261574 4857.74267170989  -0.0152592268752615 0.192504546880859  -0.0792668387448812  0.936820382128824 0.988423794214432
FBgn0261575 10.6835537573787   -0.163219202829332 0.930591813482421   -0.175392906389902  0.860770914073928 0.967954942188896

Log fold change shrinkage for visualization and ranking

对数据进行收缩处理

resultsNames(dds)
## [1] "Intercept"                      "condition_treated_vs_untreated"

resLFC <- lfcShrink(dds, coef="condition_treated_vs_untreated", type="apeglm")
resLFC

## log2 fold change (MAP): condition treated vs untreated 
## Wald test p-value: condition treated vs untreated 
## DataFrame with 9921 rows and 5 columns
##               baseMean log2FoldChange     lfcSE    pvalue      padj
##              <numeric>      <numeric> <numeric> <numeric> <numeric>
## FBgn0000008   95.14429     0.00119920  0.151897 0.9918817  0.997211
## FBgn0000014    1.05652    -0.00473412  0.205468 0.8172987        NA
## FBgn0000017 4352.55357    -0.18989990  0.120377 0.0575591  0.288002
## FBgn0000018  418.61048    -0.06995753  0.123901 0.4808558  0.826834
## FBgn0000024    6.40620     0.01752715  0.198633 0.7597879  0.943501
## ...                ...            ...       ...       ...       ...
## FBgn0261570 3208.38861     0.24110290 0.1244469  0.020307  0.144240
## FBgn0261572    6.19719    -0.06576173 0.2141351  0.216203  0.607848
## FBgn0261573 2240.97951     0.01000619 0.0993764  0.910615  0.982657
## FBgn0261574 4857.68037     0.00843552 0.1408267  0.936291  0.988179
## FBgn0261575   10.68252     0.00809101 0.2014704  0.860522  0.967928

Using parallelization

并行处理

使用BiocParallel包

Parallelizing DESeq, results, and lfcShrink can be easily accomplished by loading the BiocParallel package, and then setting the following arguments: parallel=TRUE and BPPARAM=MulticoreParam(4), for example, splitting the job over 4 cores.

library("BiocParallel")  ##使用BiocParallel包
register(MulticoreParam(4))
#提前声明好后，直接添加parallel=TRUE选项即可

?results

?DESeq

?lfcShrink

p-values and adjusted p-values

P值调整

##p值从小到大排列
resOrdered <- res[order(res$pvalue),]

##使用summary函数获取简单的基本结果
summary(res)
## 
## out of 9921 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)       : 518, 5.2%
## LFC < 0 (down)     : 536, 5.4%
## outliers [1]       : 1, 0.01%
## low counts [2]     : 1539, 16%
## (mean count < 6)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results

##p值小于0.1的共有多少个
sum(res$padj < 0.1, na.rm=TRUE)
##[1] 1048

如果想要提取排序后的差异分析结果可以这样操作：

resOrdered <- res[order(res$padj),]  #padj排序
resOrdered_frame=as.data.frame(resOrdered)
write.csv(resOrdered_frame， 
          file="condition_treated_results.csv")


##通过class函数看看数据类型
> class(resOrdered)
##[1] "DESeqResults"
##attr(,"package")
##[1] "DESeq2"

> class(resOrdered_frame)
##[1] "data.frame"

函数results有着丰富的参数设置，可以自定义生成的结果，默认p值=0.1，可使用alpha参数更改为合适的值。
例如：

res05 <- results(dds, alpha=0.05)
summary(res05)
## 
##out of 9921 with nonzero total read count
##adjusted p-value < 0.05
##LFC > 0 (up)       : 408, 4.1%
##LFC < 0 (down)     : 428, 4.3%
##outliers [1]       : 1, 0.01%
##low counts [2]     : 1154, 12%
##(mean count < 4)
##[1] see 'cooksCutoff' argument of ?results
##[2] see 'independentFiltering' argument of ?results

sum(res05$padj < 0.05, na.rm=TRUE)
## [1] 836

基本的数据处理到这里就完成了，接下来就是结果展示了。

下次见咯( ^ . ^ )

大家一起学习讨论鸭！

来一杯！