good code rna_seq走进转录组

RNA-seq下游分析(3)-rawcount转换为TPM及RP

2020-12-11  本文已影响0人  灵活胖子的进步之路

因为我用的GTF文件较新,为realease101版本,因此需要自己生成转换文件需要的TxDb文件,首先需要安装GenomicFeatures包,然后直接在线应用命令进行数据库建立TxDb文件,因为我的表达矩阵的行名为ensemble,因此我直接从官网得到数据库文件

library(GenomicFeatures)#载入包
txdb <- makeTxDbFromEnsembl(organism="Homo sapiens",#设定种族
                    release=101,设定版本
                    server="ensembldb.ensembl.org")#审定服务器地址
saveDb(txdb, file='txdbensemble101.sqlite')#保存生存的TXDB文件

后续转换的时候可以直接应用Github上别人写的包,我直接下载。下载的地址为https://codeload.github.com/mapawlak/normalizeGeneCounts/zip/master
下载下来后解压,本地直接source加载后就可以了

TxDb <- loadDb(file='txdbensemble101.sqlite')#载入参考数据集
counts <- read.table("all.id.txt",#读入原始counts文件
                       header = T,#第一行为列名
                       row.names = 1,#第一列为行名
                       check.names = F)#保持原文名称
RPKM <- normalizeGeneCounts(counts, TxDb, method = "RPKM")
source("normalizeGeneCounts.R")#载入本地脚本

#以下分别生成校准后文件
RPKM <- normalizeGeneCounts(counts, TxDb, method = "RPKM")

TPM <- normalizeGeneCounts(counts, TxDb, method = "TPM")

CPM <- normalizeGeneCounts(counts, TxDb, method = "CPM")

save(CPM,RPKM,TPM,file = "normal.Rdata")#保持到本地

以下命令是normalizeGeneCounts作者的说明文件中的命令

# normalizeGeneCounts

## Description
Function takes raw counts of NGS data and normalize to [CPM, RPKM or TPM](https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/).

## Usage
normalizeGeneCounts(counts, TxDb, method)

## Arguments

*counts* A data frame with gene ID in rownames and sample ID in colnames

*TxDb* `GenomicFeatures` object created from the same gtf as counts object

*method* Should be "CPM", "RPKM" or "TPM"

## Example

CPM <- normalizeCounts(counts, TxDb, method = "CPM")

以下命令是作者R文件的原始代码

normalizeGeneCounts <- function(counts, TxDb, method) {
  require("GenomicFeatures")
  length <- width(genes(TxDb))/10 ^ 3 #gene length per kb
  if (method == "CPM") {
    normalized <-
      as.data.frame(apply(counts,2, function(x) (x/sum(x)) * 10 ^ 6))
  } else if (method == "RPKM") {
    normalized <-
      as.data.frame(t(apply(counts, 1, "/", colSums(counts) / 10 ^ 6)) / length)
  } else if (method == "TPM") {
    normalized <-
      as.data.frame(t(apply(
        counts / length, 1, "/", colSums(counts / length) / 10 ^ 6
        )))
  } else {
    stop("method must be CPM, RPKM or TPM")
  }
    return(normalized)
}
上一篇下一篇

猜你喜欢

热点阅读