DESeq2跑差异基因

2021-03-05 本文已影响0人 Aji

20210305 16:36
最近在处理single cell 数据，老板要求用DESeq2跑差异基因，Seurat中自带的DESeq2跑出来的有很多是0,很奇怪，也不懂是啥原因，没有往下面细细研究，后来师姐用DESeq2直接跑，就是把每一个细胞当成一个样本来跑的，细胞数目少的时候跑的时间用的少，但是细胞数目多的时候，时间就会用的多。之前都是换一组数据，就得写个代码，很麻烦。今天给它包装了下，跑起来方便多了，很开心。最近的粉丝涨了三个，开心开心，虽然离100的粉丝量还差好多，但是是在进步的就好。

1.函数

#!/~/bin/Rscript
## 第一行是Linux上Rscript上所在位置

# Function: This is a pipeline to find DEGs using DESeq2
# Input: 1.rds odject which includes counts variable and label variable; 2. outputname to save the DEGs
# Output: csv of DESeq2 DEGs
# Version: 2021-03-05, Yiyi Zheng

library(optparse)
library(getopt)
option_list <- list(
  make_option(c("-o", "--output"), type = "character", default = FALSE,
              action = "store", help = "This is the output directory."
  ),
  make_option(c("--rds_name"), type = "character", default = FALSE,
              action = "store", help = "This is the name of rds to input."
  ),
  make_option(c("--output_name"), type = "character", default = FALSE,
              action = "store", help = "This is the name of ouput file."
  )
)

# -help 
opt = parse_args(OptionParser(option_list = option_list, 
                              usage = "This is a script to find DEGs using DESeq2"))
print(opt)

# set the output path
library(DESeq2)
DESeq2_Object <- readRDS(opt$rds_name)

setwd(opt$output)
timestart<-Sys.time() 
print("The time begining:")
print(timestart)

##  Main function 
print("The level and number of input data:")
# The levels of condition need to assign by yourself. DESeq2 will use the first level as the condtion.
#condition<-factor(c(rep("Control",7104), rep("Obesity",6965)), levels = c("Control","Obesity"))

condition <- factor(DESeq2_Object$label)
print(levels(condition))
print(table(condition))

print("The contrl is")
print(table(condition)[1])
print("The treat is")
print(table(condition)[2])

count <- DESeq2_Object$counts
count <- as.matrix(count) # Need to transform to matrix.

coldata <- data.frame(row.names=colnames(count), condition)                    
count_2 <- count[which(rowSums(count) > 0), ] # Exclude the gene which is 0 in all cells.
count <-  count_2+1
dds <- DESeqDataSetFromMatrix(count,coldata,design=~condition)
dds <- DESeq(dds)
res <- results(dds)
outputname <- opt$output_name

print("Finished to calculate DEGs and now writing the DEGs results.")
write.csv(res, outputname)

timeend<-Sys.time()
print("The time ending:")
print(timeend)
runningtime<-timeend-timestart
print("The time DESeq2 used is ")
print(runningtime)

2.使用

该函数的使用就是在Linux上直接输入参数就可以了，但是之前你需要构建一个rds object, 其中两个变量是counts和label，构建如下：

rm(list = ls())
list.files()
library(Seurat)
Female_exp  <- readRDS("20210305_Female_DESeq2_Exp.rds")
all <- list()
all$counts <- Female_exp$counts
all$label <- Female_exp$sample_label
print(table(all$label))
## DESeq2中会把第一个level 作为control，第二个level作为treat.这个需要确认好，是谁对谁找的差异基因
saveRDS(all , "Female_allcells.rds")

然后就可以挂在后台上跑了

i=Female_allcells.rds
nohup RunDESeq2_Linux.R -o . --rds_name $i --output_name $i.csv > $i.output.file 2>&1 &

进一步有一步的欢喜！！
今天是前进的一天

DESeq2跑差异基因

1.函数

2.使用

猜你喜欢

热点阅读