Learning data analysis from Genes|Genomes|Genetics: R language
2022-07-03
小明的数据分析笔记本
Paper:
Sex-Specific Co-expression Networks and Sex-Biased Gene Expression in the Salmonid Brook Charr Salvelinus fontinalis
The data and code are publicly available:
https://github.com/bensutherland/sfon_wgcna
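The repository also contains the WGCNA code, and the paper describes its methods and results in considerable detail, so you can study the WGCNA code alongside the paper.
Today's post starts with the code for the differential expression analysis.
The raw count file provided with the paper contains more than 100 samples, which is a fairly large amount of data, so here I use only 20 of them.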
Read the expression (count) file
library(readr)
my.counts<-read_csv("data/20220623/edgeR_counts.csv")
head(my.counts)
dim(my.counts)
Round the counts to integers
library(tidyverse)
my.counts.round<- my.counts %>%
column_to_rownames("transcript.id") %>%
round()
dim(my.counts.round)
head(my.counts.round)
Filter the data
I did not fully understand the filtering criterion used here.
library(edgeR)
edger.counts <- DGEList(counts = my.counts.round)
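# CPM threshold: the CPM that 10 reads correspond to in the smallest library
min.reads.mapping.per.transcript <- 10
cpm.filt <- min.reads.mapping.per.transcript / min(edger.counts$samples$lib.size) * 1e6
cpm.filt
# Keep transcripts whose CPM exceeds the threshold in at least 5 samples
min.ind <- 5
keep <- rowSums(cpm(edger.counts) > cpm.filt) >= min.ind
table(keep)
# keep.lib.sizes = FALSE recomputes library sizes from the retained transcripts
filtered.counts <- edger.counts[keep, , keep.lib.sizes = FALSE]
filtered.counts %>% class()
dim(filtered.counts)
# TMM normalization factors (stored in filtered.counts$samples$norm.factors)
filtered.counts <- calcNormFactors(filtered.counts, method = "TMM")
filtered.counts$samples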
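The filtering rule can be reproduced by hand on a toy matrix. As far as the code shows, `cpm.filt` is the CPM value that 10 reads represent in the smallest library, so a transcript has to clear about 10 reads in the smallest library (and proportionally more in deeper ones) in at least `min.ind` samples. The sketch below uses hypothetical counts and computes CPM as count / library size × 10^6, which is what `cpm()` returns while all normalization factors are still 1; it only makes the threshold visible and is not a replacement for edgeR.

```r
# Toy count matrix (hypothetical numbers): 5 transcripts x 6 samples.
# "other" stands in for all remaining transcripts so library sizes are round.
counts <- rbind(
  tx1   = c(0,   1,    0,   2,    0,    1),
  tx2   = c(50,  60,   55,  70,   65,   80),
  tx3   = c(30,  0,    0,   40,   0,    0),
  tx4   = c(12,  13,   20,  25,   9,    30),
  other = c(908, 1126, 925, 1363, 1026, 1889)
)
colnames(counts) <- paste0("s", 1:6)

# Library size = total counts per sample (what DGEList() stores in $samples$lib.size)
lib.size <- colSums(counts)

# CPM threshold: the CPM that 10 reads represent in the SMALLEST library
min.reads.mapping.per.transcript <- 10
cpm.filt <- min.reads.mapping.per.transcript / min(lib.size) * 1e6

# Counts per million for every transcript in every sample
cpm.mat <- t(t(counts) / lib.size) * 1e6

# Keep a transcript if it passes the CPM threshold in at least min.ind samples
min.ind <- 5
keep <- rowSums(cpm.mat > cpm.filt) >= min.ind

# The same threshold expressed as raw reads needed in each sample
reads.needed <- cpm.filt * lib.size / 1e6

print(cpm.filt)
print(reads.needed)
print(keep)
```

Here `reads.needed` works out to 10, 12, 10, 15, 11 and 20 reads, so tx4 is kept with exactly 5 passing samples, while tx3 (expressed in only 2 samples) and tx1 are dropped. Using CPM instead of raw counts is what lets one threshold scale with each sample's sequencing depth.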
filtered.counts<-estimateDisp(filtered.counts)
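`calcNormFactors(method = "TMM")` computes one scaling factor per sample, and edgeR multiplies it into the library size to get an effective library size. The base-R sketch below is a simplified, unweighted version of the trimmed mean of M-values idea: compare each gene's share of the library against a reference sample, trim the extreme log-ratios (M) and abundances (A), and average what is left. It is an assumption-laden illustration: the function name `tmm_factor` and the data are hypothetical, and edgeR's own implementation also weights genes by precision, trims by rank, chooses the reference sample itself, and rescales the factors so their product is 1.

```r
# Simplified, UNWEIGHTED TMM factor of sample `obs` relative to `ref`
# (hypothetical helper, not edgeR's calcNormFactors()).
tmm_factor <- function(obs, ref, logratio_trim = 0.3, sum_trim = 0.05) {
  p_obs <- obs / sum(obs)                  # each gene's share of its library
  p_ref <- ref / sum(ref)
  ok <- obs > 0 & ref > 0                  # M and A are undefined for zero counts
  M <- log2(p_obs[ok] / p_ref[ok])         # log-ratio ("M" value) per gene
  A <- 0.5 * log2(p_obs[ok] * p_ref[ok])   # average log-abundance ("A" value)
  # Trim each tail of M and of A, then average the remaining M values
  keep <- M >= quantile(M, logratio_trim) & M <= quantile(M, 1 - logratio_trim) &
          A >= quantile(A, sum_trim)      & A <= quantile(A, 1 - sum_trim)
  2^mean(M[keep])
}

# Hypothetical example: 20 genes; in `obs` one gene soaks up about half the reads
ref <- rep(100, 20)
obs <- c(rep(100, 19), 2000)

f <- tmm_factor(obs, ref)
eff.lib <- c(obs = sum(obs) * f, ref = sum(ref))  # effective library sizes
print(f)
print(eff.lib)

# Same composition sequenced twice as deep: no correction is needed
print(tmm_factor(2 * ref, ref))
```

The factor for `obs` comes out at about 0.51 (2000/3900), so its effective library size drops to about 2000, the same as `ref`, and the 19 unchanged genes get identical normalized CPM in both samples instead of appearing twofold down.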
Combine the data with the sample information
new.group.info<-read_csv("data/20220623/edgeR_group_info.csv")
identical(filtered.counts$samples %>% rownames(),
new.group.info$file.name)
new.group.info$sex<-factor(new.group.info$sex,
levels = c("F","M"))
levels(new.group.info$sex)
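# Store sex as the group (the sample order was checked with identical() above)
filtered.counts$samples$group <- new.group.info$sex
design <- model.matrix(~filtered.counts$samples$group)
colnames(design)[2] <- "sex"
design
# Re-estimate dispersions now that the design includes sex
filtered.counts <- estimateDisp(filtered.counts, design)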
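Because the sex factor has levels `F` then `M`, `model.matrix()` uses treatment coding with F as the baseline. The second column is therefore a 0/1 indicator for males, and its coefficient is the male-versus-female log fold change. By default `glmLRT()` tests the last column of the design, which here is that sex column. A base-R sketch with a hypothetical six-sample sheet shows the columns and what happens when the baseline is switched:

```r
# Hypothetical sample sheet: three females followed by three males
sex <- factor(c("F", "F", "F", "M", "M", "M"), levels = c("F", "M"))

# Treatment coding: intercept = F baseline, "sexM" = 1 for male samples
design <- model.matrix(~ sex)
print(design)

# Making M the baseline flips the comparison (and the sign of logFC)
sex.rev <- relevel(sex, ref = "M")
design.rev <- model.matrix(~ sex.rev)
print(colnames(design.rev))
```

With the levels set as in the post, a positive logFC means higher expression in males.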
Differential expression analysis
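# Fit a negative binomial GLM per transcript, then run a likelihood-ratio test on the last design column (sex)
fit <- glmFit(y = filtered.counts, design = design)
lrt <- glmLRT(fit)
# n = Inf returns every transcript, sorted by p-value
result <- topTags(lrt, n = Inf)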
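For each transcript, `glmFit()` fits a negative binomial GLM with the dispersion from `estimateDisp()` and an offset of log effective library size, and `glmLRT()` compares that fit with a reduced model that drops the tested coefficient. The base-R sketch below shows the same likelihood-ratio mechanics for one hypothetical transcript, but it swaps in a Poisson GLM (a negative binomial with dispersion 0) because base R's `stats` package has no fixed-dispersion NB family. The counts and library sizes are made up, and the p-value will be far smaller than edgeR's would be because biological variability is ignored.

```r
# One hypothetical transcript measured in 3 females and 3 males
y        <- c(20, 25, 18, 60, 75, 66)
sex      <- factor(c("F", "F", "F", "M", "M", "M"), levels = c("F", "M"))
lib.size <- c(1.0e6, 1.2e6, 0.9e6, 1.1e6, 1.3e6, 1.0e6)

# Full model (intercept + sex) and reduced model (intercept only), both with
# log(library size) as an offset so coefficients are on a per-read scale
full    <- glm(y ~ sex, family = poisson(), offset = log(lib.size))
reduced <- glm(y ~ 1,   family = poisson(), offset = log(lib.size))

# Likelihood-ratio statistic = drop in deviance; 1 df because one coefficient was dropped
lr <- deviance(reduced) - deviance(full)
p.value <- pchisq(lr, df = 1, lower.tail = FALSE)

# Natural-log coefficient converted to a log2 fold change (M relative to F)
logFC <- unname(coef(full)["sexM"]) / log(2)

print(c(logFC = logFC, LR = lr, PValue = p.value))
```

With one indicator per group, the fitted logFC is simply log2 of the ratio of pooled rates, (201 / 3.4e6) / (63 / 3.1e6), which is about 1.54.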
Volcano plot
result$table %>%
mutate(change = case_when(
PValue < 0.05 & logFC > 2 ~ "UP",
PValue < 0.05 & logFC < -2 ~ "DOWN",
TRUE ~ "NOT"
)) -> DEG
table(DEG$change)
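The `case_when()` call labels a transcript UP or DOWN only when the unadjusted `PValue` is below 0.05 and |logFC| exceeds 2 (a fourfold change); every other transcript is NOT. The same rule in base R on five hypothetical rows makes the boundaries easy to check. The `topTags()` table also carries an `FDR` column (Benjamini-Hochberg by default), which could be used in place of `PValue` if multiple-testing correction is wanted.

```r
# Hypothetical results for five transcripts
res <- data.frame(
  logFC  = c( 3.1,  -2.5, 2.5,  0.4,   -4.0),
  PValue = c(0.001, 0.01, 0.20, 0.001, 0.04)
)

# Same rule as case_when(): conditions are checked in order, NOT is the fallback
res$change <- ifelse(res$PValue < 0.05 & res$logFC > 2, "UP",
              ifelse(res$PValue < 0.05 & res$logFC < -2, "DOWN", "NOT"))

print(res)
print(table(res$change))
```

Row 3 has a large fold change but p = 0.20, and row 4 is significant but only slightly changed, so both end up as NOT.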
library(ggplot2)
ggplot(data=DEG,aes(x=logFC,
y=-log10(PValue),
color=change))+
geom_point(alpha=0.8,size=3)+
labs(x="log2 fold change")+ ylab("-log10 pvalue")+
#ggtitle(this_title)+
theme_bw(base_size = 20)+
#theme(plot.title = element_text(size=15,hjust=0.5),)+
scale_color_manual(values=c('#a121f0','#bebebe','#ffad21')) -> p1
p1 +
geom_vline(xintercept = 2,lty="dashed")+
geom_vline(xintercept = -2,lty="dashed") -> p2
library(patchwork)
pdf(file = "edger_deg.pdf",
width = 9.4,height = 4,family = "serif")
p1+p2+
plot_layout(guides = "collect")
dev.off()
The example data and code can be obtained by replying "20220625" to the WeChat official account.