molecular biology

文献阅读1: 关于age的文献

2020-04-10  本文已影响0人  MJades

题目:Undulating changes in human plasma proteome profiles across the lifespan
主要是学习他的数据分析方式,所以不对introduction进行解读。

  1. 线性模型,以下这几张图:


    Fig 1. 线性回归ANOVA分析

    文中对这一部分的描述如下:


    Fig 2. method描述
    我利用自己的数据模仿进行了以上的分析,但没有绘图:
library(emmeans) # 计算 effect size
library(car) # 用于type2型ANOVA
attach(lmdataclin)
lm.function<-function(x){
  # calculate
  lm.pro<-lm(x~AGE+SEX,data=lmdataclin)
  x.age<-lm.pro$coefficients[2]
  y.age<-summary(lm.pro)$"coefficients"[2,4]
  x.sex<-lm.pro$coefficients[3]
  y.sex<-summary(lm.pro)$"coefficients"[3,4]
  mm<-Anova(lm.pro,type=2)
  eta_sq(mm,partial=TRUE)
  partial.age<-eta_sq(mm,partial=TRUE)[1,2]
  partial.sex<-eta_sq(mm,partial=TRUE)[2,2]
  lm.list<-data.frame(ß.age=x.age, pvalue.age=y.age, ß.sex=x.sex,pvalue.sex=y.sex,effect.size.age=partial.age,effect.size.sex=partial.sex)
  return(lm.list)
}
aov.pro<-apply(lmdataclin[,100:651],2,lm.function) # 批量处理
sum.aov<-do.call(rbind,lm.pro) # 进行合并

结果如下:


Fig 3. 结果展示

p.value的校正可参考:
https://blog.csdn.net/zhu_si_tao/article/details/71077703?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task
具体计算:

 p.adjust(sum.aov$pvalue.age,method="BH",n=length(sum.aov$pvalue.age)) # BH就是FDR校正
  1. Sliding Enrichment pathway analysis (SEPA)
    方法中对SEPA的介绍如下: SEPA介绍
    涉及到了GSEA的内容,可参考以下内容:
    https://www.jianshu.com/p/e28783bdd092
    https://www.jianshu.com/p/b409a5576ce1
    我下载了补充材料, 里面有2025个蛋白的信息,进行尝试。
    Method 1. clusterProfiler运行之后,和报道的结果不太一致;
Gene.back<-str_match(data$EntrezGeneSymbol,"[A-Z0-9]*")    %>% as.vector
Gene.back<-bitr(Gene.back, fromType = "SYMBOL",toType =    "ENTREZID",OrgDb = org.Hs.eg.db)
Gene.back<-Gene.back$ENTREZID
Gene.back[duplicated(Gene.back)]

data.rank.up<-data[data$Coefficient.Age>0,] %>% .[1:100,]
Gene.rank.up<-str_match(data.rank.up$EntrezGeneSymbol,"     [A-Z0-9]*") %>% as.vector
Gene.rank.up<-bitr(Gene.rank.up, fromType = "SYMBOL",toType = "ENTREZID",OrgDb = org.Hs.eg.db)
Gene.rank.up<-Gene.rank.up$ENTREZID
Gene.rank.up[duplicated(Gene.rank.up)]

go.up<- enrichGO(gene = Gene.rank.up,
               OrgDb=org.Hs.eg.db,
               ont = "ALL", # MF,BP,CC
               pAdjustMethod = "BH",
               minGSSize = 1,
               maxGSSize = 500,
               pvalueCutoff = 1,
               qvalueCutoff = 1,
               readable = TRUE,
               universe = Gene.back,
               keyType = "ENTREZID")
clusterProfiler

Method 2. gprofiler2,没有结果。

library(ggplot2)
library(gprofiler2)
Gene.rank.up.gp<-    str_match(data.rank.up$EntrezGeneSymbol,"[A-Z0-9]*")  %>% na.omit%>%unique %>%  as.vector
Gene.back.gp<-str_match(data$EntrezGeneSymbol,"[A-Z0-9]*") %>% na.omit%>%unique %>%  as.vector

# query type: Gene symbol
gostres <- gost(query=Gene.rank.up.gp,organism =  "hsapiens", ordered_query = TRUE, 
            multi_query = FALSE, significant = TRUE, exclude_iea = FALSE, 
            measure_underrepresentation = FALSE, evcodes = FALSE, 
            user_threshold = 0.05, correction_method = "fdr", 
            domain_scope = "annotated", #custom_bg = Gene.back.gp, 
            numeric_ns = "", sources = c("GO:BP") , as_short_link = FALSE)
p <- gostplot(gostres)
gprofiler2

暂时不知道问题出在哪,找到方法后再补充~

  1. Prediction of human biological age using the plasma proteome.


    Prediction method

    作者是用glmnet包做的,目前这个包已经更新,需要3.6.0版本以上的R才可以安装~


    glmnet

未完,待续.....

上一篇下一篇

猜你喜欢

热点阅读