GSE in Practice

(3) 02_group_ids: Grouping and Chip Annotation

2020-01-23  养猪场小老板

Step 1: Clear all existing variables and load the previously saved data

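> rm(list = ls())  # remove all objects from the global environment
# ls() returns the names of all objects in the global environment
# as a character vector; rm(list = ...) then removes every object named in it.
> load(file = "step1output.Rdata")  # load the data saved earlier in the working directory
> library(stringr)  # load the stringr package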

Step 2: Identify the groups

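# pd, introduced in the previous post, holds the clinical information; its title column shows which samples are control and which are treated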
> pd$title
[1] "A375 cells 24h Control rep1"     "A375 cells 24h Control rep2"    
[3] "A375 cells 24h Control rep3"     "A375 cells 24h Vemurafenib rep1"
[5] "A375 cells 24h Vemurafenib rep2" "A375 cells 24h Vemurafenib rep3"
pd

Step 3: Build the grouping vector

> group_list=c(rep("control",times=3),rep("treat",times=3))
> group_list
[1] "control" "control" "control" "treat"   "treat"   "treat"  
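> # Alternative approach: ifelse() with str_detect()
> library(stringr)  # provides str_detect()
> group_list=ifelse(str_detect(pd$title,"Control"),"control","treat")
> group_list
[1] "control" "control" "control" "treat"   "treat"   "treat"  
# ifelse(): the first argument is the test, the second is the value returned where the test is TRUE, the third where it is FALSE
# Reference level to keep in mind: control first, treatment second
# str_detect(string, pattern) is a detection function that returns a logical vector
# indicating whether each string contains a match for the pattern.
# Example: val <- c("abca4", 123, "cba2"); str_detect(val, "a") checks each element of val for "a" and returns TRUE FALSE TRUE
# pd$title has 6 elements, so 6 values come back: TRUE maps to "control", FALSE maps to "treat"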
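A quick sanity check on the grouping vector can catch a mismatched pattern early. The following is a minimal sketch, not part of the original script: it uses base R's grepl() in place of str_detect() so that it runs without extra packages, the `titles` vector simply copies the pd$title values printed above, and the names `manual` and `pattern` are illustrative.

```r
# Sample titles copied from the pd$title output above
titles <- c("A375 cells 24h Control rep1", "A375 cells 24h Control rep2",
            "A375 cells 24h Control rep3", "A375 cells 24h Vemurafenib rep1",
            "A375 cells 24h Vemurafenib rep2", "A375 cells 24h Vemurafenib rep3")

# Manual grouping and pattern-based grouping
manual  <- c(rep("control", times = 3), rep("treat", times = 3))
pattern <- ifelse(grepl("Control", titles, fixed = TRUE), "control", "treat")

# Both approaches should agree, with one label per sample
stopifnot(length(pattern) == length(titles))
stopifnot(identical(manual, pattern))

# Count the samples in each group
table(pattern)
```

If the pattern were misspelled (for example "control" in lower case), every sample would fall into "treat" and the identical() check would fail immediately.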

Step 4: Convert to a factor

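> group_list = factor(group_list,  # a factor is needed because the later differential analysis compares treatment against control
                    levels = c("control","treat"))
# levels fixes the order: the first level is treated as the control, so the order matters and levels should always be set explicitly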
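To see why the level order matters, one can inspect how R encodes the factor in a design matrix. This is a minimal base R sketch: model.matrix() is used here only to illustrate reference-level coding and is an assumption about what a later step might look like, not necessarily what this series does downstream. The labels "vehicle" and "treat" in the second half are hypothetical.

```r
group_list <- factor(c(rep("control", 3), rep("treat", 3)),
                     levels = c("control", "treat"))

# The first level is the reference
levels(group_list)

# With the default treatment contrasts, the design matrix has an intercept
# (the reference level) and one column per non-reference level
design <- model.matrix(~ group_list)
colnames(design)

# Without explicit levels, factor() sorts the levels alphabetically,
# so "treat" would come before the hypothetical label "vehicle";
# relevel() moves the chosen reference to the front
g2 <- relevel(factor(c("treat", "vehicle", "treat", "vehicle")), ref = "vehicle")
levels(g2)
```

Setting levels explicitly in factor(), as the script does, avoids depending on alphabetical order in the first place.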

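Chip annotation: find the annotation package that matches the chip platform and substitute it into this script
gpl  # look up this GPL number on the page below (Ctrl+F) to find the corresponding annotation package
Mapping between chip probes and genes: http://www.bio-info-trainee.com/1399.html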
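The lookup from a GPL number to an annotation package can also be kept in a small named vector. The sketch below is only illustrative: the GPL6244 entry is the one used in this post, while the other two entries are commonly cited platform mappings added here as assumptions and should be verified against the table at the link above. The helper name `gpl_to_package` is hypothetical.

```r
# Platform-to-annotation lookup (package names without the ".db" suffix)
gpl_map <- c(
  GPL6244 = "hugene10sttranscriptcluster",  # the platform used in this post
  GPL570  = "hgu133plus2",                  # assumption: verify against the linked table
  GPL96   = "hgu133a"                       # assumption: verify against the linked table
)

# Hypothetical helper: return the annotation package name for one GPL number
gpl_to_package <- function(gpl) {
  base <- gpl_map[gpl]
  if (is.na(base)) stop("No annotation package recorded for ", gpl)
  paste0(unname(base), ".db")
}

gpl_to_package("GPL6244")
```

An unknown platform stops with an error instead of silently returning NA, which makes a missing entry obvious before any installation is attempted.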

Step 1: Install and load the hugene10sttranscriptcluster.db package

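> gpl  # look up this GPL number on the web page (Ctrl+F) to find the corresponding annotation package
[1] "GPL6244"
> if(!require(hugene10sttranscriptcluster.db)) BiocManager::install("hugene10sttranscriptcluster.db")
# require() attempts to load the package and returns a logical value: TRUE if it loaded, FALSE if it did not; ! negates it,
# so the package is installed only when it is missing. ls("package:<name>") lists a package's contents, e.g. ls("package:tidyr")
> library(hugene10sttranscriptcluster.db)
> ls("package:hugene10sttranscriptcluster.db")  # list all objects provided by the package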
 [1] "hugene10sttranscriptcluster"             
 [2] "hugene10sttranscriptcluster.db"          
 [3] "hugene10sttranscriptcluster_dbconn"      
 [4] "hugene10sttranscriptcluster_dbfile"      
 [5] "hugene10sttranscriptcluster_dbInfo"      
 [6] "hugene10sttranscriptcluster_dbschema"    
 [7] "hugene10sttranscriptclusterACCNUM"       
 [8] "hugene10sttranscriptclusterALIAS2PROBE"  
 [9] "hugene10sttranscriptclusterCHR"          
[10] "hugene10sttranscriptclusterCHRLENGTHS"   
[11] "hugene10sttranscriptclusterCHRLOC"       
[12] "hugene10sttranscriptclusterCHRLOCEND"    
[13] "hugene10sttranscriptclusterENSEMBL"      
[14] "hugene10sttranscriptclusterENSEMBL2PROBE"
[15] "hugene10sttranscriptclusterENTREZID"     
[16] "hugene10sttranscriptclusterENZYME"       
[17] "hugene10sttranscriptclusterENZYME2PROBE" 
[18] "hugene10sttranscriptclusterGENENAME"     
[19] "hugene10sttranscriptclusterGO"           
[20] "hugene10sttranscriptclusterGO2ALLPROBES" 
[21] "hugene10sttranscriptclusterGO2PROBE"     
[22] "hugene10sttranscriptclusterMAP"          
[23] "hugene10sttranscriptclusterMAPCOUNTS"    
[24] "hugene10sttranscriptclusterOMIM"         
[25] "hugene10sttranscriptclusterORGANISM"     
[26] "hugene10sttranscriptclusterORGPKG"       
[27] "hugene10sttranscriptclusterPATH"         
[28] "hugene10sttranscriptclusterPATH2PROBE"   
[29] "hugene10sttranscriptclusterPFAM"         
[30] "hugene10sttranscriptclusterPMID"         
[31] "hugene10sttranscriptclusterPMID2PROBE"   
[32] "hugene10sttranscriptclusterPROSITE"      
[33] "hugene10sttranscriptclusterREFSEQ"       
[34] "hugene10sttranscriptclusterSYMBOL" ### important
[35] "hugene10sttranscriptclusterUNIGENE"      
[36] "hugene10sttranscriptclusterUNIPROT"      
#View(hugene10sttranscriptclusterSYMBOL)
#str(hugene10sttranscriptclusterSYMBOL)

Step 2: Convert the data in hugene10sttranscriptclusterSYMBOL into a data frame

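> ids <- toTable(hugene10sttranscriptclusterSYMBOL)  # turn the package's mapping into a data frame
# toTable() is a method for working with a Bimap object as a data frame,
# i.e. it converts the Bimap object into a data frame.
# It is part of the Bimap interface.
# A Bimap is a mapping between two sets of identifiers, e.g. between probe IDs and gene symbols.
> head(ids)  # only two columns: probe_id and symbol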
  probe_id    symbol
1  7896759 LINC01128
2  7896761    SAMD11
3  7896779    KLHL17
4  7896798   PLEKHN1
5  7896817     ISG15
6  7896822      AGRN
#View(ids)
> save(exp,group_list,ids,file = "step2output.Rdata")  # save the results for the next step
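As a preview of how ids is typically used, the sketch below attaches gene symbols to the rows of an expression matrix by matching probe IDs. It is an illustration under stated assumptions: the toy matrix `expr` and its values are made up, "0000000" is a hypothetical probe with no annotation, and the probe IDs and symbols are the first three rows shown by head(ids) above. In the real analysis the exp and ids objects would take their place.

```r
# First three rows of the ids table shown above
ids <- data.frame(
  probe_id = c("7896759", "7896761", "7896779"),
  symbol   = c("LINC01128", "SAMD11", "KLHL17"),
  stringsAsFactors = FALSE
)

# Toy expression matrix (made-up values); rows are probes, columns are samples
expr <- matrix(1:8, nrow = 4,
               dimnames = list(c("7896761", "0000000", "7896759", "7896779"),
                               c("s1", "s2")))

# Keep only the probes that have a symbol, then look up each row's symbol
expr_annot <- expr[rownames(expr) %in% ids$probe_id, , drop = FALSE]
symbols <- ids$symbol[match(rownames(expr_annot), ids$probe_id)]

# Probe ID, gene symbol, and expression values side by side
data.frame(probe_id = rownames(expr_annot), symbol = symbols, expr_annot)
```

match() returns, for each row of the matrix, the position of its probe ID in ids, so the symbols come out in the same order as the matrix rows rather than in the order of ids.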

Next: a closer look at the role of probe_id and symbol in this analysis
