
How to get hsa04650 Natural from the KEGG database

2019-01-30  看远方的星

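There are two ways to do this: find the KEGG page through Google and read it in the browser, or use an R package and a few lines of code.


Method 1: browse the web page


1. Search Google for: hsa04650

2. Open this result (https://www.genome.jp/dbget-bin/www_bget?hsa04650)
3. Scroll down to the Gene entry; the answer is listed there.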

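Method 2: use an R package and code:


Approach: the web page shows that the goal is to get the Gene entry as a matrix and extract its second column, which holds the gene symbols.


Reference: http://www.bio-info-trainee.com/3533.html
That post contains the following code:
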
library(clusterProfiler)   # load this package (what is it for?)
# https://www.kegg.jp/dbget-bin/www_bget?pathway+hsa05169
# library(KEGG.db) library(KEGGREST)  # what are these two packages for?

kg=download_KEGG('hsa')     # called directly; the post does not say which package provides it
head(kg[[1]])
head(kg[[2]])
ps=c('hsa04660','hsa04659',
     'hsa04658','hsa04657','hsa04662',
     'hsa04650')

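With the direction settled, install the package first:


The usual three steps (for installing a package from Bioconductor):
1. source("http://bioconductor.org/biocLite.R") installs BiocInstaller

2. options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/") switches the mirror

3. BiocInstaller::biocLite('KEGGREST') installs the package (KEGGREST is a Bioconductor package)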

> source("http://bioconductor.org/biocLite.R")
Bioconductor version 3.7 (BiocInstaller 1.30.0), ?biocLite for help
A newer version of Bioconductor is available for this version of R, ?BiocUpgrade for
  help
> options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/") 
> BiocInstaller::biocLite('KEGGREST')
BioC_mirror: http://mirrors.ustc.edu.cn/bioc/
Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.2 (2018-12-20).
Installing package(s) ‘KEGGREST’
also installing the dependency ‘png’

trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/png_0.1-7.zip'
Content type 'application/zip' length 292639 bytes (285 KB)
downloaded 285 KB

trying URL 'http://mirrors.ustc.edu.cn/bioc//packages/3.7/bioc/bin/windows/contrib/3.5/KEGGREST_1.20.2.zip'
Content type 'application/zip' length 124626 bytes (121 KB)
downloaded 121 KB

package ‘png’ successfully unpacked and MD5 sums checked
package ‘KEGGREST’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\300S\AppData\Local\Temp\Rtmp4wKPRV\downloaded_packages
Old packages: 'gplots', 'purrr'
Update all/some/none? [a/s/n]: 
a
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/gplots_3.0.1.1.zip'
Content type 'application/zip' length 657011 bytes (641 KB)
downloaded 641 KB

trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/purrr_0.3.0.zip'
Content type 'application/zip' length 413820 bytes (404 KB)
downloaded 404 KB

package ‘gplots’ successfully unpacked and MD5 sums checked
package ‘purrr’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\300S\AppData\Local\Temp\Rtmp4wKPRV\downloaded_packages
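
The log above is from Bioconductor 3.7. From Bioconductor 3.8 onward, biocLite() was replaced by the BiocManager package, so on a current R installation the three steps would look roughly like the sketch below. The helper name install_bioc_if_missing is made up for this illustration; the mirror URL is the one used above and is optional.

```r
# Sketch: install a Bioconductor package only when it is not already present.
# install_bioc_if_missing is a hypothetical helper name, not part of any package.
install_bioc_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))          # already installed, nothing to do
  }
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")   # BiocManager itself lives on CRAN
  }
  options(BioC_mirror = "http://mirrors.ustc.edu.cn/bioc/")  # optional mirror
  BiocManager::install(pkg)
  invisible(TRUE)
}

# Uncomment to run (needs network access):
# install_bioc_if_missing("KEGGREST")
```

The function returns FALSE (invisibly) when the package was already available and TRUE after an installation attempt.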

Learn how to use the package:


Commands:

> ?KEGGREST
No documentation for ‘KEGGREST’ in specified packages and libraries:
you could try ‘??KEGGREST’
> ??KEGGREST

Open the help page from the search results to learn the basic commands:


> gs<-keggGet('hsa04650')
> View(gs)
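[Screenshots: the gs object in the viewer, and part of the web page]

The entries are the same as on the web page, but gs is clearly not a matrix yet. Convert it to a matrix and then extract what we need.

Hover over an entry in the viewer and an icon appears next to it. Click the icon and a line of code appears in the console; press Enter to run it and you get the contents of that entry.


Compared with the web page, the result is correct:
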
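
The screenshots are not reproduced here, so the following is a reconstruction rather than the exact line the viewer generated. It assumes that keggGet() returns a list with one element per queried entry, that the gene list is stored under the name GENE, and that the vector a used in the commands below holds that field. To keep the sketch runnable offline, gs_mock is a hand-built stand-in with the same shape; only the HLA-A description is taken from the article's output, the other two are illustrative.

```r
# With network access, the real object would come from:
#   library(KEGGREST)
#   gs <- keggGet("hsa04650")
#   a  <- gs[[1]]$GENE
# gs_mock imitates that structure: IDs and "SYMBOL; description" alternate.
gs_mock <- list(list(
  ENTRY = "hsa04650",
  GENE  = c("3105", "HLA-A; major histocompatibility complex, class I, A [KO:K06751]",
            "3106", "HLA-B; illustrative description",
            "3107", "HLA-C; illustrative description")
))

a <- gs_mock[[1]]$GENE

# Fill a two-column matrix row by row: column 1 = gene ID, column 2 = symbol text.
gene_mat <- matrix(a, ncol = 2, byrow = TRUE,
                   dimnames = list(NULL, c("id", "symbol_desc")))

# Keep only the part before the first ';' in the second column.
symbols <- sub(";.*$", "", gene_mat[, "symbol_desc"])
print(symbols)
```

This is the "matrix, then second column" idea from the approach stated earlier; the article itself goes on to reach the same symbols with strsplit() and position filtering.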

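The splitting is done with strsplit(). Its arguments are:
x is the character vector to split;
split is the separator to split on;
fixed = TRUE matches split literally;
perl = TRUE uses Perl-compatible regular expressions;
when both fixed and perl are FALSE, POSIX 1003.2 extended regular expressions are used;
useBytes = TRUE matches byte by byte rather than character by character.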
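
A small illustration of why fixed matters: in the default regular-expression mode, a separator such as "." is a pattern that matches any character, whereas fixed = TRUE treats it as a plain dot. The input strings are made up for the example, apart from the HLA-A line, which is copied from the article's output.

```r
x <- "a.b.c"

# Regex mode (default): "." matches every character, so only empty pieces remain.
regex_pieces <- strsplit(x, ".")[[1]]

# Literal mode: "." is an ordinary dot.
fixed_pieces <- strsplit(x, ".", fixed = TRUE)[[1]]
print(fixed_pieces)

# ";" has no special meaning in a regex, so both modes agree for the KEGG strings.
kegg_string <- "HLA-A; major histocompatibility complex, class I, A [KO:K06751]"
same <- identical(strsplit(kegg_string, ";")[[1]],
                  strsplit(kegg_string, ";", fixed = TRUE)[[1]])
print(same)
```

Because the separator here is ";", the calls below work without setting fixed.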

> lapply(a,function(x) strsplit(x,';'))
[[1]]
[[1]][[1]]
[1] "3105"


[[2]]
[[2]][[1]]
[1] "HLA-A"                                                    
[2] " major histocompatibility complex, class I, A [KO:K06751]"
...
> unlist(lapply(a,function(x) strsplit(x,';')[[1]][1]))
  [1] "3105"        "HLA-A"       "3106"        "HLA-B"       "3107"        "HLA-C"      
  [7] "3135"        "HLA-G"       "3133"        "HLA-E"       "3812"        "KIR3DL2"    
 [13] "3811"        "KIR3DL1"     "3803"        "KIR2DL2"     "3802"        "KIR2DL1"    

> b<- unlist(lapply(a,function(x) strsplit(x,';')[[1]][1]))
> b[1:length(b)%%2 ==0]  # 1:length(b) gives the positions in b; keep the even positions, which hold the gene symbols
  [1] "HLA-A"       "HLA-B"       "HLA-C"       "HLA-G"       "HLA-E"       "KIR3DL2"    
  [7] "KIR3DL1"     "KIR2DL2"     "KIR2DL1"     "KIR2DL3"     "KIR2DL4"     "KIR2DL5A"   
 [13] "KLRC1"       "KLRC2"       "KLRC3"       "KLRD1"       "PTPN6"       "PTPN11"     
 [19] "ICAM1"       "ICAM2"       "ITGAL"       "ITGB2"       "PTK2B"       "VAV3"       
 [25] "VAV1"        "VAV2"        "RAC1"        "RAC2"        "RAC3"        "PAK1"       
 [31] "MAP2K1"      "MAP2K2"      "MAPK1"       "MAPK3"       "TNF"         "CSF2"       
 [37] "IFNG"        "KIR2DS1"     "KIR2DS3"     "KIR2DS4"     "KIR2DS5"     "KIR2DS2"    
 [43] "NCR2"        "TYROBP"      "LCK"         "IGH"         "FCGR3A"      "FCGR3B"     
 [49] "NCR1"        "NCR3"        "FCER1G"      "CD247"       "ZAP70"       "SYK"        
 [55] "LCP2"        "LAT"         "PLCG1"       "PLCG2"       "SH3BP2"      "PIK3CA"     
 [61] "PIK3CD"      "PIK3CB"      "PIK3R1"      "PIK3R2"      "PIK3R3"      "FYN"        
 [67] "SHC1"        "SHC2"        "SHC3"        "SHC4"        "GRB2"        "SOS1"       
 [73] "SOS2"        "HRAS"        "KRAS"        "NRAS"        "ARAF"        "BRAF"       
 [79] "RAF1"        "MICB"        "MICA"        "ULBP1"       "ULBP2"       "ULBP3"      
 [85] "RAET1G"      "RAET1L"      "RAET1E"      "KLRK1"       "KLRC4-KLRK1" "HCST"       
 [91] "CD48"        "CD244"       "PPP3CA"      "PPP3CB"      "PPP3CC"      "PPP3R1"     
 [97] "PPP3R2"      "NFATC1"      "NFATC2"      "PRKCA"       "PRKCB"       "PRKCG"      
[103] "SH2D1B"      "SH2D1A"      "IFNGR1"      "IFNGR2"      "IFNA1"       "IFNA2"      
[109] "IFNA4"       "IFNA5"       "IFNA6"       "IFNA7"       "IFNA8"       "IFNA10"     
[115] "IFNA13"      "IFNA14"      "IFNA16"      "IFNA17"      "IFNA21"      "IFNB1"      
[121] "IFNAR1"      "IFNAR2"      "TNFSF10"     "TNFRSF10A"   "TNFRSF10B"   "FASLG"      
[127] "FAS"         "GZMB"        "PRF1"        "CASP3"       "BID"  
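
The same result can be reached a little more directly. The sketch below uses a shortened, partly illustrative copy of the GENE vector (the name a_demo is hypothetical) and checks that selecting every second element first and then stripping the description gives the same symbols as the two-step approach above.

```r
a_demo <- c("3105", "HLA-A; major histocompatibility complex, class I, A [KO:K06751]",
            "3106", "HLA-B; illustrative description",
            "3107", "HLA-C; illustrative description")

# Approach from the article: split everything, then keep the even positions.
b <- unlist(lapply(a_demo, function(x) strsplit(x, ";")[[1]][1]))
by_position <- b[seq_along(b) %% 2 == 0]

# Alternative: c(FALSE, TRUE) is recycled, so it selects elements 2, 4, 6, ...
direct <- sub(";.*$", "", a_demo[c(FALSE, TRUE)])

print(direct)
stopifnot(identical(by_position, direct))
```

seq_along(b) is used instead of 1:length(b) because it yields an empty sequence when b is empty, whereas 1:0 would count down from 1 to 0.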