[R] Reading notes on 《R 包开发》 (the Chinese edition of "R packages")

2018-11-16 郑宝童

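A senior labmate of mine, Zeng, once recommended Hadley Wickham's online book "R packages", and I liked it right away. Later I came across clusterProfiler, the package by Y叔 (Guangchuang Yu), and was so impressed by what it could do that I wanted to write an R package myself. I started translating "R packages" with the idea of publishing the translation (this was in 2017), only to find that a Chinese translation, 《R 包开发》, already existed. Not wanting to reinvent the wheel, I bought the book without hesitation and finished it in a few days. I have to say that the translator, 杨学辉, works at a level far above mine.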

The full tutorial in English is at: http://r-pkgs.had.co.nz/r.html

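Question: why don't I read the official "Writing R Extensions" manual much?
- Answer: I tried. As the preface of 《R 包开发》 says, the official manual aims for rigor and correctness and gives up some readability in exchange.
Question: why write an R package at all?
- Answer: As 谢益辉 (Yihui Xie) also points out in his foreword, even if your code is not meant for other people, organizing it as an R package has many benefits. In the programmer's world the word "others" has a special meaning: very often that "other person" is you, three months from now.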

Below are the brief notes I took while reading 《R 包开发》.
My environment: R 3.4.3, RStudio, Windows 7

step 1:

1.1 Install the required packages

install.packages(c("devtools", "roxygen2", "testthat", "knitr"))

Note: to use the latest features of Hadley's devtools package, install the development version with devtools::install_github("hadley/devtools")

1.2 Install the Rtools version that matches your R version: https://cran.rstudio.com/bin/windows/Rtools/
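
A quick way to confirm the setup from step 1 is to check which of the four packages are already installed. This is a minimal sketch in base R only; the variable names `required`, `installed` and `missing_pkgs` are chosen here for illustration and do not come from the book.

```r
# Packages named in step 1.1
required <- c("devtools", "roxygen2", "testthat", "knitr")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All development packages are installed.")
}
```

The install call is left commented out so that running the check never changes your library by accident.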

step 2: Create a project in RStudio (a lowercase package name is recommended)

2.1 Click File --> New Project
2.2 Choose New Directory
2.3 Click R Package
2.4 Enter the package name and click Create Project

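Files and directories:

DESCRIPTION: package metadata file
NAMESPACE: the package's namespace file
man/: directory holding the help files for your functions
R/: directory holding the R code you write
man/sayHello.Rd: help file for the sayHello function, written in a LaTeX-like syntax and used to generate the PDF documentation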
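
To make the layout concrete, the sketch below builds the same skeleton by hand in a temporary directory. It is an illustration only: RStudio (or devtools::create()) normally does this for you, and the package name `mypkg` and the body of `sayHello` are assumptions made up for the example.

```r
# Build the bare layout by hand in a temporary directory
pkg_dir <- file.path(tempdir(), "mypkg")   # "mypkg" is a hypothetical name
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "man"), showWarnings = FALSE)

# Minimal metadata file
writeLines(c(
  "Package: mypkg",
  "Type: Package",
  "Title: What the Package Does",
  "Version: 0.1.0"
), file.path(pkg_dir, "DESCRIPTION"))

# Namespace file exporting one function
writeLines("export(sayHello)", file.path(pkg_dir, "NAMESPACE"))

# The R code itself lives under R/
writeLines(c(
  "sayHello <- function(name = \"world\") {",
  "  paste0(\"Hello, \", name, \"!\")",
  "}"
), file.path(pkg_dir, "R", "sayHello.R"))

# List the files that were created
list.files(pkg_dir, recursive = TRUE)
```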

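step 3: Things to watch for when writing code under R/

3.1 Consider formatting your R code with the formatR package so that it looks tidier
3.2 Do not call library() or require() inside package code: they modify the search path and so change which functions are available in the global environment. If your code depends on other packages, declare them in DESCRIPTION instead (the Imports and Suggests fields). Avoid typing those entries by hand; add them with devtools::use_package(), and let roxygen2 generate the matching NAMESPACE imports
3.3 Do not use source() to load code from files. Use devtools::load_all() instead, which automatically sources every file under R/
Shortcut: Ctrl+Shift+L saves all open files and then runs devtools::load_all()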
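
The advice in 3.2 can be sketched as follows. Instead of attaching a package, the code qualifies each call with `pkg::`, and it checks at run time for a package that would be listed under Suggests. Both helper functions, `robust_center` and `pretty_table`, are hypothetical and exist only for this example.

```r
# Instead of library(stats) inside package code, qualify the call
robust_center <- function(x) {           # hypothetical helper
  stats::median(x, na.rm = TRUE)
}

# For an optional dependency (listed under Suggests), check at run time
pretty_table <- function(df) {           # hypothetical helper
  if (requireNamespace("knitr", quietly = TRUE)) {
    knitr::kable(df)
  } else {
    df   # fall back to the plain data frame
  }
}

robust_center(c(1, 5, NA, 3))
```

Neither function touches the search path, so loading a package written this way leaves the user's global environment unchanged.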

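step 4: How to write DESCRIPTION (the description file stores the package's key metadata: package name, title, version, authors, maintainer, dependencies and license)

devtools::create("pkgname") # automatically adds a skeleton DESCRIPTION file

Example: the sample below comes from Y叔's GitHub repository https://github.com/GuangchuangYu/clusterProfiler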

Package: clusterProfiler # package name
Type: Package
Title: statistical analysis and visualization of functional profiles for genes and gene clusters # title
Version: 3.5.3 # version, in the form <major>.<minor>.<patch>
Authors@R: c(person(given = "Guangchuang", family = "Yu",
                    email = "guangchuangyu@gmail.com",
                    role = c("aut", "cre")), ## cre: creator or maintainer; aut: author
             person(given = "Li-Gen", family = "Wang",
                    email = "reeganwang020@gmail.com",
                    role = "ctb"), # ctb: contributor, e.g. someone who supplied patches
             person(given = "Giovanni", family = "Dall'Olio",
                    email = "giovanni.dallolio@upf.edu",
                    role = "ctb", comment = "formula interface of compareCluster")
             )
Maintainer: Guangchuang Yu <guangchuangyu@gmail.com> # maintainer
Description: This package implements methods to analyze and visualize functional profiles (GO and KEGG) of gene and gene clusters. # package description, usually more detailed than the title
Depends:
    R (>= 3.3.1),
    DOSE (>= 3.3.0)
Imports: ## Imports and Suggests are filled in with devtools::use_package("pkgname") rather than by hand; ">=" states a minimum version
    # e.g. devtools::use_package("dplyr") adds dplyr to Imports
    # devtools::use_package("dplyr", "Suggests") adds it to Suggests
    AnnotationDbi,
    ggplot2,
    GO.db,
    GOSemSim (>= 2.0.0),
    magrittr,
    methods,
    plyr,
    qvalue,
    rvcheck,
    stats,
    stats4,
    tidyr,
    utils
Suggests:
    AnnotationHub,
    GSEABase,
    KEGG.db,
    knitr,
    org.Hs.eg.db,
    prettydoc,
    pathview,
    ReactomePA,
    testthat,
    topGO
VignetteBuilder: knitr
ByteCompile: true
License: Artistic-2.0 # license: MIT, GPL-2, CC0 and so on. This is how you license your package; CC0, for example, means you give up all rights to the code. For other licenses see https://cran.r-project.org/doc/manuals/Rexts.html#Licensing
URL: https://guangchuangyu.github.io/clusterProfiler # URL
BugReports: https://github.com/GuangchuangYu/clusterProfiler/issues # where to report bugs
Packaged: NA
biocViews: Annotation, Clustering, GeneSetEnrichment, GO, KEGG, MultipleComparison, Pathways, Reactome, Visualization
RoxygenNote: 5.0.1
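
A DESCRIPTION file is plain "field: value" text, so base R can read it back. The sketch below writes a tiny made-up DESCRIPTION (not the clusterProfiler one) to a temporary file, parses it with read.dcf(), and shows why the <major>.<minor>.<patch> convention matters: versions compare component by component, not as text.

```r
# Write a tiny DESCRIPTION to a temp file (contents are illustrative)
desc_file <- tempfile()
writeLines(c(
  "Package: mypkg",
  "Version: 3.5.3",
  "Imports:",
  "    ggplot2,",
  "    GOSemSim (>= 2.0.0)"
), desc_file)

desc <- read.dcf(desc_file)            # character matrix, one column per field
ver  <- package_version(desc[1, "Version"])

unlist(ver)                            # major, minor, patch as integers
ver >= "3.3.0"                         # the kind of test behind "pkg (>= x.y.z)"
package_version("3.10.0") > "3.9.0"    # numeric comparison: 10 > 9

# Split the Imports field into individual requirements
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
imports
```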

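step 5: How to write man/? (man/ holds the help for your functions, i.e. what you see when you type "?function_name"; each file is named xxxx.Rd)
Write the files in man/ with roxygen2 rather than by hand: the Rd format uses a LaTeX-like syntax that is fiddly to learn and tedious to write. Put roxygen2 comments directly above your R code, then press Ctrl+Shift+D (devtools::document()) to generate the matching .Rd file under man/
Example: R/bitr.R from Y叔's clusterProfiler package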

##' list ID types supported by annoDb # the document title, shown at the top of the help when you use help(package = pkgname)
##'
##'
##' @title idType # title
##' @param OrgDb annotation db ## @param documents an argument: OrgDb is the argument, "annotation db" is its description
##' @return character vector # description of the return value
##' @importFrom GOSemSim load_OrgDb # imports load_OrgDb from GOSemSim, so the code below can call it directly
##' @importFrom AnnotationDbi keytypes
##' @export # exports the function so that users of the package can call it
##' @author Guangchuang Yu
idType <- function(OrgDb = "org.Hs.eg.db") {
    db <- load_OrgDb(OrgDb)
    keytypes(db)
}

##' Biological Id TRanslator
##'
##'
##' @title bitr
##' @param geneID input gene id
##' @param fromType input id type
##' @param toType output id type
##' @param OrgDb annotation db
##' @param drop drop NA or not
##' @return data.frame
##' @importFrom magrittr %>%
##' @importFrom magrittr %<>%
##' @importFrom AnnotationDbi select
##' @export
##' @author Guangchuang Yu
bitr <- function(geneID, fromType, toType, OrgDb, drop=TRUE) {
    idTypes <- idType(OrgDb)
    msg <- paste0("should be one of ", paste(idTypes, collapse=", "), ".")
    if (! fromType %in% idTypes) {
        stop("'fromType' ", msg)
    }
    if (! all(toType %in% idTypes)) {
        stop("'toType' ", msg)
    }

    geneID %<>% as.character %>% unique
    db <- load_OrgDb(OrgDb)
    res <- suppressWarnings(select(db,
                                   keys = geneID,
                                   keytype = fromType,
                                   columns=c(fromType, toType)))

    ii <- which(is.na(res[,2]))
    if (length(ii)) {
        n <- res[ii, 1] %>% unique %>% length
        if (n) {
            warning(paste0(round(n/length(geneID)*100, 2), "%"), " of input gene IDs are fail to map...")
        }
        if (drop) {
            res <- res[-ii, ]
        }
    }
    return(res)
}

##' convert biological ID using KEGG API
##'
##'
##' @title bitr_kegg
##' @param geneID input gene id
##' @param fromType input id type
##' @param toType output id type
##' @param organism supported organism, can be search using search_kegg_organism function
##' @param drop drop NA or not
##' @return data.frame
##' @export
##' @author Guangchuang Yu
bitr_kegg <- function(geneID, fromType, toType, organism, drop=TRUE) {
    id_types <- c("Path", "Module", "ncbi-proteinid", "ncbi-geneid", "uniprot", "kegg")
    fromType <- match.arg(fromType, id_types)
    toType <- match.arg(toType, id_types)

    if (fromType == toType)
        stop("fromType and toType should not be identical...")

    if (fromType == "Path" || fromType == "Module") {
        idconv <- KEGG_path2extid(geneID, organism, fromType, toType)
    } else if (toType == "Path" || toType == "Module") {
        idconv <- KEGG_extid2path(geneID, organism, toType, fromType)
    } else {
        idconv <- KEGG_convert(fromType, toType, organism)
    }

    res <- idconv[idconv[,1] %in% geneID, ]
    n <- sum(!geneID %in% res[,1])
    if (n > 0) {
        warning(paste0(round(n/length(geneID)*100, 2), "%"), " of input gene IDs are fail to map...")
    }

    if (! drop && n > 0) {
        misHit <- data.frame(from = geneID[!geneID %in% res[,1]],
                             to = NA)
        res <- rbind(res, misHit)
    }
    colnames(res) <- c(fromType, toType)
    rownames(res) <- NULL
    return(res)
}

KEGG_convert <- function(fromType, toType, species) {
    if (fromType == "kegg" || toType != "kegg") {
        turl <- paste("http://rest.kegg.jp/conv", toType, species, sep='/')
        tidconv <- kegg_rest(turl)
        if (is.null(tidconv))
            stop(toType, " is not supported for ", species, " ...")
        idconv <- tidconv
    }

    if (toType == "kegg" || fromType != "kegg") {
        furl <- paste("http://rest.kegg.jp/conv", fromType, species, sep='/')
        fidconv <- kegg_rest(furl)
        if (is.null(fidconv))
            stop(fromType, " is not supported for ", species, " ...")
        idconv <- fidconv
    }

    if (fromType != "kegg" && toType != "kegg") {
        idconv <- merge(fidconv, tidconv, by.x='from', by.y='from')
        idconv <- idconv[, -1]
    } else if (fromType != "kegg") {
        idconv <- idconv[, c(2,1)]
    }
    colnames(idconv) <- c("from", "to")

    idconv[,1] %<>% gsub("[^:]+:", "", .)
    idconv[,2] %<>% gsub("[^:]+:", "", .)
    return(idconv)
}

##' query all genes in a KEGG pathway or module
##'
##'
##' @title KEGG_path2extid
##' @param keggID KEGG ID, path or module ID
##' @param species species
##' @param keggType one of 'Path' or 'Module'
##' @param keyType KEGG gene type, one of "ncbi-proteinid", "ncbi-geneid", "uniprot", or "kegg"
##' @return extid vector
##' @author guangchuang yu
KEGG_path2extid <- function(keggID, species=sub("\\d+$", "", keggID),
                            keggType = "Path", keyType = "kegg") {
    path2extid <- KEGGPATHID2EXTID(species, keggType, keyType)
    path2extid[path2extid$from %in% keggID, ]
}

KEGG_extid2path <- function(geneID, species, keggType = "Path", keyType = "kegg") {
    path2extid <- KEGGPATHID2EXTID(species, keggType, keyType)
    res <- path2extid[path2extid$to %in% geneID, ]
    res <- res[, c(2,1)]
    colnames(res) <- colnames(path2extid)
    return(res)
}

KEGGPATHID2EXTID <- function(species, keggType = "Path", keyType = "kegg") {
    keggType <- match.arg(keggType, c("Path", "Module"))
    if (keggType == "Path") {
        keggType <- "KEGG"
    } else {
        keggType <- "MKEGG"
    }

    kegg <- download_KEGG(species, keggType, keyType)
    return(kegg$KEGGPATHID2EXTID)
}
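
For a first package, a much smaller roxygen2 block is enough. The sketch below documents a hypothetical `sayHello` function (the name matches the man/sayHello.Rd file in the directory listing from step 2; the body is an assumption). Running devtools::document() on a package containing it should produce man/sayHello.Rd and an export(sayHello) line in NAMESPACE.

```r
#' Say hello
#'
#' @param name Character string; who to greet.
#' @return A length-one character vector.
#' @export
#' @examples
#' sayHello("R")
sayHello <- function(name = "world") {
  paste0("Hello, ", name, "!")
}

sayHello("R")
```

The first comment line becomes the help page title, each @param line documents one argument, and @export is what makes the function callable by users of the package.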

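step 6: How to write long-form documentation (vignettes)

Why write long-form documentation?
Because it is more readable than the help files under man/, and it is easy to write. man/ holds the help for individual functions, and most readers find it hard to assemble those pieces into a coherent picture, so man/ help alone does not give a good overall grasp of what the package does. A vignette reads like a book chapter: it presents the package as a whole and makes the big picture easier to understand.

How to view vignettes
browseVignettes() or browseVignettes("pkgname")

devtools::use_vignette("name")
# this creates the vignettes/ directory
# adds the necessary dependencies to DESCRIPTION
# and generates name.Rmd in that directory
# from there you write the document in R Markdown, much like writing an essay (the syntax is simple; a beginner can pick it up in about five minutes)

When the .Rmd is finished, press Ctrl+Shift+K or click Knit to preview the result.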
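
To show what the generated .Rmd roughly contains, the sketch below writes a minimal vignette skeleton to a temporary file using base R. The title "Getting started with mypkg" and the file name are made up, and the exact template that use_vignette() produces may differ between versions; the YAML header fields shown are the ones a knitr/rmarkdown vignette needs.

```r
fence <- strrep("`", 3)   # the three-backtick marker that opens/closes a chunk

vignette_lines <- c(
  "---",
  "title: \"Getting started with mypkg\"",
  "output: rmarkdown::html_vignette",
  "vignette: >",
  "  %\\VignetteIndexEntry{Getting started with mypkg}",
  "  %\\VignetteEngine{knitr::rmarkdown}",
  "  %\\VignetteEncoding{UTF-8}",
  "---",
  "",
  "## Introduction",
  "",
  "Plain text, followed by an R chunk:",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Hypothetical location; in a real package this would be vignettes/name.Rmd
rmd_path <- file.path(tempdir(), "getting-started.Rmd")
writeLines(vignette_lines, rmd_path)
readLines(rmd_path, n = 4)
```

In current devtools releases the helper lives in the usethis package (usethis::use_vignette()).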

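step 7
How to write the namespace (NAMESPACE)?

Do not write this file by hand. Let roxygen2 generate it from tags such as
##' @importFrom AnnotationDbi select
##' @export
In other words, declare the packages and functions you need directly above your R code.

An example of a generated file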

# Generated by roxygen2: do not edit by hand
S3method("[",compareClusterResult)
S3method("[[",compareClusterResult)
S3method(as.data.frame,compareClusterResult)
S3method(dim,compareClusterResult)
S3method(fortify,compareClusterResult)
S3method(geneID,groupGOResult)
S3method(geneInCategory,groupGOResult)
S3method(head,compareClusterResult)
S3method(tail,compareClusterResult)
export(GSEA)
export(Gff2GeneTable)
export(bitr)
export(bitr_kegg)
export(browseKEGG)
export(buildGOmap)
export(compareCluster)
export(download_KEGG)
export(dropGO)
export(enrichDAVID)
export(enrichGO)
export(enrichKEGG)
export(enrichMKEGG)
export(enricher)
export(go2ont)
export(go2term)
export(gofilter)
export(groupGO)
export(gseGO)
export(gseKEGG)
export(gseMKEGG)
export(idType)
export(merge_result)
export(plot)
export(plotGOgraph)
export(read.gmt)
export(search_kegg_organism)
exportClasses(compareClusterResult)
exportClasses(groupGOResult)
exportMethods(dotplot)
exportMethods(simplify)
importClassesFrom(DOSE,enrichResult)
importClassesFrom(DOSE,gseaResult)
importClassesFrom(methods,data.frame)
importFrom(AnnotationDbi,Ontology)
importFrom(AnnotationDbi,as.list)
importFrom(AnnotationDbi,keys)
importFrom(AnnotationDbi,keytypes)
importFrom(AnnotationDbi,select)
importFrom(AnnotationDbi,toTable)
importFrom(DOSE,dotplot)
importFrom(DOSE,geneID)
importFrom(DOSE,geneInCategory)
importFrom(DOSE,setReadable)
importFrom(DOSE,theme_dose)
importFrom(GO.db,GOBPANCESTOR)
importFrom(GO.db,GOBPCHILDREN)
importFrom(GO.db,GOCCANCESTOR)
importFrom(GO.db,GOCCCHILDREN)
importFrom(GO.db,GOMFANCESTOR)
importFrom(GO.db,GOMFCHILDREN)
importFrom(GO.db,GOTERM)
importFrom(GOSemSim,godata)
importFrom(GOSemSim,load_OrgDb)
importFrom(GOSemSim,mgoSim)
importFrom(ggplot2,"%+%")
importFrom(ggplot2,aes)
importFrom(ggplot2,aes_)
importFrom(ggplot2,aes_string)
importFrom(ggplot2,coord_flip)
importFrom(ggplot2,element_text)
importFrom(ggplot2,fortify)
importFrom(ggplot2,geom_bar)
importFrom(ggplot2,geom_point)
importFrom(ggplot2,ggplot)
importFrom(ggplot2,ggtitle)
importFrom(ggplot2,scale_colour_gradient)
importFrom(ggplot2,theme)
importFrom(ggplot2,theme_bw)
importFrom(ggplot2,xlab)
importFrom(ggplot2,ylab)
importFrom(magrittr,"%<>%")
importFrom(magrittr,"%>%")
importFrom(methods,new)
importFrom(plyr,.)
importFrom(plyr,ddply)
importFrom(plyr,dlply)
importFrom(plyr,ldply)
importFrom(plyr,llply)
importFrom(plyr,mdply)
importFrom(plyr,rename)
importFrom(qvalue,qvalue)
importFrom(rvcheck,get_fun_from_pkg)
importFrom(stats,formula)
importFrom(stats,setNames)
importFrom(stats4,plot)
importFrom(tidyr,gather)
importFrom(utils,browseURL)
importFrom(utils,citation)
importFrom(utils,head)
importFrom(utils,installed.packages)
importFrom(utils,packageDescription)
importFrom(utils,read.table)
importFrom(utils,stack)
importFrom(utils,str)
importFrom(utils,tail)
importMethodsFrom(AnnotationDbi,mappedkeys)
importMethodsFrom(AnnotationDbi,mget)
importMethodsFrom(DOSE,plot)
importMethodsFrom(DOSE,show)
importMethodsFrom(DOSE,summary)
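
The effect of export() lines can be inspected from the R console. The sketch below uses the stats package, which ships with R, to separate exported names (reachable as stats::name) from objects that exist in the namespace but are not exported. It illustrates the idea only and is not taken from the book.

```r
# Exported objects are what users reach with pkg::name
exports <- getNamespaceExports("stats")
"median" %in% exports

# Non-exported objects still live in the namespace, but are reachable
# only with ::: and are not part of the package's public interface
all_objects <- ls(asNamespace("stats"), all.names = TRUE)
internal <- setdiff(all_objects, exports)
length(internal) > 0
```

An importFrom(pkg, fun) line works in the other direction: it makes one exported function of another package visible inside your own package's code.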

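step 8: How to include external data in an R package?
Why include data?
Because bundled data lets users try out your package.
Recommendation: set LazyData: true in DESCRIPTION
devtools::create() does this for you

# code
devtools::use_data(dataset_name)
# separate several datasets with commas; it automatically creates the files under data/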
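
Conceptually, storing a dataset in a package comes down to saving an R object as an .rda file under data/. The sketch below imitates that with base R save() and load() in a temporary directory. The dataset `gene_demo` and the package name `mypkg` are made up, and this approximates what the helper does rather than reproducing its implementation.

```r
# A small example dataset (hypothetical)
gene_demo <- data.frame(id = c("g1", "g2"), score = c(0.8, 0.3))

# Save the object as data/<name>.rda inside a pretend package directory
data_dir <- file.path(tempdir(), "mypkg", "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
rda_path <- file.path(data_dir, "gene_demo.rda")
save(gene_demo, file = rda_path)

# Loading the file restores an object with the same name
restored <- new.env()
load(rda_path, envir = restored)
identical(restored$gene_demo, gene_demo)
```

In current devtools releases the helper lives in usethis (usethis::use_data()).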

This is only a rough summary. For the details, see: http://r-pkgs.had.co.nz/intro.html
