
[Reading notes: r4ds] 19 Functions

2019-11-20  茶思饭

III. Program (programming techniques)

19 Functions

When should you write a function?

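- Consider writing a function whenever you have copied and pasted a block of code more than twice.
- The three key steps in writing a function:
  1. Pick a name for the function.
  2. List the inputs, or arguments, inside function().
  3. Put the code you have developed in the body of the function.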

This is an important part of the “do not repeat yourself” (or DRY) principle.

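Writing a function has three big advantages over copy-and-paste:
  1. You can give the function an evocative name that makes your code easier to understand.
  2. As requirements change, you only need to update the code in one place instead of many.
  3. You eliminate the chance of making incidental mistakes when you copy and paste.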

19.2.1 Practice

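  1. Why is TRUE not a parameter to rescale01()? What would happen if x contained a single missing value, and na.rm was FALSE?
    TRUE is not a parameter because we always want range() to ignore missing values, so there is nothing for the caller to change. If na.rm were FALSE and x contained a single NA, range() would return c(NA, NA), so every element of the result would be NA.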
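A quick check of that NA behaviour, as a minimal sketch: here na.rm is exposed as an argument purely for the demonstration (the book's rescale01() hard-codes TRUE).

```r
# Sketch: rescale01() with na.rm exposed as an argument (for illustration only)
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

x <- c(1, 2, NA, 3)
rescale01(x)                 # 0, 0.5, NA, 1: the NA is ignored when computing the range
rescale01(x, na.rm = FALSE)  # range() returns c(NA, NA), so every element is NA
```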
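  2. In the second variant of rescale01(), infinite values are left unchanged. Rewrite rescale01() so that -Inf is mapped to 0, and Inf is mapped to 1.
rescale02 <- function(x) {
  # Compute the range from finite values only (ignoring NA, -Inf and Inf)
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  y <- (x - rng[1]) / (rng[2] - rng[1])
  # Infinite inputs are still infinite after rescaling; map them to 0 and 1
  y[y == -Inf] <- 0
  y[y == Inf] <- 1
  y
}
rescale02(c(-Inf, 0, 5, 10, Inf))
#> [1] 0.0 0.0 0.5 1.0 1.0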
  3. Practice turning the following code snippets into functions. Think about what each function does. What would you call it? How many arguments does it need? Can you rewrite it to be more expressive or less duplicative?

    mean(is.na(x))
    
    x / sum(x, na.rm = TRUE)
    
    sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
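One possible set of answers, as a sketch (the names are my own suggestions): the first snippet is the proportion of missing values, the second is each element's share of the total, and the third is the coefficient of variation. Each needs only x, plus an optional na.rm.

```r
# Proportion of missing values in x
prop_na <- function(x) {
  mean(is.na(x))
}

# Each element's share of the total (NAs skipped in the sum by default)
prop_of_total <- function(x, na.rm = TRUE) {
  x / sum(x, na.rm = na.rm)
}

# Coefficient of variation: standard deviation relative to the mean
coef_variation <- function(x, na.rm = TRUE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

prop_na(c(1, NA, 3, NA))    # 0.5
prop_of_total(c(1, 3))      # 0.25 0.75
coef_variation(c(2, 4, 6))  # 0.5
```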
    
  4. Follow http://nicercode.github.io/intro/writing-functions.html to write your own functions to compute the variance and skew of a numeric vector.
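A minimal sketch, assuming the sample definitions Var(x) = 1/(n-1) Σ(x_i - x̄)² and Skew(x) = [1/(n-2) Σ(x_i - x̄)³] / Var(x)^(3/2); the function names are illustrative.

```r
# Sample variance: sum of squared deviations divided by n - 1
variance <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  sum((x - mean(x))^2) / (n - 1)
}

# Skewness: scaled third central moment
skewness <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  m3 <- sum((x - mean(x))^3) / (n - 2)
  m3 / variance(x)^(3 / 2)
}

variance(c(1, 2, 3, 4))  # same as var(c(1, 2, 3, 4))
skewness(c(1, 2, 3))     # 0 for a symmetric sample
skewness(c(1, 2, 10))    # positive: long right tail
```

Checking variance() against base R's var() is a cheap way to confirm the formula.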

  5. Write both_na(), a function that takes two vectors of the same length and returns the number of positions that have an NA in both vectors.

both_na <- function(x, y) {
  # Refuse to compare vectors of different lengths
  if (length(x) != length(y)) {
    stop("`x` and `y` must have the same length.", call. = FALSE)
  }
  # Count the positions where both x and y are NA
  sum(is.na(x) & is.na(y))
}
both_na(c(NA, NA, 1, 2), c(NA, 1, NA, 2))
#> [1] 1
  6. What do the following functions do? Why are they useful even though they are so short?

    is_directory <- function(x) file.info(x)$isdir
    is_readable <- function(x) file.access(x, 4) == 0
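Both are thin wrappers: file.info(x)$isdir is TRUE for a directory (NA if the path does not exist), and file.access(x, 4) returns 0 when the file can be read (mode 4 tests read permission). They are useful because the names say what is being checked, so you don't have to remember the $isdir column or what mode 4 means. A quick check against temporary files:

```r
is_directory <- function(x) file.info(x)$isdir
is_readable <- function(x) file.access(x, 4) == 0

# Create a temporary file to test against
tmp_file <- tempfile(fileext = ".txt")
writeLines("hello", tmp_file)

is_directory(tempdir())  # TRUE
is_directory(tmp_file)   # FALSE
is_readable(tmp_file)    # TRUE
```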
    
  7. Read the complete lyrics to “Little Bunny Foo Foo”. There’s a lot of duplication in this song. Extend the initial piping example to recreate the complete song, and use functions to reduce the duplication.

19.3 Functions are for humans and computers (function readability)

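Tips: the name of a function
Function names should be short but clearly describe what the function does. Generally, function names should be verbs and arguments should be nouns; pick one naming style (such as snake_case) and use it consistently.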

19.3.1 Exercises

  1. Read the source code for each of the following three functions, puzzle out what they do, and then brainstorm better names.
### Checks whether each string starts with the given prefix
f1 <- function(string, prefix) {
  substr(string, 1, nchar(prefix)) == prefix
}
### Drops the last element of a vector (returns NULL if the length is 1 or less)
f2 <- function(x) {
  if (length(x) <= 1) return(NULL)
  x[-length(x)]
}
### Repeats (recycles) y to the length of x
f3 <- function(x, y) {
  rep(y, length.out = length(x))
}

f1: prefix_check
f2: vector_del
f3: rep_as_length
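To check those readings, here are the three functions under the names proposed above, with a few calls. Note that base R (3.3.0 and later) already has startsWith(), which does the same job as f1().

```r
# f1: does each string start with `prefix`?
prefix_check <- function(string, prefix) {
  substr(string, 1, nchar(prefix)) == prefix
}
# f2: drop the last element (NULL when there is at most one element)
vector_del <- function(x) {
  if (length(x) <= 1) return(NULL)
  x[-length(x)]
}
# f3: recycle y to the length of x
rep_as_length <- function(x, y) {
  rep(y, length.out = length(x))
}

prefix_check(c("apple", "banana"), "ap")  # TRUE FALSE
startsWith(c("apple", "banana"), "ap")    # same result in base R
vector_del(1:3)                           # 1 2
vector_del(1)                             # NULL
rep_as_length(1:5, c("a", "b"))           # "a" "b" "a" "b" "a"
```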

  2. Take a function that you’ve written recently and spend 5 minutes brainstorming a better name for it and its arguments.

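  3. Compare and contrast rnorm() and MASS::mvrnorm(). How could you make them more consistent?
    rnorm(n, mean = 0, sd = 1) draws from a univariate normal distribution, while MASS::mvrnorm(n = 1, mu, Sigma) draws from a multivariate normal, taking a mean vector mu and a covariance matrix Sigma. Both the function names and the argument names (mean vs mu, sd vs Sigma) follow different conventions. To make them more consistent, give them a shared prefix, such as norm_r and norm_mvr, and use matching argument names.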
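A small sketch of the two interfaces side by side. MASS ships with R as a recommended package; norm_mvr() below is a hypothetical wrapper that gives mvrnorm() rnorm()-style argument names.

```r
set.seed(42)

# Univariate: n draws with a scalar mean and standard deviation
x <- rnorm(5, mean = 0, sd = 1)

# Multivariate: n draws with a mean vector `mu` and covariance matrix `Sigma`
xy <- MASS::mvrnorm(5, mu = c(0, 0), Sigma = diag(2))
dim(xy)  # 5 2: one row per draw, one column per variable

# Hypothetical wrapper with argument names closer to rnorm()
norm_mvr <- function(n, mean, cov) {
  MASS::mvrnorm(n, mu = mean, Sigma = cov)
}
norm_mvr(3, mean = c(0, 0), cov = diag(2))
```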

  4. Make a case for why norm_r(), norm_d() etc would be better than rnorm(), dnorm(). Make a case for the opposite.

19.4 Conditional execution (conditional statements)

if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}
if (this) {
  # do that
} else if (that) {
  # do something else
} else {
  # 
}
# `calc` is an illustrative name; the book prints this function without one
calc <- function(x, y, op) {
  switch(op,
    plus = x + y,
    minus = x - y,
    times = x * y,
    divide = x / y,
    stop("Unknown op!")
  )
}

19.4.3 Code style

19.4.4 Exercises

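  1. What’s the difference between if and ifelse()? Carefully read the help and construct three examples that illustrate the key differences.
    1) ifelse() is vectorized: it evaluates test element by element and returns a vector of the same length as test. if checks a single TRUE/FALSE condition and runs only one branch, which can return anything (a vector, NULL, or no value at all).
    2) if can chain several conditions with else if; ifelse() only chooses between yes and no (although calls can be nested).
    3) Missing values: ifelse() returns NA where test is NA, whereas if (NA) is an error.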
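Three small examples of the key differences, as a sketch. The third one (lost Date class) follows from ?ifelse, which says the result takes its attributes from test.

```r
x <- c(5, -2, 0, 8)

# 1) ifelse() is vectorized: one result per element of the test
ifelse(x > 0, "positive", "not positive")
# "positive" "not positive" "not positive" "positive"
# if (x > 0) ... is an error in R >= 4.2.0 (only a warning in older versions)

# 2) Missing values: ifelse() propagates NA, while if (NA) is an error
ifelse(c(TRUE, NA, FALSE), 1, 0)                        # 1 NA 0
res_na <- tryCatch(if (NA) 1, error = function(e) "error")  # "error"

# 3) ifelse() takes its attributes from `test`, so the Date class is dropped
d <- ifelse(TRUE, as.Date("2020-01-01"), as.Date("2020-01-02"))
class(d)  # "numeric": only the day count 18262 is left
```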

  2. Write a greeting function that says “good morning”, “good afternoon”, or “good evening”, depending on the time of day. (Hint: use a time argument that defaults to lubridate::now(). That will make it easier to test your function.)

greeting <- function(time = lubridate::now()) {
  # Extract the hour (0-23) from the supplied time
  hr <- lubridate::hour(time)
  if (hr >= 5 && hr < 12) {
    "Good morning!"
  } else if (hr >= 12 && hr < 18) {
    "Good afternoon!"
  } else {
    "Good evening!"
  }
}
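If lubridate isn't available, the same idea can be sketched in base R, swapping lubridate::hour() for as.POSIXlt(time)$hour. Taking time as an argument (defaulting to Sys.time()) is what makes the function easy to test at fixed times; greeting_base() is a hypothetical name.

```r
greeting_base <- function(time = Sys.time()) {
  hr <- as.POSIXlt(time)$hour  # hour of day, 0-23
  if (hr >= 5 && hr < 12) {
    "Good morning!"
  } else if (hr >= 12 && hr < 18) {
    "Good afternoon!"
  } else {
    "Good evening!"
  }
}

greeting_base(as.POSIXct("2019-11-20 09:00:00"))  # "Good morning!"
greeting_base(as.POSIXct("2019-11-20 15:30:00"))  # "Good afternoon!"
greeting_base(as.POSIXct("2019-11-20 22:00:00"))  # "Good evening!"
```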
  3. Implement a fizzbuzz function. It takes a single number as input. If the number is divisible by three, it returns “fizz”. If it’s divisible by five it returns “buzz”. If it’s divisible by three and five, it returns “fizzbuzz”. Otherwise, it returns the number. Make sure you first write working code before you create the function.
fizzbuzz <- function(x){
  if (x%%3==0&& x%%5==0){
    "fizzbuzz"
  } else if(x%%3==0&& x%%5!=0){
    "fizz"
  } else if(x%%3!=0&& x%%5==0){
    "buzz"
  } else{x}
}
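### Using switch()
fizzbuzz2 <- function(x) {
  # Build a key: start with "a", append "b" if divisible by 3 and "c" if divisible by 5
  key <- "a"
  if (x %% 3 == 0) key <- paste0(key, "b")
  if (x %% 5 == 0) key <- paste0(key, "c")
  switch(key,
    a   = x,
    ab  = "fizz",
    ac  = "buzz",
    abc = "fizzbuzz"
  )
}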
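Both versions above handle one number at a time. As a sketch of applying the same rule to a whole vector at once (fizzbuzz_vec() is a hypothetical name), logical subsetting replaces the if/else chain; the %% 15 line runs last so it overrides the other two.

```r
fizzbuzz_vec <- function(x) {
  out <- as.character(x)
  out[x %% 3 == 0] <- "fizz"
  out[x %% 5 == 0] <- "buzz"
  out[x %% 15 == 0] <- "fizzbuzz"  # divisible by both 3 and 5
  out
}

fizzbuzz_vec(1:15)
```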
  4. How could you use cut() to simplify this set of nested if-else statements?
if (temp <= 0) {
  "freezing"
} else if (temp <= 10) {
  "cold"
} else if (temp <= 20) {
  "cool"
} else if (temp <= 30) {
  "warm"
} else {
  "hot"
}
## Using cut() and switch()
if (temp <= 0) {
  "freezing"
} else if (temp <= 30) {
  # cut() returns a factor; as.integer() gives the interval number 1, 2 or 3
  bin <- as.integer(cut(temp, breaks = c(0, 10, 20, 30)))
  switch(bin, "cold", "cool", "warm")
} else {
  "hot"
}
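cut() can also replace the whole if/else chain: with breaks at -Inf and Inf and one label per interval, each temperature maps straight to its category. A sketch, with temp_category() as a hypothetical name:

```r
temp_category <- function(temp) {
  # Intervals (-Inf,0], (0,10], (10,20], (20,30], (30,Inf] match the `<=` tests
  as.character(cut(temp,
    breaks = c(-Inf, 0, 10, 20, 30, Inf),
    labels = c("freezing", "cold", "cool", "warm", "hot")
  ))
}

temp_category(15)  # "cool"
```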

How would you change the call to cut() if I’d used < instead of <=? What is the other chief advantage of cut() for this problem? (Hint: what happens if you have many values in temp?)

## With `<`, use right = FALSE so each interval includes its left endpoint and excludes its right one
if (temp < 0) {
  "freezing"
} else if (temp < 30) {
  bin <- as.integer(cut(temp, breaks = c(0, 10, 20, 30), right = FALSE))
  switch(bin, "cold", "cool", "warm")
} else {
  "hot"
}
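The other chief advantage of cut() is that it is vectorized: it classifies a whole vector of temperatures in one call, whereas if handles only one value at a time. A sketch of the `<` version applied to several values (temp_category_lt() is a hypothetical name):

```r
temp_category_lt <- function(temp) {
  # right = FALSE gives [-Inf,0), [0,10), [10,20), [20,30), [30,Inf), matching `<` tests
  as.character(cut(temp,
    breaks = c(-Inf, 0, 10, 20, 30, Inf),
    labels = c("freezing", "cold", "cool", "warm", "hot"),
    right = FALSE
  ))
}

temp_category_lt(c(-5, 0, 10, 29.9, 30))
# "freezing" "cold" "cool" "warm" "hot"
```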
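  5. What happens if you use switch() with numeric values?
    No name = labels are needed: a number selects the alternative at that position (1 gives the first, 2 the second, and so on). A non-integer value is coerced to an integer, and a number with no matching alternative returns NULL invisibly.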
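A few calls illustrating positional selection, as a quick sketch:

```r
switch(2, "apple", "banana", "cherry")    # "banana": the 2nd alternative
switch(2.9, "apple", "banana", "cherry")  # also "banana": 2.9 is coerced to 2
out <- switch(4, "apple", "banana", "cherry")
is.null(out)                              # TRUE: there is no 4th alternative
```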
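  6. What does this switch() call do? What happens if x is “e”?
    Experiment, then carefully read the documentation.
switch(x,
  a = ,
  b = "ab",
  c = ,
  d = "cd"
)

An empty alternative such as `a = ,` falls through to the next non-empty one, so "a" and "b" both return "ab", and "c" and "d" both return "cd". If x is "e", nothing matches and there is no unnamed default, so switch() returns NULL invisibly; that is why nothing appears to happen.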
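Wrapping the call in a function makes it easy to try several inputs; ab_or_cd() is a hypothetical name used only for this check.

```r
ab_or_cd <- function(x) {
  switch(x,
    a = ,       # empty: falls through to b
    b = "ab",
    c = ,       # empty: falls through to d
    d = "cd"
  )
}

sapply(c("a", "b", "c", "d"), ab_or_cd)  # "ab" "ab" "cd" "cd" (named by input)
is.null(ab_or_cd("e"))                   # TRUE
```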

19.5 Function arguments

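Arguments broadly fall into two sets:
- data arguments, which supply the data to compute on;
- detail arguments, which control the details of the computation.

Data arguments should come first; detail arguments go at the end and should usually have default values.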

19.5.1 Choosing names

19.5.2 Checking values

19.5.3 Dot-dot-dot (…)

x <- c(1, 2)
sum(x, na.mr = TRUE)
#> [1] 4
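### Can you see where the wrong result comes from?
## na.rm was misspelled as na.mr, so `na.mr = TRUE` is captured by `...` and TRUE is summed as 1: 1 + 2 + 1 = 4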
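Because any argument whose name doesn't match a formal argument is swallowed by `...`, a typo like na.mr never raises an error. One defensive sketch (safe_sum() is a hypothetical helper, not a base R function) rejects named arguments that end up in `...`:

```r
safe_sum <- function(..., na.rm = FALSE) {
  dots <- list(...)
  nms <- names(dots)
  # Any named entry in ... is probably a misspelled argument
  if (!is.null(nms) && any(nms != "")) {
    stop("Unexpected named argument(s): ", paste(nms[nms != ""], collapse = ", "))
  }
  sum(..., na.rm = na.rm)
}

safe_sum(c(1, NA, 2), na.rm = TRUE)  # 3
# safe_sum(c(1, 2), na.mr = TRUE)    # error: Unexpected named argument(s): na.mr
```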

19.5.4 Lazy evaluation

19.5.5 Exercises

  1. What does commas(letters, collapse = "-") do? Why?
commas(letters, collapse = "-")
#> Error in stringr::str_c(..., collapse = ", ") : 
#>   formal argument "collapse" matched by multiple actual arguments

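commas() was defined earlier to call str_c() with collapse = ", " already set. Supplying collapse = "-" as well passes it through ..., so str_c()'s collapse argument is matched by two actual arguments and R raises an error.
Solution: pass everything through ... instead of hard-coding collapse: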

commas <- function(...) stringr::str_c(...)
commas(letters, collapse="-") 
[1] "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"

Note: a collapse argument declared in commas() has no effect if the body still hard-codes collapse = ", " when calling str_c(). The argument is accepted but never passed on (the original attempt also misspelled it as collaspe):

commas <- function(..., collapse = ", ") stringr::str_c(..., collapse = ", ")
> commas(letters)
[1] "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z"
> commas(letters, collapse = "-")
[1] "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z"

No intermediate variable is needed; the body just has to pass the argument on:

commas <- function(..., collapse = ", ") {
  stringr::str_c(..., collapse = collapse)
}
> commas(letters, collapse = "-")
[1] "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"
  2. It’d be nice if you could supply multiple characters to the pad argument, e.g. rule("Title", pad = "-+"). Why doesn’t this currently work? How could you fix it?
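rule <- function(..., pad = "-") {
  title <- paste0(...)
  width <- getOption("width") - nchar(title) - 5
  # str_dup() repeats the whole pad string, so a two-character pad would produce
  # twice as many characters as needed; divide by the pad's length to compensate
  cat(title, " ", stringr::str_dup(pad, width %/% stringr::str_length(pad)), "\n", sep = "")
}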
rule("Important output",pad="+-")
Important output +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
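The integer division can leave the rule slightly short when the width is not a multiple of the pad's length. A base-R sketch that trims the repeated pad to the exact width instead; rule_exact() and its width argument are hypothetical additions, built on strrep() and substr().

```r
rule_exact <- function(..., pad = "-", width = getOption("width")) {
  title <- paste0(...)
  n <- width - nchar(title) - 5
  # Repeat the pad enough times, then cut it to exactly n characters
  line <- substr(strrep(pad, ceiling(n / nchar(pad))), 1, n)
  cat(title, " ", line, "\n", sep = "")
}

rule_exact("Hi", pad = "-+", width = 20)
# Hi -+-+-+-+-+-+-
```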
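  3. What does the trim argument to mean() do? When might you use it?
    trim is the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
    A mean computed with trim is called a trimmed mean. In statistics a common choice is to drop the highest 5% and the lowest 5% of the data; the proportion can be chosen to suit the problem, but usually the same proportion is removed from both ends.
    The aim is to stop a few extremely high or low values from distorting the mean, so that it better represents the bulk of the data.
    A classic example is gymnastics scoring at the Olympics: the highest and the lowest judges' scores are dropped, and the average of the remaining scores is the athlete's final score.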
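A small numeric check of how trim reacts to a single outlier:

```r
x <- c(1:9, 100)  # one extreme value

mean(x)              # 14.5: pulled up by the outlier
mean(x, trim = 0.1)  # 5.5: drops 10% from each end (here 1 and 100)
mean(x, trim = 0.5)  # 5.5: trim = 0.5 gives the median
```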
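  4. The default value for the method argument to cor() is c("pearson", "kendall", "spearman"). What does that mean? What value is used by default?
    The vector lists the allowed choices. When method is left at its default, the first element is used, so cor() computes the Pearson correlation (?cor documents "pearson" as the default).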
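The same convention can be used in your own functions via match.arg(), which returns the first choice when the argument is not supplied and also accepts unambiguous abbreviations. A sketch; center() is a hypothetical name.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)
identical(cor(x, y), cor(x, y, method = "pearson"))  # TRUE

center <- function(x, type = c("mean", "median")) {
  type <- match.arg(type)  # not supplied: the first choice, "mean"
  switch(type,
    mean = mean(x),
    median = median(x)
  )
}

center(c(1, 2, 10))            # 13/3, the mean
center(c(1, 2, 10), "median")  # 2
center(c(1, 2, 10), "med")     # abbreviation also gives 2
```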

19.6 Return values

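Returning a value is the reason you write a function in the first place. When deciding what to return, consider two questions:

  1. Does returning early make your function easier to read?
  2. Can you make your function pipeable?

19.6.1 Explicit return statements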
f <- function() {
  if (!x) {
    return(something_short)
  }

  # Do 
  # something
  # that
  # takes
  # many
  # lines
  # to
  # express
}
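The template above shows the shape of an early return. As a concrete, hypothetical example (mean_positive() is not from the book), the trivial cases are handled first so the main computation doesn't sit inside a long if block:

```r
mean_positive <- function(x) {
  # Short special case first: nothing to average
  if (length(x) == 0) {
    return(NA_real_)
  }

  # Main work for the usual case
  pos <- x[!is.na(x) & x > 0]
  if (length(pos) == 0) {
    return(NA_real_)
  }
  mean(pos)
}

mean_positive(numeric(0))         # NA
mean_positive(c(-1, 2, 4, NA))    # 3
```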

19.6.2 Writing pipeable functions

19.7 Environment
