Notes on "Learning R", Chapter 7 (Part 1): Strin

2018-02-11 天火燎原天

Constructing / Concatenating

The most common way to build a character vector is c(). paste() is also used a lot, to join strings together.

paste (..., sep = " ", collapse = NULL)
paste0(..., collapse = NULL) # no separator by default
# sep : a character string to separate the terms.
# collapse : an optional character string to separate the results
> paste(letters[2:6] , '1' , sep='$')
[1] "b$1" "c$1" "d$1" "e$1" "f$1"
> paste(letters[2:6] , '1' , sep='$' , collapse = '+')
[1] "b$1+c$1+d$1+e$1+f$1" # the collapse string is not appended after the last element
> paste(letters,sep = '',collapse = '')
[1] "abcdefghijklmnopqrstuvwxyz"
> paste(letters, letters, letters, sep = '+') # three or more vectors can be pasted together
 [1] "a+a+a" "b+b+b" "c+c+c" "d+d+d" "e+e+e" "f+f+f" "g+g+g" "h+h+h" "i+i+i" "j+j+j" "k+k+k" "l+l+l"
[13] "m+m+m" "n+n+n" "o+o+o" "p+p+p" "q+q+q" "r+r+r" "s+s+s" "t+t+t" "u+u+u" "v+v+v" "w+w+w" "x+x+x"
[25] "y+y+y" "z+z+z"
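
The interplay of c(), paste() and paste0() can be sketched as follows. This is a minimal illustration; the variable names (fruits, labelled, ids, joined) are made up for the example and do not come from the book.

```r
# c() builds a character vector from individual strings.
fruits <- c("apple", "banana", "cherry")

# paste() recycles shorter arguments: the single string "fruit" is
# reused for every element of `fruits`.
labelled <- paste("fruit", fruits, sep = ": ")
print(labelled)  # "fruit: apple" "fruit: banana" "fruit: cherry"

# paste0() behaves like paste() with sep = "".
ids <- paste0("id", 1:3)
print(ids)       # "id1" "id2" "id3"

# collapse joins the element-wise results into one single string.
joined <- paste(fruits, collapse = " | ")
print(joined)    # "apple | banana | cherry"
```

Note the difference between sep (placed between the arguments of each element) and collapse (placed between the elements of the result).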

toString() takes a single vector and collapses it into one string, with the elements separated by a comma and a space, which reads more nicely.

> toString(letters) # elements are separated by ", " by default
[1] "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z"
> toString(letters , width = 20) # limit the width of the output
[1] "a, b, c, d, e, f...."
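
As a small sketch of what toString() does with a non-character vector: for a plain vector it appears to give the same result as paste() with collapse = ", ". The variable names below are illustrative only.

```r
nums <- c(1.5, 2, 3.25)

# toString() converts the elements to character and joins them with ", ".
as_string <- toString(nums)
print(as_string)            # "1.5, 2, 3.25"

# For a plain vector this matches paste(..., collapse = ", ").
same <- paste(nums, collapse = ", ")
print(identical(as_string, same))  # TRUE

# The result is always a single string, whatever the input length.
print(length(as_string))    # 1
```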

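cat() is a low-level R function; many print methods are built on it.
noquote() works together with print(): it marks a character vector so that it is printed without the surrounding double quotes.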
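
The differences between print(), noquote() and cat() are easiest to see side by side. The snippet below is a minimal sketch; x and nq are names chosen for the example.

```r
x <- c("alpha", "beta")

# print() shows the index prefix and the double quotes.
print(x)              # [1] "alpha" "beta"

# noquote() returns an object of class "noquote" whose print method
# drops the quotes.
nq <- noquote(x)
print(nq)             # [1] alpha beta

# cat() writes the raw contents: no index prefix, no quotes,
# and no trailing newline unless you supply one.
cat(x, "\n")

# sep controls what goes between the elements.
cat(x, sep = "\n")
```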

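Formatting

R is implemented largely in C, and sprintf() formats strings using C-style format specifications. formatC() offers similar functionality. It is worth noting that these R functions are vectorized: they accept vector input.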

> sprintf('the value of num %s is %.2f.' , 1:3 , rnorm(3))
[1] "the value of num 1 is -0.03." "the value of num 2 is -0.23." "the value of num 3 is -0.20."

format() has a syntax similar to formatC(). Like prettyNum(), it produces more nicely formatted strings.
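
A rough comparison of the four formatting functions, as a sketch rather than a full tour of their options. The input values and variable names are arbitrary choices for the example.

```r
x <- c(3.14159, 42, 123.456)

# sprintf(): C-style format string, applied to each element.
s1 <- sprintf("%8.2f", x)
print(s1)   # "    3.14" "   42.00" "  123.46"

# formatC(): the same kind of control, expressed through arguments.
s2 <- formatC(x, format = "f", digits = 2, width = 8)
print(s2)

# format(): pads all elements to a common width.
s3 <- format(c(1, 10, 100))
print(s3)   # "  1" " 10" "100"

# prettyNum(): handy for large numbers, e.g. a thousands separator.
s4 <- prettyNum(1234567, big.mark = ",")
print(s4)   # "1,234,567"
```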

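Special characters

\t is a tab, \n is a newline, and \\ is a literal backslash, much as in other languages. print() does not render these characters: it displays them in their escaped form. The book therefore uses cat() to show them.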
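
A short sketch of how print() and cat() treat escape sequences differently. The string s is an arbitrary example.

```r
s <- "col1\tcol2\nrow1\\data"

# print() displays the escaped form, exactly as you would type it.
print(s)

# cat() renders the characters: a real tab, a real line break,
# and a single backslash.
cat(s, "\n")

# Each escape sequence counts as ONE character in the string.
print(nchar("\t"))   # 1
print(nchar("\\"))   # 1
print(nchar(s))      # 19
```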

Case

toupper() and tolower() convert strings to upper and lower case respectively.
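
For completeness, a tiny sketch showing that both functions are vectorized. The input vector is made up for the example.

```r
titles <- c("Learning R", "chapter 7")

upper <- toupper(titles)
lower <- tolower(titles)

print(upper)  # "LEARNING R" "CHAPTER 7"
print(lower)  # "learning r" "chapter 7"
```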

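Extracting

[] selects whole elements of a character vector, not characters inside a string. To extract part of each string, use substr() or substring(). The two behave alike most of the time; according to the documentation, the subtle difference lies in the return value: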

For substr, a character vector of the same length and with the same attributes as x;
For substring, a character vector of length the longest of the arguments.

substr(x, start, stop)
substring(text, first, last = 1000000L)
# In other words, if first or last is a vector longer than text,
# the result of substring() is longer than text.
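
The length difference between the two functions can be checked directly. This is a minimal sketch with made-up inputs.

```r
words <- c("statistics", "regression")

# Both functions are vectorized over the strings themselves.
heads <- substr(words, 1, 4)
print(heads)          # "stat" "regr"

# substring(): the result is as long as the longest argument,
# so one string and three start/stop pairs give three pieces.
pieces <- substring("statistics", 1:3, 3:5)
print(pieces)         # "sta" "tat" "ati"

# substr(): the result has the same length as x, so only the
# first start/stop pair is used here.
single <- substr("statistics", 1:3, 3:5)
print(single)         # "sta"

# substring() has a default for last, so "from position 5 to the end" is short.
tail_part <- substring("statistics", 5)
print(tail_part)      # "istics"
```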

Splitting: the reverse of paste()

strsplit() performs the reverse operation of paste(). Note that it returns a list, even when x has length 1.

strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
# With fixed = TRUE, split is matched literally; otherwise split is treated as a regular expression.
> x='Thank you, voice of reason.'
> strsplit(x,',? ')
[[1]]
[1] "Thank"   "you"     "voice"   "of"      "reason."
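
Two points above deserve a quick check: the list return value and the effect of fixed. The following is a sketch with made-up inputs.

```r
# The result is a list with one element per input string.
parts <- strsplit(c("a b", "c d e"), " ")
print(length(parts))     # 2
print(parts[[2]])        # "c" "d" "e"

# Without fixed = TRUE, "." is a regular expression that matches
# any character, so every character is treated as a separator.
regex_split <- strsplit("a.b.c", ".")[[1]]
print(regex_split)       # "" "" "" "" ""

# With fixed = TRUE, "." is matched literally.
literal_split <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
print(literal_split)     # "a" "b" "c"

# paste() with collapse undoes the split.
rejoined <- paste(literal_split, collapse = ".")
print(rejoined)          # "a.b.c"
```

Use [[1]] or unlist() to get back a plain character vector when the input has length 1.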

File paths

In R it is best to write file paths with forward slashes (/). basename() and dirname() return the file-name part and the directory part of a path.

> x <- 'E:/Steam/steamapps/common/Fallout New Vegas/nvse_loader.exe'
> basename(x) ; dirname(x)
[1] "nvse_loader.exe"
[1] "E:/Steam/steamapps/common/Fallout New Vegas"
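
Going the other way, file.path() assembles a path from its components using forward slashes. The sketch below uses a hypothetical relative path; nothing needs to exist on disk, because these functions only manipulate strings.

```r
# Build a path from components; file.path() joins them with "/".
p <- file.path("data", "raw", "input.csv")
print(p)             # "data/raw/input.csv"

# Take it apart again.
print(basename(p))   # "input.csv"
print(dirname(p))    # "data/raw"

# The extension can be read with file_ext() from the tools package,
# which ships with R.
ext <- tools::file_ext(p)
print(ext)           # "csv"
```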