R Language Study Notes (9): Strings

2021-01-22, by Akuooo

Reference video: https://www.bilibili.com/video/BV19x411X7C6?t=3&p=25

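I.

Regular expressions

A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a given substring, to replace the matched substring, or to extract from a string the substrings that satisfy some condition.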

[Figure: regular expressions (正则表达式.png)]
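The three uses named above (checking, replacing, extracting) can be sketched with base R functions. This is only a minimal illustration; the `files` vector is made-up sample data, not something from the video.

```r
# Hypothetical sample data, not from the original notes
files <- c("report_2021.csv", "notes.txt", "data_2020.csv")

# 1. Check: does each string end in ".csv"?
is_csv <- grepl("\\.csv$", files)

# 2. Replace: swap the matched extension for another one
renamed <- sub("\\.csv$", ".tsv", files)

# 3. Extract: pull out the first run of four digits
#    (strings without a match are dropped from the result)
years <- regmatches(files, regexpr("[0-9]{4}", files))

print(is_csv)
print(renamed)
print(years)
```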

String-handling functions
(Whenever a string appears, it must be enclosed in quotes!)
(1) nchar(): returns the number of characters in a string

> nchar("Hello World")
[1] 11  # the space also counts as a character
> month.name
 [1] "January"   "February"  "March"     "April"     "May"       "June"      "July"      "August"   
 [9] "September" "October"   "November"  "December" 
> nchar(month.name)
 [1] 7 8 5 5 3 4 4 6 9 7 8 8

(2) length(): returns the number of elements in a vector
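Since length() and nchar() are easy to mix up, here is a small sketch of the difference, reusing the "Hello World" string and the built-in month.name vector from the nchar() example:

```r
s <- "Hello World"

# A single string is a character vector with one element
n_elements <- length(s)   # 1
n_chars    <- nchar(s)    # 11

# For a longer vector, length() counts elements,
# while nchar() returns one character count per element
n_months <- length(month.name)   # 12
chars_per_month <- nchar(month.name)

print(c(n_elements, n_chars, n_months))
print(chars_per_month)
```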
(3) paste(): concatenates strings

> paste(c("Everybody","loves","stats"))
[1] "Everybody" "loves"     "stats"    
> paste("Everybody","loves","stats")
[1] "Everybody loves stats"
> paste("Everybody","loves","stats",sep = "-")  # sep sets the separator
[1] "Everybody-loves-stats"

# Pasting a vector with a string does not append the string to the vector; instead, each element of the vector is joined with the string
> names <- c("Moe","Larry","Curly")
> paste(names,"loves stats")
[1] "Moe loves stats"   "Larry loves stats" "Curly loves stats"
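The first paste() call above returns the vector unchanged because there is nothing to join element-wise. As a supplementary sketch (the `collapse` argument and `paste0()` are base R features not covered in these notes), a vector can be merged into one string like this:

```r
words <- c("Everybody", "loves", "stats")

# collapse joins the elements of the result into a single string
sentence <- paste(words, collapse = " ")

# paste0() is paste() with sep = ""
ids <- paste0("id", 1:3)

# sep joins the arguments element-wise first, then collapse joins the results
both <- paste(c("a", "b"), 1:2, sep = "-", collapse = "+")

print(sentence)
print(ids)
print(both)
```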

(4) substr(): extracts a substring

[Figures: substr.png, substrstart.png, substrstop.png, substrx.png]

> substr(x = month.name, start = 1, stop = 3)
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> temp <- substr(x = month.name, start = 1, stop = 3)
> toupper(temp)  # convert to uppercase
 [1] "JAN" "FEB" "MAR" "APR" "MAY" "JUN" "JUL" "AUG" "SEP" "OCT" "NOV" "DEC"
> tolower(temp)  # convert to lowercase
 [1] "jan" "feb" "mar" "apr" "may" "jun" "jul" "aug" "sep" "oct" "nov" "dec"
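A couple of further substr() uses, as a rough sketch: start and stop can themselves be vectors, and substr() also has a replacement form. The string "abcdef" is just an illustrative value.

```r
x <- "abcdef"

# Characters 2 through 4
middle <- substr(x, 2, 4)

# Replacement form: overwrite characters 1 through 3 in place
substr(x, 1, 3) <- "XYZ"

# start and stop are vectorised: take the last three characters of each month name
last3 <- substr(month.name, nchar(month.name) - 2, nchar(month.name))

print(middle)
print(x)
print(last3)
```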

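(5) sub(): replaces only the first match;
gsub(): replaces all matches (global replacement)

# ^: anchors the match at the start of the string (here, the first letter);
# \\w: shorthand character class for any word character (letters, digits, and underscore);
# \\U: converts the text that follows to uppercase (only with perl = TRUE);
# \\1: backreference to the first capture group, i.e. the text matched by (\\w)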
> gsub("^(\\w)","\\U\\1",tolower(temp))
 [1] "Ujan" "Ufeb" "Umar" "Uapr" "Umay" "Ujun" "Ujul" "Uaug" "Usep" "Uoct" "Unov" "Udec"
# perl = T enables Perl-compatible regular expressions, which are needed for \\U and \\L to work
> gsub("^(\\w)","\\U\\1",tolower(temp),perl = T)
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
# To lowercase the first letter instead, change \\U to \\L
> gsub("^(\\w)","\\L\\1",tolower(temp),perl = T)
 [1] "jan" "feb" "mar" "apr" "may" "jun" "jul" "aug" "sep" "oct" "nov" "dec"
> gsub("^(\\w)","\\L\\1",toupper(temp),perl = T)
 [1] "jAN" "fEB" "mAR" "aPR" "mAY" "jUN" "jUL" "aUG" "sEP" "oCT" "nOV" "dEC"
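Because the pattern above is anchored with ^, it can match only once per string, so the examples do not show how sub() and gsub() differ. A minimal sketch of the difference, using made-up input strings:

```r
s <- "a-b-c-d"

# sub() replaces only the first match
first_only <- sub("-", "+", s)

# gsub() replaces every match
all_matches <- gsub("-", "+", s)

# Without the ^ anchor, gsub() can capitalise the first letter of every word;
# \\b is a word boundary, and perl = TRUE is required for \\U
title_case <- gsub("\\b(\\w)", "\\U\\1", "hello big world", perl = TRUE)

print(first_only)
print(all_matches)
print(title_case)
```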

(6) grep(): searches strings for a pattern and returns the indices of the matching elements

> x <- c("b","A+","AC")
> x
[1] "b"  "A+" "AC"
> grep("A+",x)
[1] 2 3
> grep("A+",x,fixed = T)
[1] 2
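> grep("A+",x,fixed = F)  # here + is a regex quantifier meaning "one or more of the preceding character", so the pattern matches any string containing at least one A, and AC qualifies too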
[1] 2 3
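Some related variations, sketched on the same vector x. The `value` argument, grepl(), and backslash escaping are base R features that go slightly beyond what these notes cover:

```r
x <- c("b", "A+", "AC")

# value = TRUE returns the matching strings instead of their indices
hits <- grep("A+", x, value = TRUE)

# grepl() returns one TRUE/FALSE per element
flags <- grepl("A+", x)

# Escaping + makes it a literal plus sign, equivalent to fixed = TRUE here
escaped <- grep("A\\+", x)
literal <- grep("A+", x, fixed = TRUE)

print(hits)
print(flags)
print(escaped)
```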

(7) match(): exact string matching that returns the position of the first match; it does not support regular expressions

> match("AC",x)
[1] 3
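To make the "exact match only" behaviour concrete, here is a small sketch on the same vector x. The `%in%` operator is a base R convenience built on match() and is not mentioned in the notes.

```r
x <- c("b", "A+", "AC")

# Exact match: position of "AC" in x
pos <- match("AC", x)

# No partial or regex matching: "A" is not an element of x, so the result is NA
no_partial <- match("A", x)

# match() is vectorised over its first argument
several <- match(c("AC", "zz", "b"), x)

# %in% gives a TRUE/FALSE membership test instead of a position
present <- "AC" %in% x

print(pos)
print(no_partial)
print(several)
print(present)
```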

(8) strsplit(): splits a string into several pieces

> path <- "/usr/local/bin/R"
## strsplit returns a list, not a vector
> strsplit(path,"/")
[[1]]
[1] ""      "usr"   "local" "bin"   "R"    
> strsplit(c(path,path),"/")
[[1]]
[1] ""      "usr"   "local" "bin"   "R"    

[[2]]
[1] ""      "usr"   "local" "bin"   "R"
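Since the result is a list, a common next step is to pull out one element with [[1]]. The sketch below also shows that the split argument is treated as a regular expression unless fixed = TRUE is given; the "a.b.c" string is an illustrative value.

```r
path <- "/usr/local/bin/R"

# Take the first list element to get a plain character vector
parts <- strsplit(path, "/")[[1]]

# Drop the empty string produced by the leading "/"
parts <- parts[parts != ""]
last_part <- parts[length(parts)]

# "." is a regex metacharacter; fixed = TRUE splits on a literal dot
by_dot <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]

# As a regex, "." matches every character, leaving only empty strings
by_regex <- strsplit("a.b.c", ".")[[1]]

print(parts)
print(last_part)
print(by_dot)
print(by_regex)
```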

II. A trick for generating pairwise string combinations

Function: outer()

> face <- 1:13
> suit<- c("spades","clubs","hearts","diamonds")
> outer(suit,face,FUN = paste)  # sep can also be added to set the separator
     [,1]         [,2]         [,3]         [,4]         [,5]         [,6]         [,7]        
[1,] "spades 1"   "spades 2"   "spades 3"   "spades 4"   "spades 5"   "spades 6"   "spades 7"  
[2,] "clubs 1"    "clubs 2"    "clubs 3"    "clubs 4"    "clubs 5"    "clubs 6"    "clubs 7"   
[3,] "hearts 1"   "hearts 2"   "hearts 3"   "hearts 4"   "hearts 5"   "hearts 6"   "hearts 7"  
[4,] "diamonds 1" "diamonds 2" "diamonds 3" "diamonds 4" "diamonds 5" "diamonds 6" "diamonds 7"
     [,8]         [,9]         [,10]         [,11]         [,12]         [,13]        
[1,] "spades 8"   "spades 9"   "spades 10"   "spades 11"   "spades 12"   "spades 13"  
[2,] "clubs 8"    "clubs 9"    "clubs 10"    "clubs 11"    "clubs 12"    "clubs 13"   
[3,] "hearts 8"   "hearts 9"   "hearts 10"   "hearts 11"   "hearts 12"   "hearts 13"  
[4,] "diamonds 8" "diamonds 9" "diamonds 10" "diamonds 11" "diamonds 12" "diamonds 13"
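As a sketch of the sep variant mentioned in the comment above, extra arguments to outer() are passed on to FUN. Flattening the matrix with as.vector() is an addition of mine, shown here as one way to get a plain vector of 52 cards.

```r
face <- 1:13
suit <- c("spades", "clubs", "hearts", "diamonds")

# sep is forwarded to paste() through outer()'s ... argument
deck_matrix <- outer(suit, face, FUN = paste, sep = "-")

# The result is a 4 x 13 character matrix
deck_dim <- dim(deck_matrix)

# as.vector() flattens it column by column into 52 strings
deck <- as.vector(deck_matrix)

print(deck_dim)
print(head(deck, 5))
```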
