R Language Study Notes (9): Strings
2021-01-22
Akuooo
Reference video: https://www.bilibili.com/video/BV19x411X7C6?t=3&p=25
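I. Regular expressions and string-handling functions
- Regular expressions
A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a given substring, to replace the matched substring, or to extract the substrings that satisfy some condition.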
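As a rough illustration of the three uses just listed (checking, replacing, extracting), here is a minimal sketch in base R. The `emails` vector and the patterns are made-up sample data for illustration, not part of the original notes.

```r
# Hypothetical sample data (an assumption for illustration only)
emails <- c("alice@example.com", "not an email", "bob@test.org")

# 1. Check: does each string contain the literal character "@"?
has_at <- grepl("@", emails, fixed = TRUE)
print(has_at)    # TRUE FALSE TRUE

# 2. Replace: mask everything from "@" to the end of the string
masked <- sub("@.*$", "@***", emails)
print(masked)    # "alice@***" "not an email" "bob@***"

# 3. Extract: pull out the matched part ("@" plus the domain)
m <- regexpr("@[a-z.]+$", emails)
domains <- regmatches(emails, m)   # only the elements that matched
print(domains)   # "@example.com" "@test.org"
```

Note that `regmatches()` silently drops the elements without a match, so its result can be shorter than the input.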
- String-handling functions
(Whenever a string appears, it must be wrapped in quotes!)
(1) nchar(): counts the number of characters in a string
> nchar("Hello World")
[1] 11   # the space counts as a character too
> month.name
[1] "January" "February" "March" "April" "May" "June" "July" "August"
[9] "September" "October" "November" "December"
> nchar(month.name)
[1] 7 8 5 5 3 4 4 6 9 7 8 8
(2) length(): returns the number of elements in a vector (not the number of characters)
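The notes give no example for length(), so the following short sketch contrasts it with nchar() on the same inputs used above. The variable names are only illustrative.

```r
# A single string is a character vector with one element
len_one <- length("Hello World")    # 1 element
chars_one <- nchar("Hello World")   # 11 characters

# month.name is a built-in character vector with 12 elements
len_months <- length(month.name)    # 12
chars_months <- nchar(month.name)   # one count per element

print(c(len_one, chars_one, len_months))   # 1 11 12
print(length(chars_months))                # 12
```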
(3) paste(): concatenates strings
> paste(c("Everybody","loves","stats"))
[1] "Everybody" "loves" "stats"
> paste("Everybody","loves","stats")
[1] "Everybody loves stats"
> paste("Everybody","loves","stats",sep = "-")   # sep sets the separator
[1] "Everybody-loves-stats"
# Pasting a vector and a string does not append the string to the vector; each element of the vector is joined with the string separately
> names <- c("Moe","Larry","Curly")
> paste(names,"loves stats")
[1] "Moe loves stats" "Larry loves stats" "Curly loves stats"
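Two related options are not covered above and may be worth knowing: `paste0()`, which joins with no separator, and the `collapse` argument, which merges the resulting vector into one string. A small sketch follows; `people` is a stand-in name used here to avoid masking the base function `names()`.

```r
people <- c("Moe", "Larry", "Curly")

# paste0() is paste() with sep = ""
ids <- paste0("id_", 1:3)
print(ids)        # "id_1" "id_2" "id_3"

# collapse joins the elements of the result into a single string
one_line <- paste(people, collapse = ", ")
print(one_line)   # "Moe, Larry, Curly"

# sep works within each element, collapse works across elements
sentence <- paste(people, "loves stats", sep = " ", collapse = "; ")
print(sentence)
```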
(4) substr(): extracts a substring
> substr(x = month.name, start = 1, stop = 3)
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> temp <- substr(x = month.name, start = 1, stop = 3)
> toupper(temp)   # convert to upper case
[1] "JAN" "FEB" "MAR" "APR" "MAY" "JUN" "JUL" "AUG" "SEP" "OCT" "NOV" "DEC"
> tolower(temp)   # convert to lower case
[1] "jan" "feb" "mar" "apr" "may" "jun" "jul" "aug" "sep" "oct" "nov" "dec"
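Besides extracting, `substr()` can also be used on the left-hand side of an assignment to overwrite part of a string, and `substring()` lets the end position default to the end of the string. A minimal sketch, with `x` and `y` as illustrative variables:

```r
x <- "Hello World"

# Extract characters 7 to 11
tail_part <- substr(x, 7, 11)
print(tail_part)    # "World"

# substring() only needs a start; it runs to the end by default
tail_part2 <- substring(x, 7)
print(tail_part2)   # "World"

# Replacement form: overwrite characters 1 to 5 in a copy of x
y <- x
substr(y, 1, 5) <- "HELLO"
print(y)            # "HELLO World"
```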
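(5) sub(): replaces only the first match;
gsub(): replaces every match (global replacement)
# ^: anchors the match at the start of the string, so only the first character can match;
# \\w: character-class shorthand for any word character (letters, digits and underscore);
# \\U: converts the replacement text that follows to upper case (only with perl = TRUE);
# \\1: backreference to the text captured by the first group, here (\\w)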
> gsub("^(\\w)","\\U\\1",tolower(temp))
[1] "Ujan" "Ufeb" "Umar" "Uapr" "Umay" "Ujun" "Ujul" "Uaug" "Usep" "Uoct" "Unov" "Udec"
# perl = TRUE enables Perl-compatible regular expressions, which \\U requires
> gsub("^(\\w)","\\U\\1",tolower(temp),perl = TRUE)
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
# To lower-case the first letter instead, replace \\U with \\L
> gsub("^(\\w)","\\L\\1",tolower(temp),perl = TRUE)
[1] "jan" "feb" "mar" "apr" "may" "jun" "jul" "aug" "sep" "oct" "nov" "dec"
> gsub("^(\\w)","\\L\\1",toupper(temp),perl = TRUE)
[1] "jAN" "fEB" "mAR" "aPR" "mAY" "jUN" "jUL" "aUG" "sEP" "oCT" "nOV" "dEC"
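Because the pattern above is anchored with `^`, it can match at most once per string, so it does not show how sub() and gsub() differ. The sketch below uses an unanchored pattern on an illustrative date string to make the difference visible, then capitalises every word with `\\U`.

```r
s <- "2021-01-22"

# sub() replaces only the first "-"
first_only <- sub("-", "/", s)
print(first_only)    # "2021/01-22"

# gsub() replaces every "-"
all_matches <- gsub("-", "/", s)
print(all_matches)   # "2021/01/22"

# \\b marks a word boundary, so each word's first letter is upper-cased
title <- gsub("\\b(\\w)", "\\U\\1", "everybody loves stats", perl = TRUE)
print(title)         # "Everybody Loves Stats"
```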
(6) grep(): searches for a pattern and returns the indices of the matching elements
> x <- c("b","A+","AC")
> x
[1] "b" "A+" "AC"
> grep("A+",x)
[1] 2 3
> grep("A+",x,fixed = TRUE)
[1] 2
> grep("A+",x,fixed = FALSE)   # here + is a regex quantifier meaning one or more of the preceding character, so "AC" matches as well
[1] 2 3
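A few variations on the same search may help here: returning the matching values instead of indices, getting a logical vector with grepl(), and escaping `+` as an alternative to `fixed = TRUE`. This is a sketch using the same `x` as above.

```r
x <- c("b", "A+", "AC")

# value = TRUE returns the matching elements rather than their positions
hits <- grep("A+", x, value = TRUE)
print(hits)      # "A+" "AC"

# grepl() returns one TRUE/FALSE per element
literal <- grepl("A+", x, fixed = TRUE)
print(literal)   # FALSE TRUE FALSE

# Escaping the plus sign also makes it literal inside a regex
escaped <- grep("A\\+", x)
print(escaped)   # 2
```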
(7) match(): looks up exact string matches and returns the position of the first one; it does not support regular expressions
> match("AC",x)
[1] 3
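Since match() compares whole strings, a partial string finds nothing and yields NA. The sketch below shows that, along with the related `%in%` operator, which is built on match(); the lookup values are illustrative.

```r
x <- c("b", "A+", "AC")

# "A" is only part of "A+" and "AC", so there is no exact match
partial <- match("A", x)
print(partial)     # NA

# match() is vectorised over its first argument
positions <- match(c("AC", "b", "zzz"), x)
print(positions)   # 3 1 NA

# %in% answers "is it present?" with TRUE/FALSE instead of a position
present <- c("AC", "zzz") %in% x
print(present)     # TRUE FALSE
```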
(8) strsplit(): splits a string into pieces at each occurrence of a separator
> path <- "/usr/local/bin/R"
## strsplit() returns a list, not a vector
> strsplit(path,"/")
[[1]]
[1] "" "usr" "local" "bin" "R"
> strsplit(c(path,path),"/")
[[1]]
[1] "" "usr" "local" "bin" "R"
[[2]]
[1] "" "usr" "local" "bin" "R"
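Because the result is a list, a common next step is to pull out one element with `[[1]]` and then work with it as an ordinary character vector. A minimal sketch, reusing `path` from above; the `"a.b.c"` string is an added illustration.

```r
path <- "/usr/local/bin/R"

# [[1]] extracts the character vector for the first (only) input string
parts <- strsplit(path, "/")[[1]]

# Drop the empty string produced by the leading "/"
dirs <- parts[parts != ""]
print(dirs)        # "usr" "local" "bin" "R"

# The last piece is the file name
file_name <- parts[length(parts)]
print(file_name)   # "R"

# The separator is a regex by default; use fixed = TRUE to split on a literal "."
pieces <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
print(pieces)      # "a" "b" "c"
```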
II. A technique for generating pairwise combinations of strings
Function: outer()
> face <- 1:13
> suit<- c("spades","clubs","hearts","diamonds")
> outer(suit,face,FUN = paste)   # sep can also be passed to set the separator
     [,1]         [,2]         [,3]         [,4]         [,5]         [,6]         [,7]
[1,] "spades 1"   "spades 2"   "spades 3"   "spades 4"   "spades 5"   "spades 6"   "spades 7"
[2,] "clubs 1"    "clubs 2"    "clubs 3"    "clubs 4"    "clubs 5"    "clubs 6"    "clubs 7"
[3,] "hearts 1"   "hearts 2"   "hearts 3"   "hearts 4"   "hearts 5"   "hearts 6"   "hearts 7"
[4,] "diamonds 1" "diamonds 2" "diamonds 3" "diamonds 4" "diamonds 5" "diamonds 6" "diamonds 7"
     [,8]         [,9]         [,10]         [,11]         [,12]         [,13]
[1,] "spades 8"   "spades 9"   "spades 10"   "spades 11"   "spades 12"   "spades 13"
[2,] "clubs 8"    "clubs 9"    "clubs 10"    "clubs 11"    "clubs 12"    "clubs 13"
[3,] "hearts 8"   "hearts 9"   "hearts 10"   "hearts 11"   "hearts 12"   "hearts 13"
[4,] "diamonds 8" "diamonds 9" "diamonds 10" "diamonds 11" "diamonds 12" "diamonds 13"
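The following sketch shows the `sep` variant mentioned in the comment, and one possible way to flatten the resulting matrix into a plain vector of 52 cards. The names `deck` and `cards` are illustrative.

```r
face <- 1:13
suit <- c("spades", "clubs", "hearts", "diamonds")

# Extra arguments to outer() are passed on to FUN, here paste(sep = "-")
deck <- outer(suit, face, FUN = paste, sep = "-")
print(dim(deck))      # 4 13
print(deck[1, 1])     # "spades-1"

# as.vector() flattens the matrix column by column
cards <- as.vector(deck)
print(length(cards))  # 52
print(cards[1:4])     # "spades-1" "clubs-1" "hearts-1" "diamonds-1"
```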