NLP Study Notes: Basic Skills
2019-01-02
半笔闪
I. String Operations
1. Stripping whitespace and special characters
s = ' hello, world!'
print(s.strip()) #hello, world!
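# lstrip() and rstrip() treat their argument as a set of characters to remove, not as a literal prefix or suffix
print(s.lstrip(' hello, ')) #world!
print(s.rstrip('!')) # hello, world (the leading space is kept)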
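A point that is easy to miss in the snippet above: the argument to lstrip() and rstrip() is a set of characters, so it can remove more than the literal prefix. The sketch below contrasts the two behaviours. strip_prefix is a hypothetical helper introduced here for illustration only; it is not part of the original notes.

```python
def strip_prefix(text, prefix):
    """Remove prefix from text only when text really starts with it.

    Hypothetical helper, written for this illustration.
    """
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


s = 'hello, hole!'

# lstrip() keeps removing leading characters as long as each one is in the
# set {'h', 'e', 'l', 'o', ',', ' '}, so it also eats the word 'hole'.
chars_stripped = s.lstrip('hello, ')

# strip_prefix() removes the exact substring 'hello, ' once.
prefix_stripped = strip_prefix(s, 'hello, ')

print(repr(chars_stripped))   # '!'
print(repr(prefix_stripped))  # 'hole!'
```

On Python 3.9 and later, the built-in str.removeprefix() and str.removesuffix() do the same job as the helper.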
2. Concatenating strings
s1 = 'hello'
s2 = 'world'
s = s1 + s2
print(s) #helloworld
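The + operator works well for two strings. When there are many pieces, or a separator is needed, str.join() is the usual alternative. A minimal sketch, with the word list made up for illustration:

```python
# Example data, assumed for illustration
words = ['hello', 'world', 'again']

# join() is called on the separator and takes any iterable of strings
joined_with_space = ' '.join(words)
joined_no_sep = ''.join(words)

print(joined_with_space)  # hello world again
print(joined_no_sep)      # helloworldagain
```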
3. Finding a substring with index()
s1 = 'hello'
s2 = 'e'
print(s1.index(s2)) #1
4. Comparing strings
# In Python 2, the built-in cmp() function can be used directly
import operator
s1 = 'hello'
s2 = 'hell'
# equivalent to a == b
print(operator.eq(s1,s2)) #False
# equivalent to a < b
print(operator.lt(s1,s2)) #False
# equivalent to a <= b
print(operator.le(s1,s2)) #False
# equivalent to a > b
print(operator.gt(s1,s2)) #True
# equivalent to a >= b
print(operator.ge(s1,s2)) #True
# equivalent to a != b
print(operator.ne(s1,s2)) #True
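String comparison in Python is lexicographic, based on Unicode code points, which is why 'hello' is greater than its own prefix 'hell'. Since cmp() no longer exists in Python 3, the sketch below shows one common way to rebuild it. The function name cmp here is a local stand-in defined for this example, not a built-in.

```python
def cmp(a, b):
    """Python 3 stand-in for Python 2's cmp(): returns -1, 0, or 1."""
    # True and False behave as 1 and 0 in arithmetic
    return (a > b) - (a < b)


print(cmp('hello', 'hell'))   # 1: a longer string beats its own prefix
print(cmp('hell', 'hello'))   # -1
print(cmp('hello', 'hello'))  # 0

# Uppercase letters have lower code points than lowercase ones,
# so 'Zebra' sorts before 'apple'.
print('Zebra' < 'apple')      # True
```

For case-insensitive comparison, lowercasing both sides first (for example with lower()) is a simple option.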
5. Converting between upper and lower case
s1 = 'Hello'
print(s1.upper()) #HELLO
print(s1.lower()) #hello
6. Reversing a string
s1 = 'hello'
print(s1[::-1]) #olleh
7. Finding a substring with find()
s1 = 'hello'
s2 = 'el'
print(s1.find(s2)) #1
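find() and index() return the same position when the substring exists. They differ when it is missing: find() returns -1, while index() raises ValueError. A short sketch of the difference, using a made-up search string 'xyz':

```python
s1 = 'hello'

found = s1.find('el')      # position of the first match
missing = s1.find('xyz')   # -1 signals "not found"

try:
    s1.index('xyz')
    index_raised = False
except ValueError:
    # index() reports a missing substring with an exception
    index_raised = True

print(found, missing, index_raised)  # 1 -1 True
```

If only a yes/no answer is needed, the in operator ('el' in s1) is simpler than either method.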
8. Splitting a string
s1 = 'I, want, to, say, hello, world'
s2 = ','
print(s1.split(s2)) #['I', ' want', ' to', ' say', ' hello', ' world']
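Note that splitting on ',' alone leaves a leading space on every item after the first. Two possible ways to get clean tokens are sketched below. Which one fits depends on how regular the input is.

```python
s1 = 'I, want, to, say, hello, world'

# Option 1: split on the comma, then strip whitespace from each piece
tokens = [part.strip() for part in s1.split(',')]

# Option 2: split on the exact separator ', ' (only safe when the
# separator is always a comma followed by exactly one space)
tokens_alt = s1.split(', ')

print(tokens)      # ['I', 'want', 'to', 'say', 'hello', 'world']
print(tokens_alt)  # ['I', 'want', 'to', 'say', 'hello', 'world']
```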
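9. Finding the most frequent letter in a string
import re
from collections import Counter

def get_max_frequency_char(text):
    # lowercase first so that 'A' and 'a' count as the same letter
    text = text.lower()
    # keep ASCII letters only; digits, spaces and punctuation are ignored
    result = re.findall('[a-z]', text)
    count = Counter(result)
    # note: max() raises ValueError if the text contains no letters
    m = max(count.values())
    # on a tie, return the alphabetically first letter
    return sorted([x for (x, y) in count.items() if y == m])[0]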
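To check the tie-breaking rule (the alphabetically first letter wins), here is a regex-free variant with a few sample calls. most_frequent_letter is a hypothetical name used for this sketch. Unlike the function above, it returns None when the text contains no letters, which is an assumption made here rather than behaviour from the original notes.

```python
from collections import Counter


def most_frequent_letter(text):
    """Return the most frequent ASCII letter in text, lowercased.

    Ties go to the alphabetically first letter. Returns None when the
    text contains no ASCII letters (a choice made for this sketch).
    """
    letters = [ch for ch in text.lower() if 'a' <= ch <= 'z']
    if not letters:
        return None
    counts = Counter(letters)
    # Sort key: highest count first, then alphabetical order
    return min(counts, key=lambda ch: (-counts[ch], ch))


print(most_frequent_letter('Hello, World!'))  # l
print(most_frequent_letter('aabb'))           # a (tie between a and b)
print(most_frequent_letter('1234'))           # None
```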
II. Regular Expressions