Python: Chinese Word Segmentation with jieba and snownlp, plus Pinyin

2020-04-25  Koelre

1. Installation

pip install jieba
pip install snownlp   # use this one; a Python 3 environment is recommended
pip install pypinyin
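
Pinyin conversion: pypinyin is installed above but not demonstrated in the sessions below, so here is a rough sketch of its basic use (pinyin, lazy_pinyin and Style are functions and classes from the pypinyin package; the exact output can vary with the pypinyin version and its dictionary):

# pypinyin: converting Chinese text to pinyin (Python 3)
from pypinyin import pinyin, lazy_pinyin, Style

text = "我说我应该好好学习"
print(pinyin(text))        # nested list with tone marks, e.g. [['wǒ'], ['shuō'], ...]
print(lazy_pinyin(text))   # flat list without tone marks, e.g. ['wo', 'shuo', 'wo', ...]
print(lazy_pinyin(text, style=Style.TONE3))  # numeric tones, e.g. ['wo3', 'shuo1', ...]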

Word segmentation:

jieba segmentation

# jieba segmentation

>>> import jieba
>>> text = "我说我应该好好学习"
>>> cutafter = list(jieba.cut(text))
Building prefix dict from the default dictionary ...
Dumping model to file cache c:\users\ztdn00\appdata\local\temp\jieba.cache
Loading model cost 5.820 seconds.
Prefix dict has been built succesfully.
>>> print cutafter
[u'\u6211', u'\u8bf4', u'\u6211', u'\u5e94\u8be5', u'\u597d\u597d\u5b66\u4e60']
>>> for t in cutafter:
...     print t
...
我
说
我
应该
好好学习
>>> 
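
The session above was run under Python 2, which is why the list prints as unicode escapes. A minimal sketch of the same call under Python 3, which the installation note recommends (jieba.lcut and jieba.cut_for_search are standard jieba functions; the extra modes are shown only for comparison):

# jieba segmentation under Python 3
import jieba

text = "我说我应该好好学习"
print(jieba.lcut(text))                  # ['我', '说', '我', '应该', '好好学习']
print(jieba.lcut(text, cut_all=True))    # full mode: lists every word the dictionary can find
print(list(jieba.cut_for_search(text)))  # search-engine mode: finer-grained cuts for indexing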

snownlp segmentation, which works correctly in a Python 3 environment:

# snownlp segmentation
>>> import snownlp
>>> t = "我说我应该好好学习"
>>> sn = snownlp.SnowNLP(t).words
>>> print(sn)
['我', '说', '我', '应该', '好好', '学习']
>>>
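
Besides .words, the same SnowNLP object exposes a few other properties; a rough sketch of some of them (these attributes exist in the snownlp package, but the scores and tags depend on its built-in models):

# other SnowNLP properties on the same object (Python 3)
import snownlp

s = snownlp.SnowNLP("我说我应该好好学习")
print(s.words)        # segmentation, as above
print(s.pinyin)       # pinyin for each token
print(s.sentiments)   # sentiment score between 0 and 1 (closer to 1 means more positive)
print(list(s.tags))   # (word, part-of-speech tag) pairs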

Under Python 2, it looks like this:

>>> import snownlp
>>> t = "我说我应该好好学习"
>>> print snownlp.SnowNLP(t).words
['\xce\xd2\xcb\xb5\xce\xd2\xd3\xa6\xb8\xc3\xba\xc3\xba\xc3\xd1\xa7\xcf\xb0']
>>> 

As you can see, the segmentation did not succeed: under Python 2 the console hands SnowNLP a GBK-encoded byte string (0xCED2 is 我, 0xCBB5 is 说, and so on), and the whole undecoded sentence comes back as a single "word".
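
If you do have to stay on Python 2, the usual workaround is to hand SnowNLP a unicode string instead of a byte string. A hedged sketch (assuming the source file is saved as UTF-8; decoding should let the segmenter see characters rather than raw bytes, as in the Python 3 session above):

# -*- coding: utf-8 -*-
# Python 2: decode the byte string to unicode before calling SnowNLP
import snownlp

t = "我说我应该好好学习"      # a byte string under Python 2
u = t.decode("utf-8")         # use "gbk" instead if the bytes came from a GBK console
print(snownlp.SnowNLP(u).words)   # prints as unicode escapes, like the jieba example above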

References:

https://blog.csdn.net/qq_35038153/article/details/78771251

https://www.cnblogs.com/cmnz/p/6963850.html
