Making a Word Cloud in R

2020-06-12  山竹山竹px

Installing the R packages

jiebaR, jiebaRD: word segmentation

wordcloud2: generates the word cloud

BiocManager::install("jiebaR")
BiocManager::install("jiebaRD")
BiocManager::install("wordcloud2")
library(jiebaR)  # jiebaRD is loaded automatically as a dependency of jiebaR
library(wordcloud2)

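Installation succeeded

> library(jiebaR,jiebaRD)
Loading required package: jiebaRD
Warning messages:
1: package ‘jiebaR’ was built under R version 3.6.3 
2: package ‘jiebaRD’ was built under R version 3.6.3 
> library(wordcloud2)
Warning message:
package ‘wordcloud2’ was built under R version 3.6.3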
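
The install-and-load step can be made repeatable by checking which packages are missing before installing anything. The following is a minimal sketch; `missing_packages` is a hypothetical helper name that does not come from the original post, and the install call is left commented out.

```r
# Hypothetical helper (not from the original post): return the names of
# packages that cannot be loaded, i.e. are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("jiebaR", "jiebaRD", "wordcloud2")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # BiocManager::install(to_install)  # uncomment to install, as in the post
}
```

Running this a second time after installation should print nothing, which makes it safe to keep at the top of a script.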

Processing the data

Syntax

Call the worker() function from the jiebaR package to set up word segmentation.

Its parameters are as follows:

 worker(type = "mix", dict = DICTPATH, hmm = HMMPATH,
  user = USERPATH, idf = IDFPATH, stop_word = STOPPATH, write = T,
  qmax = 20, topn = 5, encoding = "UTF-8", detect = T,
  symbol = F, lines = 1e+05, output = NULL, bylines = F,
  user_weight = "max")

Custom dictionary

A custom dictionary keeps a term such as "大数据" (big data) from being split into "大" (big) and "数据" (data).
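
As a rough illustration, a user dictionary can be written as a plain UTF-8 text file with one term per line and then passed to worker() through the user argument. This is a sketch under assumptions: the one-term-per-line layout and the sample terms are my own illustration, and the file is written to a temporary directory rather than to the SM_dict.utf8 file used later in the post.

```r
# Assumed layout: one term per line, saved as UTF-8.
dict_terms <- c("大数据", "词云")
dict_path <- file.path(tempdir(), "SM_dict.utf8")

# Write the raw UTF-8 bytes so the file does not depend on the locale.
con <- file(dict_path, open = "wb")
writeLines(enc2utf8(dict_terms), con, useBytes = TRUE)
close(con)

# Only attempt segmentation when jiebaR is actually installed.
if (requireNamespace("jiebaR", quietly = TRUE)) {
  wk <- jiebaR::worker(user = dict_path)
  print(jiebaR::segment("大数据时代的词云", wk))
}
```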

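Creating a stop-word dictionary in the same way and passing it through the stop_word argument did not work. It fails with the error:

There is no such file for stop words.

A different way of filtering stop words is described below.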

Importing the data

Reading data from a CSV file

sm_total <- read.csv("filename.csv", stringsAsFactors = FALSE)  # read the file

title <- sm_total$Title  # extract the column we need

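A side note

When calling read.csv, remember to set stringsAsFactors to FALSE. Otherwise the text column is read as a factor (this was the default in R versions before 4.0.0),

and the later call to segment fails with: Error in segment(data, wk) : Argument 'code' must be an string.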
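
The difference is easy to see on a small file. The sketch below writes a two-row CSV to a temporary file (the sample titles are made up for illustration), reads it both ways, and shows that as.character() recovers the text if the data has already been read as a factor.

```r
# Build a tiny CSV with a Title column (sample rows are illustrative only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Title", "big data analysis", "word cloud in R"), csv_path)

as_factor <- read.csv(csv_path, stringsAsFactors = TRUE)
as_text   <- read.csv(csv_path, stringsAsFactors = FALSE)

class(as_factor$Title)  # factor: not accepted by segment()
class(as_text$Title)    # character: what segment() expects

# If the column is already a factor, convert it back to character.
recovered <- as.character(as_factor$Title)
```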

Segmentation

Use segment()

wk <- worker(user = 'SM_dict.utf8') # use the custom dictionary

sm_seg <- segment(title,wk)  # one of several ways to run the segmentation

Removing stop words

That is, removing words that carry no meaning, such as "a", "and", "the", "of", and so on.

Use filter_segment

# words to filter out
filter <- c("a","an","is","was","are","been","and","or","as","its","of","for","by","in","on","from","the")

sm_seg_filter <- filter_segment(sm_seg,filter)
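
For comparison, the same kind of filtering can be approximated in base R with %in%. This is not how filter_segment is implemented; it is only a sketch on made-up sample words, and it also shows a case-insensitive variant, since the original does not say whether capitalised forms such as "The" are removed.

```r
# Sample tokens and stop words (illustrative only).
words <- c("The", "analysis", "of", "big", "data", "and", "the", "cloud")
stop_words <- c("a", "and", "the", "of")

# Exact match: "The" survives because it differs in case from "the".
kept_exact <- words[!(words %in% stop_words)]

# Case-insensitive match: lower-case the tokens before comparing.
kept_nocase <- words[!(tolower(words) %in% stop_words)]
```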

Drawing the word cloud

Computing word frequencies

word_frequency <- table(sm_seg_filter)
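
The result is a table: each word, with its number of occurrences beneath it.

Because my file is large, I build the word cloud from only the 100 most frequent words.

Sort first, using sort()

freq_sort <- sort(word_frequency,decreasing = T)

head(freq_sort) # show the first 6 entries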
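
The counting and sorting steps can be tried on a small vector first. The sketch below uses made-up tokens in place of sm_seg_filter.

```r
# Toy tokens standing in for the filtered segmentation result.
tokens <- c("data", "cloud", "data", "word", "cloud", "data")

word_frequency <- table(tokens)                          # count each distinct word
freq_sort <- sort(word_frequency, decreasing = TRUE)     # most frequent first
top2 <- head(freq_sort, 2)                               # keep the top n entries (here n = 2)

print(freq_sort)
```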

Drawing

Syntax

wordcloud2(data, size = 1, minSize = 0, gridSize =  0,
    fontFamily = 'Segoe UI', fontWeight = 'bold',
    color = 'random-dark', backgroundColor = "white",
    minRotation = -pi/4, maxRotation = pi/4, shuffle = TRUE,
    rotateRatio = 0.4, shape = 'circle', ellipticity = 0.65,
    widgetsize = NULL, figPath = NULL, hoverFunction = NULL)
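
# Top 100 words; with minRotation equal to maxRotation and rotateRatio = 1, every word is drawn at the same angle (pi/6)
wordcloud2(head(freq_sort,100),color = "random-light",minRotation = pi/6,maxRotation = pi/6,rotateRatio = 1)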
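
wordcloud2 also accepts a data frame with a word column and a frequency column, which makes the input explicit. The sketch below builds such a data frame from made-up counts and, if the packages are installed, saves the widget as an HTML file with htmlwidgets::saveWidget. The column names `word` and `freq`, the output file name, and the choice to save to a temporary directory are assumptions for illustration, not part of the original post.

```r
# Toy frequency table standing in for head(freq_sort, 100).
freq_sort <- sort(table(c("data", "cloud", "data", "word", "cloud", "data")),
                  decreasing = TRUE)

# Explicit two-column input: the word and how often it occurs.
cloud_data <- data.frame(
  word = names(freq_sort),
  freq = as.vector(freq_sort),
  stringsAsFactors = FALSE
)

# Draw and save only when the packages are available.
if (requireNamespace("wordcloud2", quietly = TRUE) &&
    requireNamespace("htmlwidgets", quietly = TRUE)) {
  widget <- wordcloud2::wordcloud2(cloud_data, color = "random-light")
  out_file <- file.path(tempdir(), "wordcloud.html")  # hypothetical output path
  htmlwidgets::saveWidget(widget, out_file, selfcontained = FALSE)
}
```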

Result

[Figure: partial screenshot of the resulting word cloud]

Ref.
https://blog.csdn.net/snowdroptulip/article/details/78836941
https://www.jianshu.com/p/a4ba7637680c
