Installing Elasticsearch and Configuring the HanLP Chinese Word Segmentation Plugin
1. Environment
- OS: CentOS 6.8
- Elasticsearch version: 5.6.3
- HanLP version: 1.5.2
2. Installation Steps
- Download the tar package
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.3.tar.gz
and extract it to /home/elasticsearch
- Edit elasticsearch.yml
cluster.name: xingdu                   # cluster name
node.name: search                      # node name
network.bind_host: 192.168.1.200       # IP address to bind to
network.publish_host: 192.168.1.200    # IP address other nodes use to reach this node
By default elasticsearch opens two ports: 9200 for communication between ES nodes and external clients, and 9300 for communication between ES nodes.
- Run
./bin/elasticsearch
It fails with org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root, i.e. Elasticsearch refuses to run as the root user.
Fix: create a new group and user, and give that user ownership of the elasticsearch directory
groupadd xingdu
useradd xingdu -g xingdu
chown -R xingdu:xingdu elasticsearch
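Then switch to the new user before starting again (assuming the install lives at /home/elasticsearch as above):
su - xingdu
cd /home/elasticsearch
./bin/elasticsearch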
Run it again. This time there is a warning:
WARN: java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed
A newer Linux kernel would avoid this; it does not affect normal operation.
ERROR: [4] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max number of threads [1024] for user [xingdu] is too low, increase to at least [2048]
[3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[4]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
Startup fails with the four errors above; fix them one by one.
ERROR [1]: local files cannot be created because the user's maximum open file count is too low
Fix: switch to root and edit the limits.conf configuration file
vi /etc/security/limits.conf
# append at the end
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
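These limits only take effect for new login sessions. After logging back in as xingdu, a quick sanity check (expected values match the settings above):
ulimit -Hn    # hard limit on open files, should print 65536
ulimit -Hu    # hard limit on processes/threads, should print 65536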
ERROR [2]: local threads cannot be created because the user's maximum thread count is too low
Fix: switch to root, go to the limits.d directory, and edit the 90-nproc.conf configuration file
vi /etc/security/limits.d/90-nproc.conf
Change
* soft nproc 1024
to
* soft nproc 2048
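The same edit as a one-liner, if you prefer (a sketch; the pattern assumes the stock CentOS 6 entry, so verify the file contents first):
sed -i 's/^\*\([[:space:]]\+soft[[:space:]]\+nproc[[:space:]]\+\)1024$/*\12048/' /etc/security/limits.d/90-nproc.conf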
ERROR [3]: the maximum number of virtual memory areas is too low
Fix: switch to root and edit the sysctl.conf configuration file
vi /etc/sysctl.conf
Add the setting (any value of at least the required 262144 works; 655360 leaves plenty of headroom)
vm.max_map_count=655360
Then apply it with
sysctl -p
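To confirm the new value took effect:
sysctl vm.max_map_count
# expected output: vm.max_map_count = 655360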
ERROR [4]: CentOS 6 does not support SecComp, while ES 5.2.0 and later default bootstrap.system_call_filter to true and probe for it at startup, so the check fails and ES refuses to start.
Fix: set bootstrap.system_call_filter to false in elasticsearch.yml
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
With all four errors resolved, restart Elasticsearch; this time it starts successfully.
To run it in the background, use ./bin/elasticsearch -d
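A convenient pattern for managing the background process is to have it write a pid file with -p (es.pid is an arbitrary name):
./bin/elasticsearch -d -p es.pid
kill $(cat es.pid)    # graceful shutdown later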
Visiting http://192.168.1.200:9200/ returns:
{
  "name": "search",
  "cluster_name": "xingdu",
  "cluster_uuid": "CYmFU0UQTeW3zejFmsOdRw",
  "version": {
    "number": "5.6.3",
    "build_hash": "1a2f265",
    "build_date": "2017-10-06T20:33:39.012Z",
    "build_snapshot": false,
    "lucene_version": "6.6.1"
  },
  "tagline": "You Know, for Search"
}
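To double-check that both the HTTP port (9200) and the transport port (9300) mentioned earlier are listening (assuming net-tools is installed, as is typical on CentOS 6):
netstat -tlnp | grep -E '9200|9300'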
- Install the HanLP Chinese word segmentation plugin
Reference plugin: https://github.com/hualongdata/hanlp-ext/tree/master/es-plugin
The plugin supports the hanlp and hanlp-index analyzers, but it does not filter stop words out of the box (additional configuration may be needed), so I modified the source code and rebuilt the package.
Link: https://pan.baidu.com/s/1pMdfzkB  Password: cdc4
Extract the archive into the /elasticsearch/plugins directory and rename it to hanlp
Download the HanLP data package
Link: https://pan.baidu.com/s/1smsAxch  Password: w4i7
After extracting it, edit the hanlp.properties file in the /elasticsearch/plugins/hanlp directory, setting the root path to the directory containing the data package, e.g. root=/home/hanlp/
Give the user that runs elasticsearch ownership of the directory
chown -R xingdu:xingdu hanlp
Now starting elasticsearch reports a jar conflict:
jar1: /home/elasticsearch/lib/log4j-api-2.9.1.jar
jar2: /home/elasticsearch/plugins/hanlp/log4j-api-2.9.1.jar
Deleting log4j-api-2.9.1.jar from the hanlp directory fixes it.
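For example:
rm /home/elasticsearch/plugins/hanlp/log4j-api-2.9.1.jar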
Start elasticsearch again; this time there are no exceptions. Let's test whether the hanlp word segmentation plugin actually works.
Request: http://192.168.1.200:9200/_analyze?text=%E4%B8%AD%E5%9B%BD%E7%9A%84%E5%86%9B%E4%BA%8B%E5%AE%9E%E5%8A%9B%E4%B8%8E%E6%97%A5%E4%BF%B1%E5%A2%9E&analyzer=hanlp
elasticsearch reports an error and exits:
java.security.AccessControlException: access denied ("java.util.PropertyPermission" "*" "read,write")
This looks like a permissions problem, which is exactly what the plugin-security.policy file in the hanlp directory is for. Edit the /elasticsearch/config/jvm.options file and append at the end:
-Djava.security.policy=/home/elasticsearch/plugins/hanlp/plugin-security.policy
Testing the same link again, it now complains that hanlp.properties cannot be found, hanlp fails to load its dictionaries, and elasticsearch exits. Fix: edit the /elasticsearch/bin/elasticsearch.in.sh file and change ES_CLASSPATH to:
ES_CLASSPATH="$ES_HOME/lib/*:$ES_HOME/plugins/hanlp/"
Testing the link once more, the request succeeds and returns:
{
  "tokens": [
    {
      "token": "中国",
      "start_offset": 0,
      "end_offset": 2,
      "type": "ns",
      "position": 0
    },
    {
      "token": "军事",
      "start_offset": 3,
      "end_offset": 5,
      "type": "n",
      "position": 1
    },
    {
      "token": "实力",
      "start_offset": 5,
      "end_offset": 7,
      "type": "n",
      "position": 2
    },
    {
      "token": "与日俱增",
      "start_offset": 7,
      "end_offset": 11,
      "type": "vl",
      "position": 3
    }
  ]
}
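The same request is easier to read as a POST with a JSON body (equivalent to the URL-encoded GET above, whose text parameter is 中国的军事实力与日俱增):
curl -XPOST 'http://192.168.1.200:9200/_analyze' -H 'Content-Type: application/json' -d '
{
  "analyzer": "hanlp",
  "text": "中国的军事实力与日俱增"
}'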
You can see the segmentation quality is good, and the stop word (的) has been filtered out. Done!
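To use the analyzers in an actual index, here is a minimal sketch (the news index, article type, and title field are made-up names for illustration; hanlp-index produces the finer-grained tokens suited to indexing, while hanlp suits query-time analysis):
curl -XPUT 'http://192.168.1.200:9200/news' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "article": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "hanlp-index",
          "search_analyzer": "hanlp"
        }
      }
    }
  }
}'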