
Installing Elasticsearch and Configuring the hanlp Chinese Word Segmentation Plugin

2018-02-02  Carlyle1993

1. Environment

  1. Operating system: CentOS 6.8
  2. Elasticsearch version: 5.6.3
  3. hanlp version: 1.5.2

2. Installation Steps

  1. Download the tarball
    https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.3.tar.gz
    and extract it to /home/elasticsearch
  2. Edit elasticsearch.yml
cluster.name: xingdu                # cluster name
node.name: search                   # node name
network.bind_host: 192.168.1.200    # IP address the node binds to
network.publish_host: 192.168.1.200 # IP address other nodes use to reach this node

By default Elasticsearch opens two ports: 9200, for communication between the node and external clients, and 9300, for communication between cluster nodes.
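Once the node is running, a quick sanity check that both ports are actually listening (a sketch; assumes a CentOS 6 box where net-tools provides netstat):

```shell
# List listening TCP sockets and keep only the two Elasticsearch ports
netstat -tln | grep -E ':(9200|9300)\b'
```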

  3. Run
./bin/elasticsearch

This fails with org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root — Elasticsearch refuses to run as the root user.
Fix: create a new group and user, and give them ownership of the elasticsearch directory

groupadd xingdu
useradd xingdu -g xingdu
chown -R xingdu:xingdu elasticsearch 

Run it again:
WARN: java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed — this warning only goes away on a newer Linux kernel and does not affect operation

ERROR: [4] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max number of threads [1024] for user [xingdu] is too low, increase to at least [2048]
[3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[4]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

Startup fails with the four errors above; fix them one by one.
ERROR [1]: the process cannot open enough files because the user's maximum file descriptor count is too low.
Fix: switch to root and edit limits.conf

vi /etc/security/limits.conf
# append at the end of the file
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
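Note that limits.conf is only read at login, so the change is invisible to already-open shells. A quick check after logging in again as the elasticsearch user (a sketch; 65536 is the value set above):

```shell
# Run in a fresh login session as the xingdu user; an existing
# shell still shows the old limit
ulimit -n    # max open file descriptors; should now report 65536
```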

ERROR [2]: the process cannot create enough threads because the user's maximum thread count is too low.
Fix: switch to root, go to the limits.d directory, and edit 90-nproc.conf

vi /etc/security/limits.d/90-nproc.conf
Change
 * soft nproc 1024
to
 * soft nproc 2048

ERROR [3]: the maximum virtual memory area count is too low.
Fix: switch to root and edit sysctl.conf

vi /etc/sysctl.conf
Append the setting (any value of at least 262144 passes the check)
vm.max_map_count=655360
and apply it with
sysctl -p
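Unlike the user limits above, sysctl -p applies the change immediately with no re-login or reboot required. To confirm the live kernel value (a sketch):

```shell
# Print the current kernel setting; should now show 655360,
# comfortably above the required 262144
sysctl -n vm.max_map_count
```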

ERROR [4]: CentOS 6 does not support seccomp, while ES since 5.2.0 defaults bootstrap.system_call_filter to true and checks for it at startup; the failed check prevents Elasticsearch from starting at all.
Fix: set bootstrap.system_call_filter to false in elasticsearch.yml

bootstrap.memory_lock: false
bootstrap.system_call_filter: false

With all four errors fixed, start it again — this time it comes up successfully.
To run it in the background, start it with ./bin/elasticsearch -d.
Visiting http://192.168.1.200:9200/ returns:

{
    "name": "search",
    "cluster_name": "xingdu",
    "cluster_uuid": "CYmFU0UQTeW3zejFmsOdRw",
    "version": {
        "number": "5.6.3",
        "build_hash": "1a2f265",
        "build_date": "2017-10-06T20:33:39.012Z",
        "build_snapshot": false,
        "lucene_version": "6.6.1"
    },
    "tagline": "You Know, for Search"
}
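The same health check works from the shell. This sketch (IP address from the config above) pulls out just the fields worth verifying:

```shell
# Query the node's root endpoint and show the name, cluster name,
# and version number from the JSON response
curl -s http://192.168.1.200:9200/ | grep -E '"(name|cluster_name|number)"'
```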
  4. Install the hanlp Chinese word segmentation plugin
    Plugin source: https://github.com/hualongdata/hanlp-ext/tree/master/es-plugin
    The plugin provides the hanlp and hanlp-index analyzers, but does not filter stop words (this may just need extra configuration), so the source was modified and the plugin recompiled and repackaged
    Link: https://pan.baidu.com/s/1pMdfzkB  password: cdc4
    Extract the archive into the /elasticsearch/plugins directory and rename it to hanlp
    Download the hanlp data package
    Link: https://pan.baidu.com/s/1smsAxch  password: w4i7
    After extracting it, edit hanlp.properties under /elasticsearch/plugins/hanlp and change the root path root=/home/hanlp/ to the directory that contains the data package
    Give the user that runs elasticsearch ownership of the plugin directory
chown -R xingdu:xingdu hanlp

Starting elasticsearch now reports a jar conflict

jar1: /home/elasticsearch/lib/log4j-api-2.9.1.jar
jar2: /home/elasticsearch/plugins/hanlp/log4j-api-2.9.1.jar
Deleting log4j-api-2.9.1.jar from the hanlp directory fixes it
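More generally, jars duplicated between lib/ and a plugin directory can be listed before deleting anything (a sketch assuming the paths used in this article; needs bash for process substitution):

```shell
# Print jar basenames present in both directories; every line of
# output is a conflict candidate
comm -12 \
  <(ls /home/elasticsearch/lib/*.jar           | xargs -n1 basename | sort) \
  <(ls /home/elasticsearch/plugins/hanlp/*.jar | xargs -n1 basename | sort)
```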

Start elasticsearch again — no exceptions this time — and test whether the hanlp analyzer works.
Visit: http://192.168.1.200:9200/_analyze?text=%E4%B8%AD%E5%9B%BD%E7%9A%84%E5%86%9B%E4%BA%8B%E5%AE%9E%E5%8A%9B%E4%B8%8E%E6%97%A5%E4%BF%B1%E5%A2%9E&analyzer=hanlp
Elasticsearch reports an error and exits:

java.security.AccessControlException: access denied ("java.util.PropertyPermission" "*" "read,write")

This looks like a permissions problem, which is exactly what the plugin-security.policy file in the hanlp directory is for. Edit /elasticsearch/config/jvm.options and append at the end:

-Djava.security.policy=/home/elasticsearch/plugins/hanlp/plugin-security.policy

Retry the URL above: now it complains that hanlp.properties cannot be found, hanlp fails to load its dictionaries, and elasticsearch exits. Fix: edit /elasticsearch/bin/elasticsearch.in.sh and change ES_CLASSPATH to:

ES_CLASSPATH="$ES_HOME/lib/*:$ES_HOME/plugins/hanlp/"

Retry the URL once more; this time the request succeeds and returns:

{
    "tokens": [
        {
            "token": "中国",
            "start_offset": 0,
            "end_offset": 2,
            "type": "ns",
            "position": 0
        },
        {
            "token": "军事",
            "start_offset": 3,
            "end_offset": 5,
            "type": "n",
            "position": 1
        },
        {
            "token": "实力",
            "start_offset": 5,
            "end_offset": 7,
            "type": "n",
            "position": 2
        },
        {
            "token": "与日俱增",
            "start_offset": 7,
            "end_offset": 11,
            "type": "vl",
            "position": 3
        }
    ]
}
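The long URL above is just the percent-encoded form of the query text 中国的军事实力与日俱增. From the shell, curl can do the encoding itself (a sketch; the query-string form of _analyze works on ES 5.x, as the browser test above shows):

```shell
# -G sends the --data-urlencode pairs as GET query parameters,
# letting curl percent-encode the Chinese text
curl -s -G 'http://192.168.1.200:9200/_analyze' \
     --data-urlencode 'text=中国的军事实力与日俱增' \
     --data-urlencode 'analyzer=hanlp'
```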

The segmentation quality looks good, and the stop word (的 in the input) was filtered out as well. Done!
