Elasticsearch Analyzers

2022-10-25  慕凌峰

I. Built-in ES analyzers

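The built-in analyzers only segment English-style text properly; they do not perform Chinese word segmentation (the standard analyzer, for example, breaks Chinese text into single characters).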

2. Built-in ES analyzers

3. Built-in analyzer examples
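A built-in analyzer can be tried out through the _analyze API. The following is a minimal sketch, not taken from the original post: it assumes an unsecured node at http://localhost:9200 (an assumption, adjust for your cluster), and the helper names build_analyze_body and analyze are made up for illustration. Only the body builder runs when the snippet is executed; the network call is wrapped in a function.

```python
import json
import urllib.request

# Assumption: a local, unsecured Elasticsearch node; change this for your cluster.
ES_URL = "http://localhost:9200"


def build_analyze_body(analyzer: str, text: str) -> dict:
    """Build the JSON body for a request to the _analyze API."""
    return {"analyzer": analyzer, "text": text}


def analyze(analyzer: str, text: str, es_url: str = ES_URL) -> list:
    """POST the body to <es_url>/_analyze and return the token strings."""
    data = json.dumps(build_analyze_body(analyzer, text)).encode("utf-8")
    request = urllib.request.Request(
        f"{es_url}/_analyze",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [token["token"] for token in payload["tokens"]]


# Build (but do not send) a request for the built-in standard analyzer.
body = build_analyze_body("standard", "Hello Elasticsearch analyzers")
print(json.dumps(body))
```

With a node running, calling analyze("standard", "Hello Elasticsearch analyzers") would return the token strings; swapping in another built-in analyzer name such as whitespace or simple shows how the output differs.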

II. The IK analyzer

1. Installing the IK analyzer

IK is mainly used for Chinese word segmentation; it also supports English.

https://github.com/medcl/elasticsearch-analysis-ik

2. Analyzers

3. Examples

4. Differences between the ik_max_word and ik_smart analyzers
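The IK project README describes ik_max_word as the finest-grained split, which emits every word combination it can find, and ik_smart as the coarsest-grained split. The toy sketch below only illustrates that contrast. It is not IK's actual algorithm: the function names, the tiny vocabulary, and the greedy longest-match rule are all assumptions made for illustration.

```python
def fine_grained(text: str, vocab: set) -> list:
    """Emit every vocabulary word found at any position (toy 'max word' style)."""
    tokens = []
    for start in range(len(text)):
        # Try the longest candidate first at each start position.
        for end in range(len(text), start, -1):
            if text[start:end] in vocab:
                tokens.append(text[start:end])
    return tokens


def coarse_grained(text: str, vocab: set) -> list:
    """Greedy longest match from left to right (toy 'smart' style)."""
    tokens = []
    start = 0
    while start < len(text):
        for end in range(len(text), start, -1):
            if text[start:end] in vocab:
                break
        else:
            # No vocabulary word starts here: fall back to a single character.
            end = start + 1
        tokens.append(text[start:end])
        start = end
    return tokens


# Hypothetical mini-vocabulary; a real IK dictionary is far larger.
VOCAB = {"天下无敌", "天下", "无敌"}

fine = fine_grained("天下无敌", VOCAB)
coarse = coarse_grained("天下无敌", VOCAB)
print(fine)    # ['天下无敌', '天下', '无敌']
print(coarse)  # ['天下无敌']
```

The fine-grained list favors recall, since a search for any of the shorter words can still match, while the coarse-grained list produces fewer, longer tokens.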

5. Custom IK vocabulary

Custom dictionary entries (one word per line):
小小小
小小少年
测测
子天
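As far as I know, IK loads extension dictionaries from UTF-8 text files with one word per line, referenced by the ext_dict entry in its IKAnalyzer.cfg.xml. The sketch below writes the four words above into such a file. The file name custom.dic and the helper write_dictionary are hypothetical, and the file goes to a temporary directory so the snippet is safe to run; in a real setup it would be placed under the plugin's config directory.

```python
import tempfile
from pathlib import Path

# The custom words listed above.
CUSTOM_WORDS = ["小小小", "小小少年", "测测", "子天"]


def write_dictionary(path: Path, words: list) -> None:
    """Write one word per line as UTF-8, skipping blanks and duplicates."""
    unique = []
    for word in words:
        word = word.strip()
        if word and word not in unique:
            unique.append(word)
    path.write_text("\n".join(unique) + "\n", encoding="utf-8")


# Hypothetical file name, written to a temp directory for this sketch.
dic_path = Path(tempfile.mkdtemp()) / "custom.dic"
write_dictionary(dic_path, CUSTOM_WORDS)
print(dic_path.read_text(encoding="utf-8"))
```

Saving the file as UTF-8 matters: a dictionary written in another encoding will generally not match the indexed text.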
Request body for the _analyze API:
{
    "analyzer": "ik_max_word",
    "text": "小小小少年测测想成为天子的儿子天下无敌。"
}
Response:
{
    "tokens": [
        {
            "token": "小小小",
            "start_offset": 0,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "小小",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "小小少年",
            "start_offset": 1,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "小小",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "少年",
            "start_offset": 3,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "测测",
            "start_offset": 5,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "想成",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 6
        },
        {
            "token": "成为",
            "start_offset": 8,
            "end_offset": 10,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "天子",
            "start_offset": 10,
            "end_offset": 12,
            "type": "CN_WORD",
            "position": 8
        },
        {
            "token": "的",
            "start_offset": 12,
            "end_offset": 13,
            "type": "CN_CHAR",
            "position": 9
        },
        {
            "token": "儿子",
            "start_offset": 13,
            "end_offset": 15,
            "type": "CN_WORD",
            "position": 10
        },
        {
            "token": "子天",
            "start_offset": 14,
            "end_offset": 16,
            "type": "CN_WORD",
            "position": 11
        },
        {
            "token": "天下无敌",
            "start_offset": 15,
            "end_offset": 19,
            "type": "CN_WORD",
            "position": 12
        },
        {
            "token": "天下",
            "start_offset": 15,
            "end_offset": 17,
            "type": "CN_WORD",
            "position": 13
        },
        {
            "token": "无敌",
            "start_offset": 17,
            "end_offset": 19,
            "type": "CN_WORD",
            "position": 14
        }
    ]
}
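One way to read a response like this is to check that each token's offsets slice back to the token text, and to see which custom words actually appear. The sketch below does that for a subset of the tokens above. The helper offsets_match is made up for illustration, and the check relies on every character here being in the Basic Multilingual Plane, where Python string indices and the reported offsets agree.

```python
TEXT = "小小小少年测测想成为天子的儿子天下无敌。"

# A subset of the response tokens: (token, start_offset, end_offset).
TOKENS = [
    ("小小小", 0, 3),
    ("小小少年", 1, 5),
    ("测测", 5, 7),
    ("子天", 14, 16),
    ("天下无敌", 15, 19),
]

# The custom dictionary entries from this section.
CUSTOM_WORDS = {"小小小", "小小少年", "测测", "子天"}


def offsets_match(text: str, tokens: list) -> bool:
    """Return True if text[start:end] equals the token for every entry."""
    return all(text[start:end] == token for token, start, end in tokens)


# Tokens that exist only because of the custom dictionary.
custom_hits = [token for token, _, _ in TOKENS if token in CUSTOM_WORDS]

print(offsets_match(TEXT, TOKENS))  # True
print(custom_hits)                  # ['小小小', '小小少年', '测测', '子天']
```

All four custom entries show up as CN_WORD tokens, including 子天, which spans the boundary between 儿子 and 天下.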