ES Multi-Fields and Custom Analyzers in Index Settings

2020-02-02  鸿雁长飞光不度

1. Multi-Field Types

1.1 The Multi-Field Feature

PUT products
{
  "mappings": {
    "properties": {
      "company": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "comment": {
        "type": "text",
        "fields": {
          "english_comment": {
            "type": "text",
            "analyzer": "english",
            "search_analyzer": "english"
          }
        }
      }
    }
  }
}
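With this mapping, the same source value can be searched either as full text or as an exact value by targeting the sub-field. A sketch of both query styles (the query strings here are illustrative):

```json
// Full-text search on the analyzed field
POST products/_search
{
  "query": {
    "match": { "company": "alibaba group" }
  }
}

// Exact-value match on the keyword sub-field
POST products/_search
{
  "query": {
    "term": { "company.keyword": "Alibaba Group" }
  }
}
```

The `term` query only matches if the stored value is exactly "Alibaba Group", while the `match` query matches any document whose analyzed tokens overlap with the query tokens.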

1.2 Exact Values and Full Text

Exact values (the `keyword` type) are not analyzed: the value is indexed as a single term and must be matched in full. Full text (the `text` type) is run through an analyzer and indexed as individual tokens, so queries can match on any of them.
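The `_analyze` API makes the difference visible: the `keyword` analyzer keeps the input as one term, while the `standard` analyzer lowercases it and splits it into tokens:

```json
// One term: "Elasticsearch in Action"
POST _analyze
{
  "analyzer": "keyword",
  "text": "Elasticsearch in Action"
}

// Three tokens: "elasticsearch", "in", "action"
POST _analyze
{
  "analyzer": "standard",
  "text": "Elasticsearch in Action"
}
```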

2. Custom Analyzers

When the analyzers that ship with ES do not meet your needs, you can build a custom one by combining Character Filters, a Tokenizer, and Token Filters. For more detail, see
https://www.jianshu.com/p/4cff1721934a

2.1 Character Filter Examples

// Strip HTML tags
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<p>haha</p>"
}

// Character replacement: map - to _
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": ["- => _"]
    }
  ],
  "text": "010-123-1231"
}

// Regex replacement: strip the http:// prefix
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "http://(.*)",
      "replacement": "$1"
    }
  ],
  "text": "http://www.baidu.com"
}

2.2 Tokenizer Example

Split by path hierarchy; `path_hierarchy` emits one token per level: a, a/b, a/b/c.
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text":"a/b/c"
}

2.3 Token Filter Example

Tokenize on whitespace, lowercase the tokens, then drop English stop words.
GET _analyze
{
  "tokenizer": "whitespace",
  "filter":["lowercase","stop"],
  "text":["The girls in here is singing"]
}

2.4 Complete Example: A Custom Analyzer in Index Settings

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["emoticons"],
          "tokenizer": "punctuation",
          "filter": ["lowercase", "english_stop"]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "[.,!?]"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [
            ":) => happy",
            ":( => sad"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

Test case

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "I am :) person"
}
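A custom analyzer defined in the index settings only takes effect at index time once it is attached to a field. A minimal sketch, assuming a hypothetical `comment` field added to `my_index`:

```json
// Attach the custom analyzer to a text field (field name is illustrative)
PUT my_index/_mapping
{
  "properties": {
    "comment": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}
```

Documents indexed into `comment` will then be run through `emoticons`, `punctuation`, `lowercase`, and `english_stop` in that order.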