II. Searching Data with the Elasticsearch term filter

2017-07-13, by 编程界的小学生

1. Requirement: search posts by user ID, hidden flag, post ID, and post date

2. Implementation

(1) Insert some test posts with the bulk API

POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }
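If you would rather generate the bulk body above from code, the following is a minimal Python sketch. It assumes the same four test documents and shows only the NDJSON layout: one action line followed by one source line per document, with a trailing newline. The helper name build_bulk_body is hypothetical, and the HTTP call is left out. You would send the result to POST /forum/article/_bulk with Content-Type: application/x-ndjson using any HTTP client.

```python
import json

# The four test posts from the request above: (document id, source).
posts = [
    (1, {"articleID": "XHDK-A-1293-#fJ3", "userID": 1, "hidden": False, "postDate": "2017-01-01"}),
    (2, {"articleID": "KDKE-B-9947-#kL5", "userID": 1, "hidden": False, "postDate": "2017-01-02"}),
    (3, {"articleID": "JODL-X-1937-#pV7", "userID": 2, "hidden": False, "postDate": "2017-01-01"}),
    (4, {"articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": True, "postDate": "2017-01-02"}),
]

def build_bulk_body(docs):
    """Hypothetical helper: return an NDJSON bulk body (action line + source line per doc)."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action/metadata line
        lines.append(json.dumps(source))                     # document source line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

print(build_bulk_body(posts))
```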

(2) View the generated mapping
GET /forum/_mapping/article
Response:

{
  "forum": {
    "mappings": {
      "article": {
        "properties": {
          "articleID": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "hidden": {
            "type": "boolean"
          },
          "postDate": {
            "type": "date"
          },
          "userID": {
            "type": "long"
          }
        }
      }
    }
  }
}

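In ES 5.2, a string field that is mapped dynamically gets type=text plus a built-in sub-field, which gives two fields. The first is the field itself (here articleID), and it is analyzed. The second is field.keyword (here articleID.keyword). It is not analyzed and has "ignore_above": 256, so any string longer than 256 characters is not indexed in the sub-field at all.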
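Here is a toy Python sketch of the ignore_above rule for the keyword sub-field. It models only which terms the sub-field would add to the inverted index, not how ES actually stores them. The function name keyword_subfield_terms is hypothetical, and string length is counted as Python characters, which is an approximation.

```python
IGNORE_ABOVE = 256  # value from the dynamic mapping of articleID.keyword

def keyword_subfield_terms(value, ignore_above=IGNORE_ABOVE):
    """Toy model: terms a keyword (sub-)field adds to the inverted index for one value.

    The whole string becomes a single term. A string longer than ignore_above
    characters is skipped entirely (nothing is indexed); it is not truncated.
    """
    return [value] if len(value) <= ignore_above else []

print(keyword_subfield_terms("XHDK-A-1293-#fJ3"))  # ['XHDK-A-1293-#fJ3']
print(keyword_subfield_terms("X" * 300))           # []
```

A document with an over-long value is still stored, and the value stays in _source. It just cannot be found through articleID.keyword.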

(3) Search posts by user ID

GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "userID": "1"
        }
      }
    }
  }
}

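term filter/query: the search text is not analyzed. It is looked up in the inverted index exactly as you typed it. If the search text were analyzed, "hello world" would become "hello" and "world", and each of those terms would be looked up in the inverted index separately. Without analysis, "hello world" stays "hello world" and is looked up as a single term.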
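The difference can be illustrated with a toy inverted index in Python. This is a sketch under simplifying assumptions and not ES internals. The analyzer here just lowercases and splits on whitespace. match_lookup stands in for an analyzed query that ORs the per-term hits, roughly like a match query with its default or operator. term_lookup uses the search text verbatim, the way term does. All function names are hypothetical.

```python
from collections import defaultdict

def whitespace_analyze(text):
    """Toy analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def build_index(docs):
    """Toy inverted index: term -> set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in whitespace_analyze(text):
            index[term].add(doc_id)
    return index

def term_lookup(index, text):
    """term-style: the search text is used verbatim as one term."""
    return set(index.get(text, set()))

def match_lookup(index, text):
    """Analyzed-style: analyze the search text, then OR the hits of each term."""
    hits = set()
    for term in whitespace_analyze(text):
        hits |= index.get(term, set())
    return hits

docs = {1: "hello world", 2: "hello there"}
index = build_index(docs)
print(term_lookup(index, "hello world"))   # set(): "hello world" was never indexed as one term
print(match_lookup(index, "hello world"))  # {1, 2}
```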

(4) Search posts that are not hidden

GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "hidden": "false"
        }
      }
    }
  }
}
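Every query in this section wraps a single term filter in constant_score, so that all matching documents get the same score (the boost, 1 by default) instead of being ranked. If you build these bodies from code, a small helper keeps them consistent. This is a sketch in which the helper name term_filter_query is hypothetical. Sending the body to GET /forum/article/_search is left to your HTTP client.

```python
import json

def term_filter_query(field, value):
    """Hypothetical helper: a constant_score query wrapping one term filter."""
    return {"query": {"constant_score": {"filter": {"term": {field: value}}}}}

# Same bodies as queries (3) and (4) above.
print(json.dumps(term_filter_query("userID", "1"), indent=2))
print(json.dumps(term_filter_query("hidden", "false"), indent=2))
```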

(5) Search posts by post date

GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "postDate": "2017-01-01"
        }
      }
    }
  }
}
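Queries (3) to (5) pass "1", "false", and "2017-01-01" as JSON strings, yet they still match the long, boolean, and date fields. This works because ES parses the query value according to the field's mapped type before doing the exact comparison. The Python sketch below roughly models that behavior on the four test posts. Its coercion is simplified to the value formats used here, and the names MAPPING, POSTS, coerce, and term_filter are hypothetical.

```python
from datetime import date, datetime

# Toy mapping and data: the four test posts, with values already in their mapped types.
MAPPING = {"userID": "long", "hidden": "boolean", "postDate": "date"}
POSTS = {
    1: {"userID": 1, "hidden": False, "postDate": date(2017, 1, 1)},
    2: {"userID": 1, "hidden": False, "postDate": date(2017, 1, 2)},
    3: {"userID": 2, "hidden": False, "postDate": date(2017, 1, 1)},
    4: {"userID": 2, "hidden": True, "postDate": date(2017, 1, 2)},
}

def coerce(field_type, value):
    """Convert a query value to the field's type (simplified: only the formats used here)."""
    if field_type == "long":
        return int(value)
    if field_type == "boolean":
        if isinstance(value, bool):
            return value
        if value in ("true", "false"):
            return value == "true"
        raise ValueError("not a boolean: %r" % (value,))
    if field_type == "date":
        return datetime.strptime(value, "%Y-%m-%d").date()
    raise ValueError("unsupported type: " + field_type)

def term_filter(docs, field, value):
    """Ids of docs whose field equals the coerced value exactly (no scoring)."""
    target = coerce(MAPPING[field], value)
    return sorted(doc_id for doc_id, doc in docs.items() if doc[field] == target)

print(term_filter(POSTS, "userID", "1"))             # [1, 2]
print(term_filter(POSTS, "hidden", "false"))         # [1, 2, 3]
print(term_filter(POSTS, "postDate", "2017-01-01"))  # [1, 3]
```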

(6) Search posts by post ID

GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "articleID": "XHDK-A-1293-#fJ3"
        }
      }
    }
  }
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

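The result is empty. articleID is mapped as text, so it is analyzed. articleID.keyword is the not-analyzed sub-field that recent ES versions create automatically. Each incoming articleID is therefore indexed twice. In articleID itself, the value is analyzed and the resulting tokens go into the inverted index. In articleID.keyword, the value is not analyzed, and the complete string goes into the inverted index as a single term, provided it is no longer than 256 characters. The term filter above searched articleID, where the full string was never indexed.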

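So when you run a term filter against a text field, consider matching on the built-in field.keyword sub-field instead. The catch is its default 256-character ignore_above limit, so where possible define the mapping yourself. Older versions required marking the field not_analyzed; in recent versions you simply set type=keyword.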

GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "articleID.keyword": "XHDK-A-1293-#fJ3"
        }
      }
    }
  }
}

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01"
        }
      }
    ]
  }
}
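Following the advice above to define the mapping yourself, here is a sketch of a create-index body with articleID mapped explicitly as keyword, built as a Python dict. It assumes ES 5.x syntax, where mappings are still grouped under a type name (here article). You would send it with PUT /forum before loading the data. The type of an existing field cannot be changed, so an existing forum index would have to be deleted and recreated first. An explicitly mapped keyword field has no practical ignore_above limit unless you set one. With this mapping, the term filter on articleID itself would match.

```python
import json

# Sketch of an explicit ES 5.x create-index body (PUT /forum); field types as in this article.
create_forum_index = {
    "mappings": {
        "article": {
            "properties": {
                "articleID": {"type": "keyword"},  # exact value: the whole string is one term
                "userID": {"type": "long"},
                "hidden": {"type": "boolean"},
                "postDate": {"type": "date"},
            }
        }
    }
}

print(json.dumps(create_forum_index, indent=2))
```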

(7) View how the value is analyzed

GET /forum/_analyze
{
  "field": "articleID",
  "text": "XHDK-A-1293-#fJ3"
}

Response:

{
  "tokens": [
    {
      "token": "xhdk",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "a",
      "start_offset": 5,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "1293",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<NUM>",
      "position": 2
    },
    {
      "token": "fj3",
      "start_offset": 13,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}

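A text field is analyzed by default. When the inverted index is built, every articleID is split into tokens. After that, the original articleID string no longer exists in the index; only the individual tokens do.
term does not analyze the search text, so it looks up XHDK-A-1293-#fJ3 unchanged (XHDK-A-1293-#fJ3 --> XHDK-A-1293-#fJ3). At index time, however, articleID was turned into xhdk, a, 1293, fj3, so nothing matches.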
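The mismatch can be reproduced with a toy Python model that indexes the four post IDs into both fields and runs a verbatim term lookup. toy_standard_analyze only approximates the standard analyzer for ASCII input (runs of letters and digits, lowercased). It happens to reproduce the tokens in the _analyze response above, but it is not ES's real tokenizer. The other names are hypothetical as well.

```python
import re
from collections import defaultdict

def toy_standard_analyze(text):
    """Rough approximation of the standard analyzer for ASCII: alphanumeric runs, lowercased."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

posts = {1: "XHDK-A-1293-#fJ3", 2: "KDKE-B-9947-#kL5", 3: "JODL-X-1937-#pV7", 4: "QQPX-R-3956-#aD8"}

# Toy inverted index: field name -> term -> set of doc ids.
index = defaultdict(lambda: defaultdict(set))
for doc_id, article_id in posts.items():
    for token in toy_standard_analyze(article_id):
        index["articleID"][token].add(doc_id)           # analyzed text field: tokens only
    index["articleID.keyword"][article_id].add(doc_id)  # keyword sub-field: whole string

def term(field, value):
    """term query: look the value up verbatim, without analysis."""
    return sorted(index[field].get(value, set()))

print(term("articleID", "XHDK-A-1293-#fJ3"))          # [] (only tokens were indexed)
print(term("articleID.keyword", "XHDK-A-1293-#fJ3"))  # [1]
print(term("articleID", "xhdk"))                      # [1] (a single token does exist)
```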

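3. Summary of what we learned
(1) term filter: searches by exact value; numbers, booleans, and dates support this naturally
(2) For text, the field must be mapped as not_analyzed (older versions) or given type keyword at index time before a term query can be used on it
(3) It is roughly equivalent to a single condition in a SQL WHERE clause
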
