
7. Manually Controlling the Precision of Full-Text Search Results in Elasticsearch

2017-07-15  编程界的小学生

1. Data preparation

POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"title" : "this is java and elasticsearch blog"} }
{ "update": { "_id": "2"} }
{ "doc" : {"title" : "this is java blog"} }
{ "update": { "_id": "3"} }
{ "doc" : {"title" : "this is elasticsearch blog"} }
{ "update": { "_id": "4"} }
{ "doc" : {"title" : "this is java, elasticsearch, hadoop blog"} }
{ "update": { "_id": "5"} }
{ "doc" : {"title" : "this is spark blog"} }

2. Search for documents whose title contains java or Elasticsearch

SQL:
select * from tab where title like '%java%' or title like '%Elasticsearch%'

ES:

GET /forum/article/_search
{
  "query": {
    "match": {
      "title": "java Elasticsearch"
    }
  }
}

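This differs from the term query used earlier. A term query looks for an exact value, whereas the match query performs full-text search: the query string is analyzed into terms, and a document matches if its title contains any of them. If the field being searched is not_analyzed, however, the match query behaves just like a term query.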
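To make the "any term matches" behaviour concrete, here is a minimal Python sketch that imitates it on the five titles from the data-preparation step. The `analyze` and `match_or` helpers are hypothetical stand-ins written for this illustration, not part of any Elasticsearch client, and the tokenizer only approximates what the standard analyzer does.

```python
import re

# Titles from the data-preparation step, keyed by document _id.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return {t for t in re.split(r"\W+", text.lower()) if t}

def match_or(query):
    """Return the ids of documents whose title shares at least one term with the query."""
    terms = analyze(query)
    return sorted(i for i, title in TITLES.items() if terms & analyze(title))

print(match_or("java Elasticsearch"))  # ['1', '2', '3', '4']
```

Document 5 is left out because its title contains neither term. Note that the query string is lowercased too, which is why "Elasticsearch" and "elasticsearch" behave the same.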

3. Search for documents whose title contains both java and Elasticsearch

The title must contain java and must also contain Elasticsearch.

GET /forum/article/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch",
        "operator": "and"
      }
    }
  }
}

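operator accepts and or or. With and, every term must be present; with or, any one term is enough. Using or gives the same result as the query in section 2.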
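The difference between the two operators can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `analyze` and `match` are hypothetical helpers, and real Elasticsearch also scores and ranks the hits, which is ignored here.

```python
import re

# Titles from the data-preparation step, keyed by document _id.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return {t for t in re.split(r"\W+", text.lower()) if t}

def match(query, operator="or"):
    """Imitate a match query: 'and' requires every term, 'or' requires at least one."""
    terms = analyze(query)
    combine = all if operator == "and" else any
    return sorted(
        doc_id
        for doc_id, title in TITLES.items()
        if combine(term in analyze(title) for term in terms)
    )

print(match("java elasticsearch", operator="and"))  # ['1', '4']
print(match("java elasticsearch", operator="or"))   # ['1', '2', '3', '4']
```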

4. Search for documents that contain at least three of the four keywords java, Elasticsearch, spark, hadoop

GET /forum/article/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch spark hadoop",
        "minimum_should_match" : "75%"
      }
    }
  }
}

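minimum_should_match: 75% means that 75% of the four keywords, that is three out of four, must match. In other words, a document has to contain at least three of the keywords.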

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.3375794,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 1.3375794,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog"
        }
      }
    ]
  }
}

Only one document matches.
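The arithmetic behind the percentage can be checked with a small Python sketch. It assumes that the number derived from a percentage is rounded down, which is how the Elasticsearch documentation describes it; `analyze`, `required_matches` and `match_min` are hypothetical names used only for this illustration.

```python
import re

# Titles from the data-preparation step, keyed by document _id.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return {t for t in re.split(r"\W+", text.lower()) if t}

def required_matches(term_count, percent):
    """Turn a percentage into a term count, rounding down (assumed behaviour)."""
    return (term_count * percent) // 100

def match_min(query, percent):
    """Return ids of documents that contain at least the required number of query terms."""
    terms = analyze(query)
    needed = required_matches(len(terms), percent)
    return sorted(
        doc_id
        for doc_id, title in TITLES.items()
        if len(terms & analyze(title)) >= needed
    )

print(required_matches(4, 75))                            # 3
print(match_min("java elasticsearch spark hadoop", 75))   # ['4']
```

Document 1 contains only two of the four keywords, so it falls short of the threshold, while document 4 contains java, elasticsearch and hadoop.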

5. Search for documents that must contain java, must not contain spark, and may or may not contain hadoop and Elasticsearch

GET /forum/article/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "java"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "title": "spark"
          }
        }
      ],
      "should": [
        {
          "match": {
            "title": "hadoop"
          }
        },
        {
          "match": {
            "title": "elasticsearch"
          }
        }
      ]
    }
  }
}
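The three clause types in this query can be imitated in Python as follows. This is a rough sketch, not how Elasticsearch is implemented: the helpers are hypothetical, and the count of matching should clauses is used as a crude stand-in for the relevance score, which Elasticsearch actually computes with its own scoring formula.

```python
import re

# Titles from the data-preparation step, keyed by document _id.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return {t for t in re.split(r"\W+", text.lower()) if t}

def bool_query(must, must_not, should):
    """Filter by must / must_not, then rank by how many should terms each document has."""
    hits = []
    for doc_id, title in TITLES.items():
        terms = analyze(title)
        if not all(t in terms for t in must):
            continue  # a required term is missing
        if any(t in terms for t in must_not):
            continue  # a forbidden term is present
        should_hits = sum(t in terms for t in should)
        hits.append((doc_id, should_hits))
    # More matching should clauses first; ties broken by id.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

result = bool_query(must=["java"], must_not=["spark"], should=["hadoop", "elasticsearch"])
print(result)  # [('4', 2), ('1', 1), ('2', 0)]
```

Document 2 still matches even though it satisfies no should clause, because the must clause alone is enough here; the should clauses only influence the ordering.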

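6. Use bool to search for documents that contain at least three of the keywords java, hadoop, spark, Elasticsearch

By default, none of the should clauses has to match. There is one exception: if the bool query has no must clause, at least one should clause must match.

You can, however, control exactly how many of the should clauses must match for a document to be returned.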

GET /forum/article/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "java" }},
        { "match": { "title": "elasticsearch" }},
        { "match": { "title": "hadoop" }},
        { "match": { "title": "spark" }}
      ],
      "minimum_should_match": 3
    }
  }
}
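The default rule described above and the explicit override can be put side by side in a short Python sketch. It models only the rule as stated in this article (no must clause means at least one should clause has to match); `analyze` and `bool_should` are hypothetical helpers, and must clauses themselves are not evaluated here, only whether they exist.

```python
import re

# Titles from the data-preparation step, keyed by document _id.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}

KEYWORDS = ["java", "elasticsearch", "hadoop", "spark"]

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return {t for t in re.split(r"\W+", text.lower()) if t}

def bool_should(should, has_must=False, minimum_should_match=None):
    """Return ids of documents that satisfy enough should clauses."""
    if minimum_should_match is None:
        # Default as described in the text: 0 when a must clause exists, otherwise 1.
        minimum_should_match = 0 if has_must else 1
    return sorted(
        doc_id
        for doc_id, title in TITLES.items()
        if sum(t in analyze(title) for t in should) >= minimum_should_match
    )

print(bool_should(KEYWORDS))                          # ['1', '2', '3', '4', '5']
print(bool_should(KEYWORDS, minimum_should_match=3))  # ['4']
```

With the default, every document qualifies because each title contains at least one keyword; raising the threshold to 3 leaves only document 4, the same hit as the match query with minimum_should_match set to 75%.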

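7. Summary of key points
1. For full-text search over multiple values there are two approaches: a match query, or should clauses in a bool query.

2. To control the precision of search results, use operator (and / or) and minimum_should_match.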
