ES (2): Complex multi-condition queries (bool query and constant_s

2021-04-20 小胖学编程

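The ES queries in our project have to search many different kinds of text and filter the results on a series of conditions. So how do you build a complex multi-condition query in ES?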

1. Using the bool query

It accepts the following parameters:

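must: the clause must match, and it contributes to the score;
must_not: the clause must not match, and it does not contribute to the score;
should: if a document matches any of these clauses its _score increases, otherwise nothing happens; these clauses are mainly used to adjust each document's relevance score;
filter: the clause must match, but it runs in non-scoring filter mode. These clauses contribute nothing to the score; they only include or exclude documents according to the filter criteria.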

Any of these parameters can in turn contain a nested bool query.

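# a = 1 AND (b = 2 OR c = 3); a, b and c are placeholder field names,
# and term is assumed here for the exact-match conditions
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"a": 1}},
        {"bool": {
          "should": [
            {"term": {"b": 2}},
            {"term": {"c": 3}}
          ]
        }}
      ]
    }
  }
}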

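How the relevance score is computed:
Each sub-query computes the document's relevance score on its own. Once those scores are available, the bool query combines them and returns a single score that represents the whole bool operation.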

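One point to note: when should is used on its own it acts as an OR, so at least one clause has to match. When it sits at the same level as must or filter, it only adjusts each document's relevance score and no longer restricts the result set, unless minimum_should_match is set explicitly.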

Here is a test case:

image.png

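As the figure above shows, filter is purely a selection condition and does not compute _score, while should only contributes to _score and does not do any selection.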
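The behaviour described above can be mimicked with a small in-memory model. This is only a sketch, not Elasticsearch code: the helpers `term` and `bool_search` are hypothetical names, the documents are a subset of the test data prepared later in this article, and every matching should clause is assumed to add exactly 1.0 to the score (real term queries produce BM25 scores instead).

```python
# Toy in-memory model of bool matching. This is NOT Elasticsearch code.
DOCS = [
    {"aid": "1001", "tag": ["JAVA", "book"]},
    {"aid": "1002", "tag": ["PHP", "page", "learning"]},
    {"aid": "1003", "tag": ["hadoop", "JAVA"]},
    {"aid": "1004", "tag": ["hadoop"]},
]


def term(field, value):
    """Return a predicate that is true when doc[field] contains value."""
    return lambda doc: value in doc.get(field, [])


def bool_search(docs, filters=(), shoulds=(), minimum_should_match=None):
    """Return (aid, score) pairs, highest score first."""
    if minimum_should_match is None:
        # Documented default: 1 if should is the only clause type, else 0
        # (this toy has no must clause, so only filters are checked).
        minimum_should_match = 1 if shoulds and not filters else 0
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in filters):
            continue  # filter clauses include or exclude, nothing more
        matched = sum(1 for clause in shoulds if clause(doc))
        if matched < minimum_should_match:
            continue
        # Simplification: every matching should clause adds exactly 1.0.
        hits.append((doc["aid"], float(matched)))
    return sorted(hits, key=lambda hit: -hit[1])


should_only = bool_search(DOCS, shoulds=[term("tag", "JAVA"), term("tag", "hadoop")])
with_filter = bool_search(DOCS, filters=[term("tag", "hadoop")], shoulds=[term("tag", "JAVA")])
print(should_only)
print(with_filter)
```

In this model `should_only` drops document 1002 (should acting as an OR), while `with_filter` keeps document 1004 with a score of 0.0 even though it matches no should clause.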

1.1 The difference between Filter and Query

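filter and must_not run in Filter Context and have no effect on _score;
must and should run in Query Context and do affect _score.

A filter does not need to compute a relevance score or sort by it, and ES automatically caches the most frequently used filters, so it performs well.
A query is the opposite: it computes a relevance score and sorts the results by it, and its results cannot be cached, so it is slower.
So in scenarios where relevance scoring is not needed, prefer Filter Context to get better query performance.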
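As an illustration of this split, the sketch below builds a search body in which only the full-text condition is scored and the exact-match condition goes into Filter Context. The function name `build_search_body` is hypothetical; the `name` (text) and `tag` (keyword) fields are the ones from the test mapping defined later in this article, and the body would still have to be sent to ES by whatever client you use.

```python
import json


def build_search_body(text, tags):
    """Score only the full-text match; use the tags purely as a filter."""
    return {
        "query": {
            "bool": {
                # Query Context: contributes to _score
                "must": [{"match": {"name": text}}],
                # Filter Context: yes/no only, no scoring
                "filter": [{"terms": {"tag": tags}}],
            }
        }
    }


body = build_search_body("hadoop", ["JAVA", "ES"])
print(json.dumps(body, indent=2))
```

Putting the tag condition under filter rather than must means it can never change the ranking, which is usually what you want for exact-match conditions.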

2. Using the constant_score query

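constant_score, as the name suggests, applies one fixed score to every matching document. It is typically used when you only need to run a filter and no other query (for example a scoring query). A term query placed inside constant_score is turned into a non-scoring filter. This form can be used in place of a bool query that contains only a filter.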

Data preparation:

# Create the index
PUT test_terms
# Create the mapping
PUT test_terms/_mapping
{
  "properties":{
    "aid":{
      "type":"keyword",
      "index":true
    },
    "name":{
      "type":"text",
      "index":true
    },
    "tag":{
      "type":"keyword",
      "index":true
    }
  }
}
# View the mapping
GET test_terms/_mapping

# Bulk-load data (the deprecated _type metadata field is left out)
POST _bulk
{"create":{"_index":"test_terms"}}
{"aid":"1001","name":"JAVA book","tag":["JAVA","book"]}
{"create":{"_index":"test_terms"}}
{"aid":"1002","name":"PHP dic page","tag":["PHP","page","learning"]}
{"create":{"_index":"test_terms"}}
{"aid":"1003","name":"JAVA hadoop","tag":["hadoop","JAVA"]}
{"create":{"_index":"test_terms"}}
{"aid":"1004","name":"hadoop","tag":["hadoop"]}

# Insert a single document
POST test_terms/_doc
{
  "aid":"1005",
  "name":"JAVA ES hadoop",
  "tag":["JAVA","JAVA","ES"]
}

Querying with constant_score:

GET test_terms/_search
{
  "query": {
    "bool": {
      "should": [
        {"constant_score": {
          "filter": {"term": {
            "tag": "JAVA"
          }},
          "boost": 1
        }},
        {"constant_score": {
          "filter": {"term": {
            "tag": "hadoop"
          }},
          "boost": 1
        }}
      ]
    }
  }
}

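In terms of which documents match, each constant_score clause behaves like a bool query that contains only a filter (shown here for the JAVA clause alone). The scores differ, though: a filter-only bool gives every match a score of 0, whereas constant_score assigns the boost value.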

GET test_terms/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "filter": {
              "term": {
                "tag": "JAVA"
              }
            }
          }
        }
      ]
    }
  }
}

Because such a bool would contain nothing but a filter, constant_score is used in its place, and it also gives each sub-query a constant score.

image.png

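The results above also show the _score rule at work: each sub-query computes the document's relevance score on its own, and the bool query then combines those scores into a single score for the whole bool operation. For should clauses, the scores of the matching clauses are added together.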
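To make the arithmetic concrete, here is a minimal sketch that builds the constant_score query above from a value-to-boost mapping and works out the score each test document should get under the "sum of matching clauses" rule. The helper names `constant_score_should` and `expected_score` are hypothetical, and the scores are calculated by hand in Python rather than read from an ES response.

```python
def constant_score_should(field, boosts):
    """Build a bool/should of constant_score term filters, one per value."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"constant_score": {"filter": {"term": {field: value}}, "boost": boost}}
                    for value, boost in boosts.items()
                ]
            }
        }
    }


def expected_score(doc_tags, boosts):
    """Sum the boosts of the clauses that the document matches."""
    return sum(boost for value, boost in boosts.items() if value in doc_tags)


boosts = {"JAVA": 1.0, "hadoop": 1.0}
body = constant_score_should("tag", boosts)
scores = {
    "1001": expected_score(["JAVA", "book"], boosts),
    "1003": expected_score(["hadoop", "JAVA"], boosts),
    "1004": expected_score(["hadoop"], boosts),
    "1005": expected_score(["JAVA", "JAVA", "ES"], boosts),
}
print(scores)
```

Document 1003 matches both clauses and ends up with 2.0, while 1005 still gets 1.0 even though JAVA appears twice in its tags, because a constant score ignores how often the term occurs.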

Without constant_score:

GET test_terms/_search
{
  "query": {
    "bool": {
      "should": [
        {"term": {
          "tag": {
            "value": "JAVA",
            "boost": 2
          }
        }},
        {"term": {
          "tag": {
            "value": "hadoop",
            "boost": 1
          }
        }}
      ]
    }
  }
}

image.png

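The difference from the constant_score version is that here the should clauses take part in relevance scoring, and the boost we set is only a weight applied to that score, not a concrete score value. That is why the scores computed this way contain fractions.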
