Elasticsearch 7.x in Depth [7]: Document Scoring

2020-05-13  孙瑞锴

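1. References

Geek Time (极客时间): Elasticsearch核心技术与实战 (Elasticsearch Core Technology and Practice), by 阮一鸣
ElasticSearch7+Spark构建高相关性搜索服务&千人千面推荐系统 (building a high-relevance search service and a personalized recommendation system with ElasticSearch 7 and Spark)
Lucene学习总结之六:Lucene打分公式的数学推导 (Lucene study notes, part 6: mathematical derivation of the Lucene scoring formula)
【Elasticsearch】打分策略详解与explain手把手计算 (Elasticsearch scoring strategies explained, with a step-by-step explain calculation)
Official documentation: Theory Behind Relevance Scoring
A logarithm calculator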

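2. Getting started

Data preparation: see the post "Elasticsearch 7.x 深入 数据准备" (data preparation) in this series.

Custom scoring

Custom scoring with function_score

After the query has run, every matching document is re-scored and the results are sorted by the new score.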

GET /notes/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "elasticsearch hadoop canal"
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "grade"
          }
        }
      ]
    }
  }
}
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 98.08292,
    "hits" : [
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 98.08292,
        "_source" : {
          "title" : "Elasticsearch",
          "content" : "About Elasticsearch",
          "grade" : 100
        }
      },
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 3.9233167,
        "_source" : {
          "title" : "Hadoop",
          "content" : "About Hadoop",
          "grade" : 4.5
        }
      },
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 3.9233167,
        "_source" : {
          "title" : "Canal",
          "content" : "About Canal",
          "grade" : 4.7
        }
      }
    ]
  }
}
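
The scores in this response can be reproduced by hand. The following is a minimal sketch, assuming the default settings (modifier none, factor 1, boost_mode multiply) and a query score of 0.9808292, the value the plain match query returns later in this post. The function name is hypothetical. Documents 2 and 3 both score 3.9233167, which corresponds to a field value of 4 rather than 4.5 or 4.7; a plausible explanation, not confirmed by the post, is that grade is mapped as an integer type.

```python
# Query score returned by the plain match query (value taken from this post).
QUERY_SCORE = 0.9808292


def default_field_value_factor(query_score: float, field_value: float) -> float:
    """Default behavior: the query score is multiplied by the raw field value."""
    return query_score * field_value


# grade = 100 reproduces the top hit.
print(default_field_value_factor(QUERY_SCORE, 100))

# Documents 2 and 3: the returned score matches a field value of 4
# (assumption: grade is indexed as an integer, so 4.5 and 4.7 become 4).
print(default_field_value_factor(QUERY_SCORE, 4))
```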

As the response shows, when grade values differ widely, the resulting scores differ widely too. Which parameters can reduce that spread?

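Adjusting the score 1: modifier
modifier     Meaning
none         No modifier is applied (default)
log          Base 10: new score = old score * log(field value)
log1p        Base 10: new score = old score * log(1 + field value)
log2p        Base 10: new score = old score * log(2 + field value)
ln           Base e: new score = old score * ln(field value)
ln1p         Base e: new score = old score * ln(1 + field value)
ln2p         Base e: new score = old score * ln(2 + field value)
square       Square of the field value
sqrt         Square root of the field value
reciprocal   Reciprocal of the field value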
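
The table above can be written out as a small lookup of functions. This is only a sketch for checking numbers by hand, not Elasticsearch's implementation; MODIFIERS and apply_modifier are hypothetical names, and 0.9808292 is the query score used throughout this post.

```python
import math

# One function per modifier in the table; each maps the field value
# to the value that is then multiplied with the query score.
MODIFIERS = {
    "none": lambda v: v,
    "log": lambda v: math.log10(v),
    "log1p": lambda v: math.log10(1 + v),
    "log2p": lambda v: math.log10(2 + v),
    "ln": lambda v: math.log(v),
    "ln1p": lambda v: math.log(1 + v),
    "ln2p": lambda v: math.log(2 + v),
    "square": lambda v: v * v,
    "sqrt": lambda v: math.sqrt(v),
    "reciprocal": lambda v: 1 / v,
}


def apply_modifier(field_value: float, modifier: str = "none") -> float:
    """Return the function value for a field value under the given modifier."""
    return MODIFIERS[modifier](field_value)


QUERY_SCORE = 0.9808292
# New score for grade = 100 with log1p.
print(QUERY_SCORE * apply_modifier(100, "log1p"))
```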
GET /notes/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "elasticsearch hadoop canal"
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "grade",
            "modifier": "log1p"
          }
        }
      ]
    }
  }
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.9658968,
    "hits" : [
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.9658968,
        "_source" : {
          "title" : "Elasticsearch",
          "content" : "About Elasticsearch",
          "grade" : 100
        }
      },
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.6855702,
        "_source" : {
          "title" : "Hadoop",
          "content" : "About Hadoop",
          "grade" : 4.5
        }
      },
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.6855702,
        "_source" : {
          "title" : "Canal",
          "content" : "About Canal",
          "grade" : 4.7
        }
      }
    ]
  }
}
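Additional notes
The old score, 0.9808292, comes from the following plain match query:
GET /notes/_search
{
  "query": {
    "match": {
      "title": "elasticsearch hadoop canal"
    }
  }
}
The modifier is log1p, so the formula is: new score = old score * log(1 + field value)
Score computed by hand: 0.9808292 * log(1 + 100) = 0.9808292 * 2.0043 = 1.96587597
Score returned by the function_score query: 1.9658968
The small gap comes from rounding log(101) to four decimal places.
Adjusting the score 2: factor
factor multiplies the field value before the modifier is applied, so with log1p the formula becomes: new score = old score * log(1 + factor * field value)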
GET /notes/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "elasticsearch hadoop canal"
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "grade",
            "modifier": "log1p",
            "factor": 2
          }
        }
      ]
    }
  }
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 2.2590418,
    "hits" : [
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.2590418,
        "_source" : {
          "title" : "Elasticsearch",
          "content" : "About Elasticsearch",
          "grade" : 100
        }
      },
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.9359489,
        "_source" : {
          "title" : "Hadoop",
          "content" : "About Hadoop",
          "grade" : 4.5
        }
      },
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.9359489,
        "_source" : {
          "title" : "Canal",
          "content" : "About Canal",
          "grade" : 4.7
        }
      }
    ]
  }
}
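
The effect of factor in this response can be checked with a few lines. A minimal sketch, assuming the query score of 0.9808292 from the plain match query and an indexed grade of 4 for documents 2 and 3 (inferred from the returned scores, not stated in the post); the function name is hypothetical.

```python
import math

QUERY_SCORE = 0.9808292


def log1p_score(query_score: float, field_value: float, factor: float = 1.0) -> float:
    """New score for modifier log1p: query score * log10(1 + factor * field value)."""
    return query_score * math.log10(1 + factor * field_value)


# Document 1 (grade 100) with factor 2.
print(log1p_score(QUERY_SCORE, 100, factor=2))
# Documents 2 and 3 (assumed indexed grade 4) with factor 2.
print(log1p_score(QUERY_SCORE, 4, factor=2))
```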
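Adjusting the score 3: score_mode
score_mode controls how the values of multiple functions are combined with each other before they are merged with the query score.
score_mode   Meaning                                            Default
multiply     The function values are multiplied                 Yes
sum          The function values are added
avg          The function values are averaged
first        The value of the first matching function is used
max / min    The largest / smallest function value is used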
GET /notes/_search
{
  "_source": "", 
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "elasticsearch hadoop canal"
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "grade",
            "modifier": "log2p",
            "factor": 10
          }
        },
        {
          "field_value_factor": {
            "field": "grade",
            "modifier": "log2p",
            "factor": 5
          }
        }
      ],
      "score_mode": "sum"
    }
  }
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 5.5922675,
    "hits" : [
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 5.5922675,
        "_source" : { }
      },
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.9088175,
        "_source" : { }
      },
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 2.9088175,
        "_source" : { }
      }
    ]
  }
}
Additional notes
The old score, 0.9808292, comes from the following plain match query:
GET /notes/_search
{
  "query": {
    "match": {
      "title": "elasticsearch hadoop canal"
    }
  }
}
The modifier is log2p and score_mode is sum, so the formula is: new score = old score * (log(2 + 5 * field value) + log(2 + 10 * field value))
Score computed by hand: 0.9808292 * (log(2 + 5 * 100) + log(2 + 10 * 100)) = 0.9808292 * (2.7007 + 3.0009) = 5.59229577
Score returned by the function_score query: 5.5922675
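
How score_mode merges several function values can be sketched as follows. This is an illustration under stated assumptions, not Elasticsearch's code: combine_functions is a hypothetical helper, no per-function weight is modeled (so avg is a plain mean), and every function is assumed to match the document.

```python
import math


def combine_functions(values: list, score_mode: str = "multiply") -> float:
    """Combine the values of several functions according to score_mode."""
    if score_mode == "multiply":
        return math.prod(values)
    if score_mode == "sum":
        return sum(values)
    if score_mode == "avg":
        return sum(values) / len(values)
    if score_mode == "first":
        return values[0]
    if score_mode == "max":
        return max(values)
    if score_mode == "min":
        return min(values)
    raise ValueError(f"unknown score_mode: {score_mode}")


QUERY_SCORE = 0.9808292
grade = 100
# The two log2p functions from the query above (factor 10 and factor 5).
function_values = [math.log10(2 + 10 * grade), math.log10(2 + 5 * grade)]
# Default boost_mode (multiply) then merges the result with the query score.
print(QUERY_SCORE * combine_functions(function_values, "sum"))
```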
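Adjusting the score 4: boost_mode
boost_mode controls how the function value (here, the value of grade) is merged with the score of the match query.
boost_mode   Meaning                                                          Default
multiply     query score * function value                                     Yes
sum          query score + function value
max / min    the larger / smaller of the query score and the function value
replace      the function value replaces the query score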
GET /notes/_search
{
  "_source": "", 
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "elasticsearch"
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "grade"
          }
        }
      ],
      "boost_mode": "sum"
    }
  }
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 100.98083,
    "hits" : [
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 100.98083,
        "_source" : { }
      }
    ]
  }
}
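
The boost_mode options can be expressed as a small function for checking results such as the 100.98083 above. A sketch only; apply_boost_mode is a hypothetical name, and it covers just the modes listed in this post.

```python
def apply_boost_mode(query_score: float, function_value: float,
                     boost_mode: str = "multiply") -> float:
    """Merge the query score with the function value according to boost_mode."""
    if boost_mode == "multiply":
        return query_score * function_value
    if boost_mode == "sum":
        return query_score + function_value
    if boost_mode == "max":
        return max(query_score, function_value)
    if boost_mode == "min":
        return min(query_score, function_value)
    if boost_mode == "replace":
        return function_value
    raise ValueError(f"unknown boost_mode: {boost_mode}")


QUERY_SCORE = 0.9808292
# boost_mode sum with grade = 100, as in the query above.
print(apply_boost_mode(QUERY_SCORE, 100, "sum"))
```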

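The max_boost parameter

max_boost caps the value produced by the functions. With max_boost set to 10, the grade of 100 on document 1 is limited to 10 before it is multiplied with the query score.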

GET /notes/_search
{
  "_source": "", 
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "elasticsearch hadoop canal"
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "grade"
          }
        }
      ],
      "max_boost": 10
    }
  }
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 9.808291,
    "hits" : [
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 9.808291,
        "_source" : { }
      },
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 3.9233167,
        "_source" : { }
      },
      {
        "_index" : "notes",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 3.9233167,
        "_source" : { }
      }
    ]
  }
}
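
The capped scores in this response follow from taking the smaller of the function value and max_boost. A minimal sketch, assuming the default boost_mode (multiply), the query score of 0.9808292, and an indexed grade of 4 for documents 2 and 3 (inferred from the returned scores); the function name is hypothetical.

```python
QUERY_SCORE = 0.9808292


def score_with_max_boost(query_score: float, function_value: float,
                         max_boost: float) -> float:
    """Cap the function value at max_boost, then multiply with the query score."""
    return query_score * min(function_value, max_boost)


# Document 1: grade 100 is capped to 10.
print(score_with_max_boost(QUERY_SCORE, 100, max_boost=10))
# Documents 2 and 3: the value 4 is below the cap and is left unchanged.
print(score_with_max_boost(QUERY_SCORE, 4, max_boost=10))
```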

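The random_score function

random_score gives each matching document a random score; with a fixed seed the ordering is reproducible across requests. Note that in 7.x, setting seed without a field produces a deprecation warning, and the official documentation pairs seed with a field such as _seq_no.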

GET /notes/_search
{
  "_source": "", 
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "elasticsearch hadoop canal"
        }
      },
      "functions": [
        {
          "random_score": {
            "seed": 1231
          }
        }
      ]
    }
  }
}
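Function             Meaning
weight               Applies a fixed weight
field_value_factor   Uses the value of a field to modify the score
random_score         Scores documents randomly
decay functions      Use a field's value as the measure: the closer it is to a given origin, the higher the score
script_score         Computes the score with a custom script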
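
The decay functions in the table are not demonstrated in this post. As an illustration of "the closer to a given value, the higher the score", here is a sketch of the Gaussian variant, following the formula given in the Elasticsearch function_score documentation. The parameter names mirror the origin, scale, offset and decay options; the function name and the sample values are made up for the example.

```python
import math


def gauss_decay(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay: 1.0 at the origin, `decay` at distance scale + offset."""
    # sigma^2 is chosen so that the result equals `decay` at scale + offset.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    # Values within `offset` of the origin are not penalized at all.
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


# Hypothetical example: origin 100, scale 10.
for v in (100, 105, 110, 130):
    print(v, gauss_decay(v, origin=100, scale=10))
```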

Inspecting score details

1. Add the explain parameter to the search request body

GET /tmdb_movies/_search
{
  "explain": true, 
  "query": {
    "multi_match": {
      "query": "steve",
      "fields": ["title"]
    }
  }
}

2. Use _validate/query?explain

GET /tmdb_movies/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "steve",
      "fields": ["title"]
    }
  }
}
{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "tmdb_movies",
      "valid" : true,
      "explanation" : "title:steve"
    }
  ]
}

3. Done
