4.3-搜索的相关性算分

2020-03-30  本文已影响0人  落日彼岸

相关性和相关性算分

词(Term) 文档(Doc Id)
区块链 1,2,3
2,3,4,5,6,7,8,10,12,13,15,18,19,20
应用 2,3,8,9,10,13,15

词频 TF

逆⽂档频率 IDF

出现的文档数 总文档数 IDF
区块链 200万 10亿 log(500) = 8.96
10亿 10亿 log(500) = 0
应用 5亿 10亿 log(500) = 1

TF-IDF 的概念

Lucene 中的 TF-IDF 评分公式

TF-IDF 评分公式

BM 25

BM 25

定制 Similarity

定制Similarity

通过 Explain API 查看 TF-IDF

PUT testscore/_bulk
{ "index": { "_id": 1 }}
{ "content":"we use Elasticsearch to power the search" }
{ "index": { "_id": 2 }}
{ "content":"we like elasticsearch" }
{ "index": { "_id": 3 }}
{ "content":"The scoring of documents is caculated by the scoring formula" }
{ "index": { "_id": 4 }}
{ "content":"you know, for search" }



POST /testscore/_search
{
  //"explain": true,
  "query": {
    "match": {
      "content":"you"
      //"content": "elasticsearch"
      //"content":"the"
      //"content": "the elasticsearch"
    }
  }
}

Boosting Relevance

POST testscore/_search
{
    "query": {
        "boosting" : {
            "positive" : {
                "term" : {
                    "content" : "elasticsearch"
                }
            },
            "negative" : {
                 "term" : {
                     "content" : "like"
                }
            },
            "negative_boost" : 0.2
        }
    }
}

本节知识点回顾

课程demo

PUT testscore
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      }
    }
  }
}


PUT testscore/_bulk
{ "index": { "_id": 1 }}
{ "content":"we use Elasticsearch to power the search" }
{ "index": { "_id": 2 }}
{ "content":"we like elasticsearch" }
{ "index": { "_id": 3 }}
{ "content":"The scoring of documents is caculated by the scoring formula" }
{ "index": { "_id": 4 }}
{ "content":"you know, for search" }



POST /testscore/_search
{
  "explain": true,
  "query": {
    "match": {
      "content":"you"
      //"content": "elasticsearch"
      //"content":"the"
      //"content": "the elasticsearch"
    }
  }
}

POST testscore/_search
{
    "query": {
        "boosting" : {
            "positive" : {
                "term" : {
                    "content" : "elasticsearch"
                }
            },
            "negative" : {
                 "term" : {
                     "content" : "like"
                }
            },
            "negative_boost" : 0.2
        }
    }
}
上一篇 下一篇

猜你喜欢

热点阅读