On search: Elasticsearch 7.x, from beginner to expert

Elasticsearch 7.0 new features: Rank Feat

2019-04-16  郭彦超

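Rank Feature lays the groundwork for using Elasticsearch in machine-learning scenarios; it is Elasticsearch's first step toward feature-based score computation.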

1. Introduction

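rank_feature is a specialized query introduced in Elasticsearch 7.0. It only works on fields of type rank_feature or rank_features (both field types are also new in 7.0), and it is usually placed in the should clause of a bool query to boost a document's score. Note that, unlike function_score, this query can efficiently skip non-competitive hits when track_total_hits is not set to true, so it generally performs better than function_score.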

Here is an example:

PUT test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}

PUT test/_doc/1
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT test/_doc/2
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT test/_doc/3
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}

POST test/_refresh

GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
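
The search request above can also be assembled programmatically. The following is a minimal sketch, assuming you want to build the request body as a plain Python dict before sending it with whatever HTTP or Elasticsearch client you use. The helper names rank_feature_clause and build_search_body are hypothetical and are not part of any Elasticsearch library.

```python
from typing import Dict, Optional


def rank_feature_clause(field: str, boost: Optional[float] = None) -> Dict:
    """Build one rank_feature clause; the boost key is omitted when not given."""
    clause: Dict = {"field": field}
    if boost is not None:
        clause["boost"] = boost
    return {"rank_feature": clause}


def build_search_body(text: str, features: Dict[str, Optional[float]]) -> Dict:
    """Require a match on `content`, then add one optional boost per feature."""
    return {
        "query": {
            "bool": {
                # Documents must match the text query to be returned at all.
                "must": [{"match": {"content": text}}],
                # Each should clause only adds to the score of matching documents.
                "should": [
                    rank_feature_clause(field, boost)
                    for field, boost in features.items()
                ],
            }
        }
    }


# Same fields and boosts as the console request above.
body = build_search_body(
    "2016",
    {"pagerank": None, "url_length": 0.1, "topics.sports": 0.4},
)
```

Keeping the features in a dict makes it easy to tune the boosts without touching the query structure.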

2. Usage

The rank_feature query supports three functions that influence scoring: saturation (the default), logarithm (log), and sigmoid.

GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
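
To see how pivot shapes the score, here is a rough sketch of the saturation formula as given in the Elasticsearch reference documentation, S / (S + pivot), where S is the feature value. The function name saturation is hypothetical. The sketch ignores boost and the reduced precision with which feature values are stored, so real scores will only approximate these numbers.

```python
def saturation(value: float, pivot: float) -> float:
    """Saturation score: value / (value + pivot), always between 0 and 1."""
    if value <= 0 or pivot <= 0:
        raise ValueError("value and pivot must be strictly positive")
    return value / (value + pivot)


# pagerank 50.3 from the example documents, pivot 8 from the query above.
score = saturation(50.3, 8)

# A feature value equal to the pivot scores exactly 0.5.
at_pivot = saturation(8, 8)
```

Values below the pivot score under 0.5 and values above it score over 0.5, approaching 1 as the feature value grows.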

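If pivot is not specified, Elasticsearch computes a default from the values indexed for that field, approximately equal to the geometric mean of all feature values in the index. If you have not had the chance to work out a good pivot value, the official recommendation is to leave it unset.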

GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
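
As an illustration of what a geometric-mean pivot means for scoring, the sketch below computes the geometric mean of a few feature values and uses it as the saturation pivot. The values are hypothetical, and Elasticsearch only approximates this number internally, so treat it as an intuition aid rather than a reproduction of what the server computes.

```python
import math
from typing import Iterable


def geometric_mean(values: Iterable[float]) -> float:
    """Geometric mean, computed as exp of the mean of the logs."""
    vals = list(values)
    if not vals or any(v <= 0 for v in vals):
        raise ValueError("need at least one value, all strictly positive")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def saturation(value: float, pivot: float) -> float:
    """Saturation score: value / (value + pivot)."""
    return value / (value + pivot)


# Hypothetical feature values, not taken from the example index.
feature_values = [1.0, 10.0, 100.0]
default_pivot = geometric_mean(feature_values)

# Scores of the three values under that pivot.
scores = [saturation(v, default_pivot) for v in feature_values]
```

With a pivot near the geometric mean, a typical feature value lands around the middle of the score range, which is why the default is a reasonable starting point.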

GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
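
The log function is documented as log(scaling_factor + S), where S is the feature value. Here is a small sketch of that formula; the function name log_score is hypothetical, and boost and storage precision are ignored.

```python
import math


def log_score(value: float, scaling_factor: float) -> float:
    """Logarithmic score: natural log of (scaling_factor + value)."""
    if value <= 0:
        raise ValueError("feature values must be strictly positive")
    return math.log(scaling_factor + value)


# pagerank 50.3 from the example documents, scaling_factor 4 from the query above.
score = log_score(50.3, 4)
```

Unlike saturation, this score is not bounded by 1; it keeps growing, slowly, as the feature value increases.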

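Note that the log function only supports rank_feature and rank_features fields with a positive score impact. It cannot be used on a field mapped with positive_score_impact set to false, such as url_length above. (The values stored in rank_feature and rank_features fields must themselves always be strictly positive.)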

GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
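
The sigmoid function extends saturation with a configurable exponent: S^exponent / (S^exponent + pivot^exponent), as given in the Elasticsearch reference documentation. Below is a minimal sketch of that formula; the function name sigmoid is hypothetical, and boost and storage precision are ignored.

```python
def sigmoid(value: float, pivot: float, exponent: float) -> float:
    """Sigmoid score: value^exponent / (value^exponent + pivot^exponent)."""
    if value <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("value, pivot and exponent must be strictly positive")
    powered = value ** exponent
    return powered / (powered + pivot ** exponent)


# pagerank 50.3 from the example documents, pivot 7 and exponent 0.6 from the query above.
score = sigmoid(50.3, 7, 0.6)

# With exponent 1 the formula reduces to plain saturation.
same_as_saturation = sigmoid(50.3, 8, 1)
```

As with saturation, a feature value equal to the pivot scores 0.5; the exponent controls how sharply the score moves away from 0.5 around the pivot.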