Elasticsearch 7.x: From Beginner to Expert

Elasticsearch 7.0 New Features: Rank Feature

2019-04-15  郭彦超

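Support for the rank_feature and rank_features field types makes it possible for Elasticsearch to use document feature data in relevance scoring. Feature data here means numeric signals such as PageRank or URL length.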

1. Introduction

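rank_feature and rank_features fields store only numeric values, and they are queried with the rank_feature query. rank_features is an extension of rank_feature:

- A rank_feature field holds a single numeric feature.
- A rank_features field holds many named features in one object that maps feature names to values, such as topics in the example below.

This makes rank_features the better fit when a document has a large number of feature dimensions.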

PUT test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}

PUT test/_doc/1
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT test/_doc/2
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT test/_doc/3
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}

POST test/_refresh

GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
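The following Python sketch approximates how much the three should clauses add to the score of each sample document. It rests on several assumptions:

- It uses the default saturation function, which scores a clause as S / (S + pivot), or pivot / (S + pivot) when positive_score_impact is false.
- It multiplies each clause by its boost.
- It approximates the default pivot with an exact geometric mean. Elasticsearch derives the default pivot from an approximate geometric mean of the indexed values.
- It ignores the reduced precision with which rank_feature values are stored.
- It leaves out the BM25 score of the match clause, which the bool query adds on top.

The helper names (saturation, feature_value, rank_feature_scores) are illustrative only.

```python
import math

# Feature values copied from the three sample documents.
docs = {
    1: {"pagerank": 50.3, "url_length": 42, "topics": {"sports": 50, "brazil": 30}},
    2: {"pagerank": 50.3, "url_length": 47,
        "topics": {"sports": 35, "formula one": 65, "brazil": 20}},
    3: {"pagerank": 50.3, "url_length": 37, "topics": {"movies": 60, "super hero": 65}},
}

# (field, boost, positive_score_impact) for each should clause of the query.
clauses = [
    ("pagerank", 1.0, True),
    ("url_length", 0.1, False),   # mapped with positive_score_impact: false
    ("topics.sports", 0.4, True),
]


def geometric_mean(values):
    # Assumption: an exact geometric mean stands in for the approximate
    # default pivot that Elasticsearch computes from the indexed values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def saturation(value, pivot, positive_score_impact=True):
    """Default rank_feature scoring function; the result is always in (0, 1)."""
    if positive_score_impact:
        return value / (value + pivot)
    return pivot / (value + pivot)


def feature_value(doc, field):
    """Resolve 'pagerank' or 'topics.sports'; None if the document lacks the feature."""
    if "." in field:
        parent, key = field.split(".", 1)
        return doc.get(parent, {}).get(key)
    return doc.get(field)


def rank_feature_scores(docs, clauses):
    """Sum of boost * saturation over the should clauses each document matches."""
    pivots = {}
    for field, _, _ in clauses:
        present = [v for v in (feature_value(d, field) for d in docs.values())
                   if v is not None]
        pivots[field] = geometric_mean(present)
    scores = {}
    for doc_id, doc in docs.items():
        total = 0.0
        for field, boost, positive in clauses:
            value = feature_value(doc, field)
            if value is None:  # a missing feature does not match the clause
                continue
            total += boost * saturation(value, pivots[field], positive)
        scores[doc_id] = total
    return scores


if __name__ == "__main__":
    for doc_id, score in rank_feature_scores(docs, clauses).items():
        print(doc_id, round(score, 4))
```

Under these assumptions, all three documents tie on pagerank (each gets 0.5). Document 1 therefore comes out ahead of document 2 on the larger topics.sports value and, to a much smaller degree, on its shorter URL. Document 3 has no sports topic and gets nothing from that clause. The final ranking still depends on the BM25 score of the match clause.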

2. Usage

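rank_feature and rank_features fields can only be used with the rank_feature query. They support neither other queries nor sorting or aggregations. The feature values they store must be strictly positive numbers, and a rank_feature field accepts only a single value per document.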
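Elasticsearch rejects documents whose feature values break these rules, so it can help to check the values on the client side before indexing. The sketch below is a hypothetical helper, not part of any Elasticsearch client. It encodes the rules stated above: one number per rank_feature field, and strictly positive values, including every entry of a rank_features object. Treating NaN and infinity as invalid is an extra assumption.

```python
import math
from numbers import Real


def check_feature_value(name, value):
    """Raise ValueError unless value is a single, strictly positive number.

    Hypothetical helper; rejecting NaN and infinity is an assumption.
    """
    # bool is a subclass of int in Python, so exclude it explicitly;
    # lists (multi-valued input) and strings fail the Real check.
    if isinstance(value, bool) or not isinstance(value, Real):
        raise ValueError(f"{name}: expected a single number, got {value!r}")
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"{name}: value must be positive and finite, got {value!r}")


def validate_features(doc, rank_feature_fields, rank_features_fields):
    """Check the feature fields of a document before sending it to Elasticsearch."""
    for field in rank_feature_fields:
        if field in doc:
            check_feature_value(field, doc[field])
    for field in rank_features_fields:
        features = doc.get(field, {})
        if not isinstance(features, dict):
            raise ValueError(f"{field}: expected an object of feature -> value")
        for key, value in features.items():
            check_feature_value(f"{field}.{key}", value)


if __name__ == "__main__":
    ok = {"pagerank": 50.3, "url_length": 42, "topics": {"sports": 50, "brazil": 30}}
    validate_features(ok, ["pagerank", "url_length"], ["topics"])
    try:
        validate_features({"url_length": -3}, ["url_length"], [])
    except ValueError as err:
        print("rejected:", err)
```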

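If a feature correlates negatively with relevance, set positive_score_impact to false on the corresponding field (the default is true). The rank_feature query then gives that field a smaller score contribution as its value grows. A typical case in a website search engine is the url_length field: the longer the URL, the less it adds to the document's score.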
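The saturation function, which the rank_feature query uses by default, makes this concrete. With positive_score_impact left at true, a clause scores S / (S + pivot); with it set to false, the clause scores pivot / (S + pivot). The sketch below evaluates both forms for a few URL lengths, assuming an illustrative pivot of 40. In Elasticsearch the default pivot is derived from the indexed values instead. The output shows that the contribution shrinks as the URL grows but never turns negative, so a long URL earns a smaller boost rather than a penalty.

```python
def saturation(value, pivot, positive_score_impact=True):
    """Score of one rank_feature clause under the default saturation function."""
    if positive_score_impact:
        return value / (value + pivot)
    return pivot / (value + pivot)


PIVOT = 40.0  # illustrative value only, not what Elasticsearch would compute

url_lengths = [20, 37, 42, 47, 120]

if __name__ == "__main__":
    for length in url_lengths:
        up = saturation(length, PIVOT)
        down = saturation(length, PIVOT, positive_score_impact=False)
        print(f"url_length={length:>3}  true: {up:.3f}  false: {down:.3f}")
```

For the same value and pivot, the two forms always add up to 1. Setting positive_score_impact to false therefore mirrors the curve rather than subtracting from the score.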
