Elasticsearch使用指南

Elasticsearch Search API之(Reques

2019-08-26  本文已影响0人  中间件兴趣圈

​preference

查询选择副本分片的倾向性(即在一个复制组中选择副本的分片值。默认情况下,es以未指定的顺序从可用的碎片副本中进行选择,副本之间的路由将在集群章节更加详细的介绍 。可以通过该字段指定分片倾向与选择哪个副本。preference可选值:

explain

是否解释各分数是如何计算的。

GET /_search
{
    "explain": true,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

version

如果设置为true,则返回每个命中文档的当前版本号。

GET /_search
{
    "version": true,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

Index Boost

当搜索多个索引时,允许为每个索引配置不同的boost级别。当来自一个索引的点击率比来自另一个索引的点击率更重要时,该属性则非常方便。
使用示例如下:

GET /_search
{
    "indices_boost" : [
        { "alias1" : 1.4 },
        { "index*" : 1.3 }
    ]
}

min_score

指定返回文档的最小评分,如果文档的评分低于该值,则不返回。

GET /_search
{
    "min_score": 0.5,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

Named Queries

每个过滤器和查询都可以在其顶级定义中接受_name。搜索响应中每个匹配文档中会增加matched_queries结构体,记录该文档匹配的查询名称。查询和筛选器的标记只对bool查询有意义。
java示例如下:

public static void testNamesQuery() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("esdemo");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.query(
                    QueryBuilders.boolQuery()
                        .should(QueryBuilders.termQuery("context", "fox").queryName("q1"))
                        .should(QueryBuilders.termQuery("context", "brown").queryName("q2"))
                    );
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }

返回结果如下:

{
    "took":4,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":2,
        "max_score":0.5753642,
        "hits":[
            {
                "_index":"esdemo",
                "_type":"matchquerydemo",
                "_id":"2",
                "_score":0.5753642,
                "_source":{
                    "context":"My quick brown as fox eats rabbits on a regular basis.",
                    "title":"Keeping pets healthy"
                },
                "matched_queries":[
                    "q1",
                    "q2"
                ]
            },
            {
                "_index":"esdemo",
                "_type":"matchquerydemo",
                "_id":"1",
                "_score":0.39556286,
                "_source":{
                    "context":"Brown rabbits are commonly seen brown.",
                    "title":"Quick brown rabbits"
                },
                "matched_queries":[
                    "q2"
                ]
            }
        ]
    }
}

正如上面所说,每个匹配文档中都包含matched_queries,表明该文档匹配的是哪个查询条件。

Inner hits

用于定义内部嵌套层的返回规则,其inner hits支持如下选项:

field collapsing(字段折叠)

允许根据字段值折叠搜索结果。折叠是通过在每个折叠键上只选择排序最高的文档来完成的。有点类似于聚合分组,其效果类似于按字段进行分组,默认命中的文档列表第一层由该字段的第一条信息,也可以通过允许根据字段值折叠搜索结果。折叠是通过在每个折叠键上只选择排序最高的文档来完成的。例如下面的查询为每个用户检索最佳twee-t,并按喜欢的数量对它们进行排序。
下面首先通过示例进行展示field colla-psing的使用。
1)首先查询发的推特内容中包含elast-icsearch的推文:

GET /twitter/_search
{
    "query": {
        "match": {
            "message": "elasticsearch"
        }
    },
    "collapse" : {
        "field" : "user" 
    },
    "sort": ["likes"]
}

返回结果:

{
    "took":8,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":5,
        "max_score":null,
        "hits":[
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OYnecmcB-IBeb8B-bF2X",
                "_score":null,
                "_source":{
                    "message":"to be a elasticsearch",
                    "user":"user2",
                    "likes":3
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OonecmcB-IBeb8B-bF2q",
                "_score":null,
                "_source":{
                    "message":"to be elasticsearch",
                    "user":"user2",
                    "likes":3
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OInecmcB-IBeb8B-bF2G",
                "_score":null,
                "_source":{
                    "message":"elasticsearch is very high",
                    "user":"user1",
                    "likes":3
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"O4njcmcB-IBeb8B-Rl2H",
                "_score":null,
                "_source":{
                    "message":"elasticsearch is high db",
                    "user":"user1",
                    "likes":1
                },
                "sort":[
                    1
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"N4necmcB-IBeb8B-bF0n",
                "_score":null,
                "_source":{
                    "message":"very likes elasticsearch",
                    "user":"user1",
                    "likes":1
                },
                "sort":[
                    1
                ]
            }
        ]
    }
}

首先上述会列出所有用户的推特,如果只想每个用户只显示一条推文,并且点赞率最高,或者每个用户只显示2条推文呢?这个时候,按字段折叠就闪亮登场了。java demo如下:

public static void search_field_collapsing() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("mapping_field_collapsing_twitter");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.query(
                    QueryBuilders.matchQuery("message","elasticsearch")
            );
            sourceBuilder.sort("likes", SortOrder.DESC);
            CollapseBuilder collapseBuilder = new CollapseBuilder("user");
            sourceBuilder.collapse(collapseBuilder);
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }

其结果如下:

{
    "took":22,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":5,
        "max_score":null,
        "hits":[
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OYnecmcB-IBeb8B-bF2X",
                "_score":null,
                "_source":{
                    "message":"to be a elasticsearch",
                    "user":"user2",
                    "likes":3
                },
                "fields":{
                    "user":[
                        "user2"
                    ]
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OInecmcB-IBeb8B-bF2G",
                "_score":null,
                "_source":{
                    "message":"elasticsearch is very high",
                    "user":"user1",
                    "likes":3
                },
                "fields":{
                    "user":[
                        "user1"
                    ]
                },
                "sort":[
                    3
                ]
            }
        ]
    }
}

上面的示例只返回了每个用户的第一条数据,如果需要每个用户返回2条数据呢?可以通过inner_hit来设置。

public static void search_field_collapsing() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("mapping_field_collapsing_twitter");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.query(
                    QueryBuilders.matchQuery("message","elasticsearch")
            );
            sourceBuilder.sort("likes", SortOrder.DESC);
            CollapseBuilder collapseBuilder = new CollapseBuilder("user");
            
            InnerHitBuilder collapseHitBuilder = new InnerHitBuilder("collapse_inner_hit");
            collapseHitBuilder.setSize(2);
            collapseBuilder.setInnerHits(collapseHitBuilder);
            sourceBuilder.collapse(collapseBuilder);
            
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }

返回结果如下:

{
    "took":42,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":5,
        "max_score":null,
        "hits":[
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OYnecmcB-IBeb8B-bF2X",
                "_score":null,
                "_source":{
                    "message":"to be a elasticsearch",
                    "user":"user2",
                    "likes":3
                },
                "fields":{
                    "user":[
                        "user2"
                    ]
                },
                "sort":[
                    3
                ],
                "inner_hits":{
                    "collapse_inner_hit":{
                        "hits":{
                            "total":2,
                            "max_score":0.19363807,
                            "hits":[
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"OonecmcB-IBeb8B-bF2q",
                                    "_score":0.19363807,
                                    "_source":{
                                        "message":"to be elasticsearch",
                                        "user":"user2",
                                        "likes":3
                                    }
                                },
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"OYnecmcB-IBeb8B-bF2X",
                                    "_score":0.17225473,
                                    "_source":{
                                        "message":"to be a elasticsearch",
                                        "user":"user2",
                                        "likes":3
                                    }
                                }
                            ]
                        }
                    }
                }
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OInecmcB-IBeb8B-bF2G",
                "_score":null,
                "_source":{
                    "message":"elasticsearch is very high",
                    "user":"user1",
                    "likes":3
                },
                "fields":{
                    "user":[
                        "user1"
                    ]
                },
                "sort":[
                    3
                ],
                "inner_hits":{
                    "collapse_inner_hit":{
                        "hits":{
                            "total":3,
                            "max_score":0.2876821,
                            "hits":[
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"O4njcmcB-IBeb8B-Rl2H",
                                    "_score":0.2876821,
                                    "_source":{
                                        "message":"elasticsearch is high db",
                                        "user":"user1",
                                        "likes":1
                                    }
                                },
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"N4necmcB-IBeb8B-bF0n",
                                    "_score":0.2876821,
                                    "_source":{
                                        "message":"very likes elasticsearch",
                                        "user":"user1",
                                        "likes":1
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}

此时,返回结果是两级,第一级,还是每个用户第一条消息,然后再内部中嵌套inner_hits。

Search After

Elasticsearch支持的第三种分页获取方式,该方法不支持跳转页面。
es支持的分页方式目前已知:

  1. 通过from和size,当时当达到深度分页时,成本变的非常高昂,故es提供了索引参数:index.max_result_window来控制(from + size)的最大值,默认为10000,超过该值后将报错。

  2. 通过scroll滚动API,该方式类似于快照的工作方式,不具备实时性,并且滚动上下文的存储需要耗费一定的性能。
    本节将介绍第3种分页方式,search after,基于上一页查询的结果进行下一页数据的查询。基本思想是选择一组排序字段,能做到全局唯一。es的排序查询响应结果中会返回sort数组,包含本排序字段的最大值,下一页查询将该组字段当成查询条件,es在此数据的基础下返回下一批合适的数据。
    java示例如下:

public static void search_search_after() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("mapping_search_after");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.query(
                    QueryBuilders.termQuery("user","user2")
            );
            sourceBuilder.size(1);
            sourceBuilder.sort("id", SortOrder.ASC);
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
            if(hasHit(result)) { // 如果本次匹配到数据
                // 省略处理数据逻辑
                // 继续下一批查询
                // result.getHits().
                int length = result.getHits().getHits().length;
                SearchHit aLastHit = result.getHits().getHits()[length - 1];
                //开始下一轮查询
                sourceBuilder.searchAfter(aLastHit.getSortValues());
                result = client.search(searchRequest, RequestOptions.DEFAULT);
                System.out.println(result);
            }
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }
    private static boolean hasHit(SearchResponse result) {
        return !( result.getHits() == null ||
                result.getHits().getHits() == null ||
                result.getHits().getHits().length < 1 );
    }

本文详细介绍 preference、explain、version、index boost、min_score、names query、Inner hits、field collapsing、Search After。

上一篇下一篇

猜你喜欢

热点阅读