Elasticsearch Search API: (Reques
preference
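Controls which shard copy a search prefers, that is, which copy within a replication group is chosen to execute the search. By default, Elasticsearch chooses among the available shard copies in an unspecified order; routing between copies is covered in more detail in the cluster chapter. This field lets you specify which copy the search should prefer. Possible values of preference: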
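- _primary: The operation is executed only on primary shards. Deprecated in 6.1.0 and will be removed in 7.x.
- _primary_first: The operation prefers primary shards. Deprecated in 6.1.0 and will be removed in 7.x.
- _replica: The operation is executed only on replica shards; if there are several replicas, their order is random. Deprecated in 6.1.0 and will be removed in 7.x.
- _replica_first: The operation prefers replica shards; if there are several replicas, their order is random. Deprecated in 6.1.0 and will be removed in 7.x.
- _only_local: The operation is executed only on shards allocated to the local node. _only_local guarantees that only shard copies on the local node are used, which is sometimes useful for troubleshooting. None of the other options fully guarantee that any particular shard copy is used in a search, so on a changing index, repeated searches may return different results if they run on copies that are in different refresh states.
- _local: The operation prefers shards allocated to the local node and falls back to other shards if no local copy is available.
- _prefer_nodes:abc,xyz: The operation prefers shards on the specified node IDs (abc and xyz in this example).
- _shards:2,3: Restricts the operation to the specified shards (2 and 3 here). This preference can be combined with other preferences, but it must appear first: _shards:2,3|_local.
- _only_nodes:abc,xyz,…: Restricts the operation to the specified node IDs.
- Custom (string) value: A custom string; the shard copy is chosen roughly as hashcode(value) % number of copies in the replication group. In web applications the session ID is commonly used as the preference value, so that the same user keeps hitting the same shard copies (as long as the cluster state does not change) and sees consistent results.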
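To make the custom-string behaviour concrete, here is a minimal in-memory sketch of the hashcode(value) % copy-count rule described above. The class PreferenceRouting, the node names, and the use of String.hashCode() are hypothetical stand-ins; Elasticsearch uses its own internal hashing, so this only illustrates why the same preference value keeps landing on the same copy.

```java
import java.util.List;

// Hypothetical illustration of "hashcode(value) % number of copies".
// String.hashCode() is a stand-in; Elasticsearch uses its own internal hash.
final class PreferenceRouting {
    private PreferenceRouting() {}

    /** Returns the shard copy chosen for a given custom preference string. */
    static String pickCopy(String preference, List<String> copies) {
        if (copies.isEmpty()) {
            throw new IllegalArgumentException("no shard copies available");
        }
        // floorMod keeps the index non-negative even when hashCode() is negative
        int index = Math.floorMod(preference.hashCode(), copies.size());
        return copies.get(index);
    }
}
```

Because the choice depends only on the preference string and the list of copies, two requests with the same session ID pick the same copy until the set of copies changes.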
explain
Whether to return, for each hit, an explanation of how its score was computed.
GET /_search
{
"explain": true,
"query" : {
"term" : { "user" : "kimchy" }
}
}
version
If set to true, the current version number of each hit is returned.
GET /_search
{
"version": true,
"query" : {
"term" : { "user" : "kimchy" }
}
}
Index Boost
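When searching across multiple indices, allows a different boost level to be configured for each index. This is handy when hits coming from one index matter more than hits coming from another. If an index matches more than one entry (for example, it is covered by both alias1 and index*), the first matching entry is used.
Example: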
GET /_search
{
"indices_boost" : [
{ "alias1" : 1.4 },
{ "index*" : 1.3 }
]
}
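The following sketch models the indices_boost list above in memory, assuming the boost acts as a multiplier on a hit's score and that the first matching entry wins. IndexBoost is a hypothetical helper: it only understands exact index names and a trailing '*' wildcard, and it does not resolve aliases.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of applying an ordered indices_boost list to a hit's score.
// Only exact names and a trailing '*' wildcard are handled; aliases are not resolved.
final class IndexBoost {
    private IndexBoost() {}

    static boolean matches(String pattern, String index) {
        if (pattern.endsWith("*")) {
            return index.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(index);
    }

    /** Multiplies the score by the boost of the first entry whose pattern matches the index. */
    static double boostedScore(String index, double score, List<Map.Entry<String, Double>> boosts) {
        for (Map.Entry<String, Double> entry : boosts) {
            if (matches(entry.getKey(), index)) {
                return score * entry.getValue(); // first match wins, later entries are ignored
            }
        }
        return score; // no entry matched: score unchanged
    }
}
```

An ordered list (rather than a map) is used on purpose, since the order of the entries decides which boost applies when patterns overlap.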
min_score
Sets a minimum score for returned documents: documents whose _score is lower than min_score are excluded from the results.
GET /_search
{
"min_score": 0.5,
"query" : {
"term" : { "user" : "kimchy" }
}
}
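The min_score rule boils down to a simple filter on the score. Here is a minimal sketch with a hypothetical Hit type; a score exactly equal to min_score is kept, since only scores below it are excluded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of min_score semantics: hits scoring below the threshold are dropped.
final class MinScoreFilter {
    private MinScoreFilter() {}

    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    static List<Hit> apply(List<Hit> hits, double minScore) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.score >= minScore) { // a score equal to min_score is kept
                kept.add(hit);
            }
        }
        return kept;
    }
}
```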
Named Queries
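Each filter and query can accept a _name in its top-level definition. In the search response, each matching document gets a matched_queries array listing the names of the queries it matched. Tagging queries and filters only makes sense for the bool query.
Java example: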
public static void testNamesQuery() {
RestHighLevelClient client = EsClient.getClient();
try {
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("esdemo");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(
QueryBuilders.boolQuery()
.should(QueryBuilders.termQuery("context", "fox").queryName("q1"))
.should(QueryBuilders.termQuery("context", "brown").queryName("q2"))
);
searchRequest.source(sourceBuilder);
SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(result);
} catch (Throwable e) {
e.printStackTrace();
} finally {
EsClient.close(client);
}
}
Response:
{
"took":4,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"skipped":0,
"failed":0
},
"hits":{
"total":2,
"max_score":0.5753642,
"hits":[
{
"_index":"esdemo",
"_type":"matchquerydemo",
"_id":"2",
"_score":0.5753642,
"_source":{
"context":"My quick brown as fox eats rabbits on a regular basis.",
"title":"Keeping pets healthy"
},
"matched_queries":[
"q1",
"q2"
]
},
{
"_index":"esdemo",
"_type":"matchquerydemo",
"_id":"1",
"_score":0.39556286,
"_source":{
"context":"Brown rabbits are commonly seen brown.",
"title":"Quick brown rabbits"
},
"matched_queries":[
"q2"
]
}
]
}
}
As described above, each matching document includes matched_queries, which tells you which named query conditions it matched.
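To see where matched_queries comes from, here is a rough in-memory sketch that checks each named term clause against a document's tokens. NamedQueries is hypothetical, and its tokenizer (lowercase, split on anything that is not a letter or digit) is only a stand-in for the standard analyzer. Because a term query does not analyze its input, the lowercase terms "fox" and "brown" match the lowercased tokens, which is why document 1 ("Brown rabbits ...") matches q2.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: compute matched_queries for a document against named term clauses.
// Tokenization (lowercase, split on non-letters/digits) roughly mimics the standard analyzer.
final class NamedQueries {
    private NamedQueries() {}

    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    /** Returns the names of the clauses (name -> term) matching the text, in clause order. */
    static List<String> matchedQueries(String text, Map<String, String> namedTerms) {
        Set<String> docTokens = tokens(text);
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, String> clause : namedTerms.entrySet()) {
            if (docTokens.contains(clause.getValue())) {
                matched.add(clause.getKey());
            }
        }
        return matched;
    }
}
```

Passing a LinkedHashMap keeps the clause order stable, e.g. q1 -> "fox", q2 -> "brown" as in the Java example.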
Inner hits
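Defines how hits from an inner (nested) level are returned. inner_hits supports the following options:
- from: the offset of the first inner hit to return, used for paging inner hits.
- size: the maximum number of inner hits to return (by default the top three).
- sort: how the inner hits are sorted (by _score by default).
- name: the name given to this inner hits definition in the response.
Examples for this part are covered in detail in the next section.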
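field collapsing
Allows search results to be collapsed based on field values. Collapsing is done by selecting only the top sorted document per collapse key. The effect resembles grouping by a field in an aggregation: by default, the first level of the hit list contains one document (the top one) per field value, and more documents per group can be fetched through inner_hits. The field used for collapsing must be a single-valued keyword or numeric field with doc_values enabled. For example, the queries below retrieve the best tweet for each user and sort them by number of likes.
The following examples demonstrate how to use field collapsing.
1) First, query the tweets whose message contains elasticsearch, without collapsing: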
GET /mapping_field_collapsing_twitter/_search
{
"query": {
"match": {
"message": "elasticsearch"
}
},
"sort": [
{ "likes": "desc" }
]
}
Response:
{
"took":8,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"skipped":0,
"failed":0
},
"hits":{
"total":5,
"max_score":null,
"hits":[
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"OYnecmcB-IBeb8B-bF2X",
"_score":null,
"_source":{
"message":"to be a elasticsearch",
"user":"user2",
"likes":3
},
"sort":[
3
]
},
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"OonecmcB-IBeb8B-bF2q",
"_score":null,
"_source":{
"message":"to be elasticsearch",
"user":"user2",
"likes":3
},
"sort":[
3
]
},
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"OInecmcB-IBeb8B-bF2G",
"_score":null,
"_source":{
"message":"elasticsearch is very high",
"user":"user1",
"likes":3
},
"sort":[
3
]
},
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"O4njcmcB-IBeb8B-Rl2H",
"_score":null,
"_source":{
"message":"elasticsearch is high db",
"user":"user1",
"likes":1
},
"sort":[
1
]
},
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"N4necmcB-IBeb8B-bF0n",
"_score":null,
"_source":{
"message":"very likes elasticsearch",
"user":"user1",
"likes":1
},
"sort":[
1
]
}
]
}
}
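The query above lists every matching tweet of every user. What if each user should appear only once, with their most-liked tweet, or with two tweets each? This is where field collapsing comes in. Java demo: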
public static void search_field_collapsing() {
RestHighLevelClient client = EsClient.getClient();
try {
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("mapping_field_collapsing_twitter");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(
QueryBuilders.matchQuery("message","elasticsearch")
);
sourceBuilder.sort("likes", SortOrder.DESC);
CollapseBuilder collapseBuilder = new CollapseBuilder("user");
sourceBuilder.collapse(collapseBuilder);
searchRequest.source(sourceBuilder);
SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(result);
} catch (Throwable e) {
e.printStackTrace();
} finally {
EsClient.close(client);
}
}
The result is:
{
"took":22,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"skipped":0,
"failed":0
},
"hits":{
"total":5,
"max_score":null,
"hits":[
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"OYnecmcB-IBeb8B-bF2X",
"_score":null,
"_source":{
"message":"to be a elasticsearch",
"user":"user2",
"likes":3
},
"fields":{
"user":[
"user2"
]
},
"sort":[
3
]
},
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"OInecmcB-IBeb8B-bF2G",
"_score":null,
"_source":{
"message":"elasticsearch is very high",
"user":"user1",
"likes":3
},
"fields":{
"user":[
"user1"
]
},
"sort":[
3
]
}
]
}
}
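The collapsing step itself can be modeled in a few lines: sort all hits, then keep only the first hit seen for each collapse key. The sketch below is a hypothetical in-memory illustration (Collapse and Tweet are not Elasticsearch classes) using the five tweets from the first response. It relies on a stable sort for ties; Elasticsearch does not promise any particular order among equally sorted hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of field collapsing: sort all hits, then keep
// only the first (top-sorted) hit for each collapse key.
final class Collapse {
    private Collapse() {}

    static final class Tweet {
        final String id;
        final String user;
        final int likes;

        Tweet(String id, String user, int likes) {
            this.id = id;
            this.user = user;
            this.likes = likes;
        }
    }

    static List<Tweet> collapseByUser(List<Tweet> hits) {
        List<Tweet> sorted = new ArrayList<>(hits);
        // likes descending; List.sort is stable, so ties keep their input order
        sorted.sort(Comparator.comparingInt((Tweet t) -> t.likes).reversed());
        Map<String, Tweet> topPerUser = new LinkedHashMap<>();
        for (Tweet t : sorted) {
            topPerUser.putIfAbsent(t.user, t); // the first tweet seen per user is its top one
        }
        return new ArrayList<>(topPerUser.values());
    }
}
```

Fed with the five tweets above, this keeps OYnecmcB-IBeb8B-bF2X for user2 and OInecmcB-IBeb8B-bF2G for user1, the same two top-level hits as the collapsed response.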
The example above returns only the top document for each user. To return two documents per user, configure inner_hits on the collapse:
public static void search_field_collapsing_inner_hits() {
RestHighLevelClient client = EsClient.getClient();
try {
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("mapping_field_collapsing_twitter");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(
QueryBuilders.matchQuery("message","elasticsearch")
);
sourceBuilder.sort("likes", SortOrder.DESC);
CollapseBuilder collapseBuilder = new CollapseBuilder("user");
InnerHitBuilder collapseHitBuilder = new InnerHitBuilder("collapse_inner_hit");
collapseHitBuilder.setSize(2);
collapseBuilder.setInnerHits(collapseHitBuilder);
sourceBuilder.collapse(collapseBuilder);
searchRequest.source(sourceBuilder);
SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(result);
} catch (Throwable e) {
e.printStackTrace();
} finally {
EsClient.close(client);
}
}
Response:
{
"took":42,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"skipped":0,
"failed":0
},
"hits":{
"total":5,
"max_score":null,
"hits":[
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"OYnecmcB-IBeb8B-bF2X",
"_score":null,
"_source":{
"message":"to be a elasticsearch",
"user":"user2",
"likes":3
},
"fields":{
"user":[
"user2"
]
},
"sort":[
3
],
"inner_hits":{
"collapse_inner_hit":{
"hits":{
"total":2,
"max_score":0.19363807,
"hits":[
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"OonecmcB-IBeb8B-bF2q",
"_score":0.19363807,
"_source":{
"message":"to be elasticsearch",
"user":"user2",
"likes":3
}
},
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"OYnecmcB-IBeb8B-bF2X",
"_score":0.17225473,
"_source":{
"message":"to be a elasticsearch",
"user":"user2",
"likes":3
}
}
]
}
}
}
},
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"OInecmcB-IBeb8B-bF2G",
"_score":null,
"_source":{
"message":"elasticsearch is very high",
"user":"user1",
"likes":3
},
"fields":{
"user":[
"user1"
]
},
"sort":[
3
],
"inner_hits":{
"collapse_inner_hit":{
"hits":{
"total":3,
"max_score":0.2876821,
"hits":[
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"O4njcmcB-IBeb8B-Rl2H",
"_score":0.2876821,
"_source":{
"message":"elasticsearch is high db",
"user":"user1",
"likes":1
}
},
{
"_index":"mapping_field_collapsing_twitter",
"_type":"_doc",
"_id":"N4necmcB-IBeb8B-bF0n",
"_score":0.2876821,
"_source":{
"message":"very likes elasticsearch",
"user":"user1",
"likes":1
}
}
]
}
}
}
}
]
}
}
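The response now has two levels: the first level still holds the top message for each user, and each of those hits nests an inner_hits section with up to two documents from that user's group. Note that inner hits are sorted by _score by default, not by likes, which is why user1's inner hits do not include the top-level document; set the sort option of the inner hits to change this.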
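The two-level shape can be sketched in memory as well: the main sort (likes, descending) picks the top-level hit per user, while a separate inner sort (score, descending, the inner_hits default) plus from/size picks the nested hits. CollapseInnerHits, Doc, and Group are hypothetical names used only for this illustration, not Elasticsearch types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of collapse + inner_hits: one top-level hit per user (main sort: likes desc)
// plus a page of that user's hits ordered by score desc, the default inner_hits sort.
final class CollapseInnerHits {
    private CollapseInnerHits() {}

    static final class Doc {
        final String id;
        final String user;
        final int likes;
        final double score;

        Doc(String id, String user, int likes, double score) {
            this.id = id;
            this.user = user;
            this.likes = likes;
            this.score = score;
        }
    }

    static final class Group {
        final Doc top;
        final List<Doc> innerHits;

        Group(Doc top, List<Doc> innerHits) {
            this.top = top;
            this.innerHits = innerHits;
        }
    }

    static List<Group> collapse(List<Doc> docs, int from, int size) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt((Doc d) -> d.likes).reversed()); // stable for ties
        Map<String, List<Doc>> groups = new LinkedHashMap<>();
        for (Doc d : sorted) {
            groups.computeIfAbsent(d.user, k -> new ArrayList<>()).add(d);
        }
        List<Group> result = new ArrayList<>();
        for (List<Doc> members : groups.values()) {
            List<Doc> inner = new ArrayList<>(members);
            inner.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
            int start = Math.min(from, inner.size());       // inner_hits "from"
            int end = Math.min(start + size, inner.size()); // inner_hits "size"
            result.add(new Group(members.get(0), new ArrayList<>(inner.subList(start, end))));
        }
        return result;
    }
}
```

Keeping the two sorts separate reproduces the effect seen in the response: the top-level hit and the first inner hit of a group can be different documents.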
Search After
The third pagination method supported by Elasticsearch. It does not support jumping to an arbitrary page.
Elasticsearch's other pagination methods are:
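- from and size: the cost becomes very high for deep pagination, so Elasticsearch provides the index setting index.max_result_window to cap from + size. It defaults to 10000, and requests that exceed it are rejected with an error.
- The scroll API: it works like a snapshot, so it is not real-time, and keeping the scroll context alive consumes resources.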
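The from + size limit is a simple arithmetic guard. The following sketch, with a hypothetical ResultWindow class and an illustrative error message (not Elasticsearch's actual wording), shows the check against the default of 10000.

```java
// Hypothetical sketch of the from + size guard. DEFAULT_MAX_RESULT_WINDOW mirrors the
// default of index.max_result_window; the real check happens inside Elasticsearch.
final class ResultWindow {
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    private ResultWindow() {}

    static void check(int from, int size, int maxResultWindow) {
        // long arithmetic so a huge from + size cannot overflow int
        long window = (long) from + size;
        if (window > maxResultWindow) {
            throw new IllegalArgumentException(
                "from + size = " + window + " exceeds max_result_window = " + maxResultWindow);
        }
    }
}
```

So from = 9990 with size = 10 is still accepted, while from = 9991 with size = 10 is rejected.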
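This section introduces the third method, search_after, which fetches the next page based on the results of the previous one. The idea is to choose a set of sort fields whose combined values are unique across all documents. In a sorted search response, each hit carries a sort array with its sort values; the next request passes the sort values of the last hit of the current page as search_after, and Elasticsearch returns the hits that follow it in sort order.
Java example: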
public static void search_search_after() {
RestHighLevelClient client = EsClient.getClient();
try {
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("mapping_search_after");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(
QueryBuilders.termQuery("user","user2")
);
sourceBuilder.size(1);
sourceBuilder.sort("id", SortOrder.ASC);
searchRequest.source(sourceBuilder);
SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(result);
if(hasHit(result)) { // this page returned at least one hit
// (processing of the current page is omitted)
// the next page starts after the last hit of this page
int length = result.getHits().getHits().length;
SearchHit aLastHit = result.getHits().getHits()[length - 1];
// start the next round: pass the last hit's sort values as search_after
sourceBuilder.searchAfter(aLastHit.getSortValues());
result = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(result);
}
}
} catch (Throwable e) {
e.printStackTrace();
} finally {
EsClient.close(client);
}
}
private static boolean hasHit(SearchResponse result) {
return !( result.getHits() == null ||
result.getHits().getHits() == null ||
result.getHits().getHits().length < 1 );
}
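The Java example fetches only one extra page. The full loop can be sketched in memory, assuming documents sorted by a unique numeric id in ascending order as in sort("id", SortOrder.ASC). SearchAfterPager is hypothetical: each page returns the ids strictly greater than the last sort value seen, and the loop stops at the first empty page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of search_after paging over ids sorted ascending.
final class SearchAfterPager {
    private SearchAfterPager() {}

    /** Returns up to `size` ids strictly greater than `after` (null means first page). */
    static List<Long> page(List<Long> sortedIds, Long after, int size) {
        List<Long> page = new ArrayList<>();
        for (long id : sortedIds) {
            if (after != null && id <= after) {
                continue; // skip everything up to and including the last sort value seen
            }
            page.add(id);
            if (page.size() == size) {
                break;
            }
        }
        return page;
    }

    /** Walks all pages, feeding the last hit's sort value into the next request. */
    static List<Long> readAll(List<Long> sortedIds, int size) {
        List<Long> all = new ArrayList<>();
        Long after = null;
        while (true) {
            List<Long> page = page(sortedIds, after, size);
            if (page.isEmpty()) {
                break; // no more hits
            }
            all.addAll(page);
            after = page.get(page.size() - 1);
        }
        return all;
    }
}
```

This also shows why the sort key must be unique: with duplicate sort values, documents sharing the last seen value would be skipped by the strict "greater than" comparison.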
This article has covered preference, explain, version, index boost, min_score, named queries, inner hits, field collapsing and search_after in detail.