Pitfalls of Deep Pagination in Elasticsearch

2021-02-10  _空格键_

Let's start with an exception:

org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:618)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:594)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:501)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:474)
        at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:391)
        at com.xxxx.assets.service.es.factory.rest.EsHighClientService.queryByPage(EsHighClientService.java:82)
        ... 21 common frames omitted
        Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://abc.xxxx.com:9900], URI [/blood_relation_index/blood_relation/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&search_type=dfs_query_then_fetch&batched_reduce_size=512], status line [HTTP/1.1 500 Internal Server Error]{"error":{"root_cause":[{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"blood_relation_index","node":"RKah0wB7RDeQMmmawJqMHA","reason":{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}]},"status":500}
                at org.elasticsearch.client.RestClient$1.completed(RestClient.java:357)
                at org.elasticsearch.client.RestClient$1.completed(RestClient.java:346)
                at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122)
                at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
                at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
                at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326)
                at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
                at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
                at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
                at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
                at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
                at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
                at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
                at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
                at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
                at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
                ... 1 common frames omitted

The key information, extracted: ResponseException, POST, 500

{
    "error": {
        "root_cause": [
            {
                "type": "query_phase_execution_exception",
                "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "blood_relation_index",
                "node": "RKah0wB7RDeQMmmawJqMHA",
                "reason": {
                    "type": "query_phase_execution_exception",
                    "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
                }
            }
        ]
    },
    "status": 500
}

In plain terms: the result window is too large; from + size must be less than or equal to [10000], but this query requested [20000]. The scroll API is a more efficient way to request large data sets. The limit can also be changed via the [index.max_result_window] index-level setting.

Analysis

The ES server is configured with index.max_result_window=10000, and our query's result window exceeded that limit.

Question: why did the query exceed the limit?

Regular ES paginated queries

Suppose we paginate with size=100 per page and request page 100. Then from = (100 - 1) * 100 = 9900 and size = 100, so ES must fetch from + size = 10000 documents from each shard. With 3 shards, that is 3 * 10000 = 30000 documents in total, which the coordinating node then merges, sorts, and filters down to the final 100 matching documents. Requesting page 101 pushes from to 10000, so ES fetches 10100 documents from each shard.
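The arithmetic above can be sketched as a small self-contained program (the class and method names here are illustrative, not part of any ES API; shardCount=3 is the example value from the text):

```java
// Illustrates the cost of from/size deep pagination in Elasticsearch.
// Page numbers are 1-based, matching the article's example.
public class DeepPagingCost {

    // Offset of the first document on a given page.
    static int from(int page, int size) {
        return (page - 1) * size;
    }

    // Each shard must hand from + size documents to the coordinating node.
    static int docsFetchedPerShard(int page, int size) {
        return from(page, size) + size;
    }

    // Total documents the coordinating node must merge and sort.
    static int totalDocsFetched(int page, int size, int shardCount) {
        return docsFetchedPerShard(page, size) * shardCount;
    }

    public static void main(String[] args) {
        int page = 100, size = 100, shards = 3;
        System.out.println("from = " + from(page, size));                     // 9900
        System.out.println("per shard = " + docsFetchedPerShard(page, size)); // 10000
        System.out.println("total = " + totalDocsFetched(page, size, shards)); // 30000
    }
}
```

Note that the per-shard cost depends only on how deep the page is, not on how many documents the page actually shows.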

The deep pagination problem

Clearly, the deeper the pagination, the more data ES must fetch from every shard, and query performance degrades sharply as page depth grows.

This is exactly why index.max_result_window is set to 10000: it protects ES from exhausting heap memory on deep queries and triggering an OOM.
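If you genuinely must raise the limit (a stopgap that trades memory for depth, not a real fix), the setting can be changed per index. A minimal sketch, assuming a 6.x/7.x RestHighLevelClient instance named `client` and the index name from the error above:

```java
import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.settings.Settings;

// Raise index.max_result_window for one index only.
// Requires a live cluster; shown here for illustration, not as the recommended fix.
UpdateSettingsRequest request = new UpdateSettingsRequest("blood_relation_index");
request.settings(Settings.builder()
        .put("index.max_result_window", 20000));
client.indices().putSettings(request, RequestOptions.DEFAULT);
```

Keep in mind that raising the window raises the per-shard fetch cost accordingly; for genuinely large result sets, scroll (below) is the better tool.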

Solutions

Choose a fix based on the scenario:
1. If the use case does not require deep page jumps, cap the page depth and result size that a query may request.
2. Alternatively, constrain the interaction itself: disallow jumping to arbitrary pages and page through results sequentially with the scroll API.
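A hedged sketch of option 2, again assuming a 6.x/7.x RestHighLevelClient named `client` (the index name comes from the error above; the one-minute keep-alive and batch size of 100 are example values):

```java
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Open a scroll context: the first request is a normal search with a keep-alive.
SearchRequest searchRequest = new SearchRequest("blood_relation_index");
searchRequest.scroll(TimeValue.timeValueMinutes(1L));
searchRequest.source(new SearchSourceBuilder().size(100));

SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = response.getScrollId();

// Keep pulling batches until a page comes back empty.
while (response.getHits().getHits().length > 0) {
    // ... process response.getHits() here ...
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(TimeValue.timeValueMinutes(1L));
    response = client.scroll(scrollRequest, RequestOptions.DEFAULT);
    scrollId = response.getScrollId();
}

// Release the scroll context promptly; open contexts hold resources on the cluster.
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
```

Unlike from/size, each scroll batch costs the same regardless of how far into the result set you are, which is why ES recommends it for large exports.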
