Some Problems Encountered While Using Elasticsearch

2019-01-24, by 秋慕云

1. Elasticsearch's default refresh interval

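When an index is created without an explicit index.refresh_interval, Elasticsearch uses the default of 1s.

This means a newly written document only becomes visible to search after the next refresh, which can be up to 1s later. Until then, search requests do not return it. A loose analogy is a MySQL transaction: changes made inside the transaction are not visible to other queries until it commits. The analogy only covers visibility, though: a refresh makes documents searchable, it is not the step that persists them to disk.

Suppose the application has this requirement: right after an operation on the data, it must return the data it has just written, or a list that contains it.

For example, it stores a document and then searches for it straight away, expecting the new document in the result. With the default index.refresh_interval, this goes wrong.

The symptom: Elasticsearch does not find the newly inserted document, but the same search succeeds about 1s later.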

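Solutions:

Delay the query, for example:

Thread.sleep(2000L);

The fixed wait is paid on every such request, so the total cost becomes noticeable when many queries are issued.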
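As an alternative to a fixed sleep, the wait can be bounded and cut short as soon as the document shows up. The following is a minimal sketch rather than part of the original setup: SearchPolling and pollUntilVisible are hypothetical names, and the lookup argument stands in for whatever search call the application already makes.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class SearchPolling {

    /**
     * Calls lookup until it returns a value or the time budget is used up.
     * lookup is a placeholder for the application's own search call.
     */
    static <T> Optional<T> pollUntilVisible(Supplier<Optional<T>> lookup,
                                            long timeoutMillis,
                                            long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result; // document is searchable, stop waiting
            }
            if (System.nanoTime() >= deadline) {
                return Optional.empty(); // gave up within the budget
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return Optional.empty();
            }
        }
    }
}
```

With a budget of 2000 ms and an interval of around 100 ms, the worst case matches the fixed sleep above, while the common case returns shortly after the next refresh.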

Refresh immediately on insert and update, for example:

BulkRequestBuilder bulkRequest = ESTools.client.prepareBulk().setRefresh(true);

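Here setRefresh(true) forces a refresh as soon as the bulk request completes. So whenever a create, update, or delete must be followed by a read that returns the latest data, add this call. setRefresh(boolean) belongs to the older client API; from Elasticsearch 5.0 on, the equivalent call is setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE). Forcing a refresh on every write is costly, so it is best kept for the writes that really need it.

Conversely, setting index.refresh_interval to -1 disables automatic refresh altogether. The benefit is faster indexing, because Elasticsearch no longer opens a new segment every second, which suits large one-off bulk loads. The cost is that newly written documents stay invisible to search until a refresh is triggered explicitly or the interval is restored. Refresh controls search visibility, not durability: the risk of losing data when a shard fails most likely belongs to running without replicas (index.number_of_replicas set to 0), a setting that is often changed together with this one during bulk loads.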
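For illustration, automatic refresh could be switched off before a bulk load by sending a settings update such as the body below to PUT /authority/_settings. The index name authority is borrowed from the example in the next section and is only an assumption here.

```json
{
  "index": {
    "refresh_interval": "-1"
  }
}
```

Sending the same request with "1s" afterwards restores the default, and POST /authority/_refresh triggers a one-off refresh in between.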

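2. No normalizer configured when creating the index

If the mapping does not specify a normalizer for a field, the value written to Elasticsearch keeps its uppercase letters, and the _source returned by a search shows them too, but the terms stored in the inverted index can be lowercase. A term query, which does not analyze its input, then fails to match. Lowercased terms are what the standard analyzer produces for an analyzed string field; a field that is really not_analyzed is indexed verbatim, so it is worth checking with GET /authority/_mapping that the mapping below is the one actually in effect.

Consider the following type mapping, in which no field is analyzed: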

{
  "_all": {
    "enabled": false
  },
  "properties": {
    "id": {
      "type": "string",
      "index": "not_analyzed"
    },
    "dbName": {
      "type": "string",
      "index": "not_analyzed"
    },
    "tbName": {
      "type": "string",
      "index": "not_analyzed"
    },
    "createUser": {
      "type": "string",
      "index": "not_analyzed"
    },
    "createDate": {
      "type": "date",
      "index": "not_analyzed"
    },
    "modifyUser": {
      "type": "string",
      "index": "not_analyzed"
    },
    "modifyDate": {
      "type": "date",
      "index": "not_analyzed"
    }
  }
}

Take the "tbName" field as an example: it is declared only as a string. A stored document looks like this:

{
      "_index" : "authority",
      "_type" : "authority_tb_info",
      "_id" : "47",
      "_score" : 1.0327898,
      "_routing" : "1290962498",
      "_source" : {
        "id" : "47",
        "tbName" : "Test182",
        "dbName" : "app18",
        "modifyUser" : "testUser",
        "createUser" : "testUser",
        "createDate" : "1547793434000",
        "modifyDate" : "1547793434000"
      }
}

A term query returns no results:

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "tbName": "Test182"
          }
        }
      ]
    }
  }
}

A match query, which runs the query text through the field's analyzer before matching, does find the document:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tbName": "Test182"
          }
        }
      ]
    }
  }
}

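Solutions:

To keep querying with term (as a filter), there are two options:

(1) In application code, lowercase the value before putting it into the term query, using String.toLowerCase.
(2) Change the index mapping to specify a normalizer, as follows. Normalizers apply to keyword fields and require Elasticsearch 5.2 or later; "type" and "foo" are placeholder names for the type and the field: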

{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
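For the first option, lowercasing in application code, a minimal sketch might look like the following. TermValues, normalizeForTerm and termFilterBody are hypothetical helper names, and the body builder assumes the field name and value contain no characters that need JSON escaping.

```java
import java.util.Locale;

final class TermValues {

    /** Lowercases the value so it matches lowercased terms in the index. */
    static String normalizeForTerm(String raw) {
        // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
        return raw.toLowerCase(Locale.ROOT);
    }

    /**
     * Builds the same bool/filter/term request body as in the article,
     * with the value lowercased first. No JSON escaping is performed.
     */
    static String termFilterBody(String field, String rawValue) {
        return "{\"query\":{\"bool\":{\"filter\":[{\"term\":{\""
                + field + "\":\"" + normalizeForTerm(rawValue) + "\"}}]}}}";
    }
}
```

Calling TermValues.termFilterBody("tbName", "Test182") yields a term filter on "test182". This only mirrors the lowercase step; it does not reproduce the asciifolding filter used by the normalizer.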

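3. No pagination after aggregation

Elasticsearch can aggregate on a field, much like GROUP BY in MySQL, which is a convenient way to deduplicate. The difficulty comes when the requirement is deduplication plus pagination. If the data set is small, the deduplicated result can be paginated in memory. If it is very large, the type structure in Elasticsearch has to be redesigned, for example so that each distinct value is stored as its own document, because a terms aggregation cannot page through its buckets. Elasticsearch 6.1 and later add the composite aggregation, which can page through buckets with its after parameter.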
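For the small-data case, in-memory pagination could be sketched as below. It assumes the aggregated keys have already been fetched into a list; InMemoryPager and distinctPage are hypothetical names, and page numbers are taken to be 1-based.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

final class InMemoryPager {

    /**
     * Returns one page of the distinct values, keeping first-seen order.
     * pageNumber is 1-based; a page past the end yields an empty list.
     */
    static <T> List<T> distinctPage(List<T> values, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        // LinkedHashSet drops duplicates while preserving insertion order.
        List<T> distinct = new ArrayList<>(new LinkedHashSet<>(values));
        long from = (long) (pageNumber - 1) * pageSize;
        if (from >= distinct.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, distinct.size());
        return new ArrayList<>(distinct.subList((int) from, to));
    }
}
```

The whole key set is held in memory on every call, which is exactly why this approach stops being practical once the number of distinct values grows large.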
