Elasticsearch | Data Modeling

2020-05-28 | 乌鲁木齐001号程序员

Data Modeling

Logical Model | Functional Requirements

Entity attributes
Relationships between entities
Search-related configuration

Physical Model | Performance Requirements

Setting: number of shards
Mapping: field configuration / relationship handling
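
As a minimal sketch of how the Setting and Mapping parts fit together, the create-index request body can be assembled as a plain dictionary. The helper name build_index_body and the shard and replica counts are illustrative assumptions, not values from the original article.

```python
import json


def build_index_body(shards: int, replicas: int, properties: dict) -> dict:
    """Combine physical settings and field mappings into one create-index body."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return {
        "settings": {
            "number_of_shards": shards,      # physical model: shard count
            "number_of_replicas": replicas,
        },
        "mappings": {"properties": properties},  # physical model: field configuration
    }


# Illustrative values only; the right shard count depends on data volume.
body = build_index_body(3, 1, {"title": {"type": "text"}})
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a PUT <index> line in the console. The shard count is worth deciding up front, since number_of_shards is a static setting fixed when the index is created.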

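How to Model Fields

Field Types | Text vs Keyword

Text

Used for full-text fields; the text is tokenized by an Analyzer.
Aggregations and sorting are not supported by default; enabling them requires setting fielddata to true.

Keyword

Used for IDs, enumerations, and text that does not need tokenization, for example: phone numbers, email addresses, mobile numbers, postal codes, gender.
Suited to Filter (exact match), Sorting, and Aggregation.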
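
The Text vs Keyword distinction can be sketched as follows. The field names bio and email and the helper query_for are hypothetical, chosen only to show that analyzed text is usually queried with match while exact values are queried with term.

```python
import json

# Hypothetical fields: "bio" holds free text, "email" holds an exact value.
mapping = {
    "properties": {
        "bio": {"type": "text"},       # tokenized by the analyzer
        "email": {"type": "keyword"},  # stored as a single exact term
    }
}


def query_for(field: str, value: str) -> dict:
    """Pick a match query for text fields and a term query for keyword fields."""
    field_type = mapping["properties"][field]["type"]
    clause = "match" if field_type == "text" else "term"
    return {"query": {clause: {field: value}}}


print(json.dumps(query_for("bio", "search engineer")))
print(json.dumps(query_for("email", "someone@example.com")))
```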

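Multi-fields

By default, dynamic mapping maps a string as text and adds a keyword sub-field.
When handling human language, adding sub-fields that use English, pinyin, and standard analyzers can improve search results.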
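
A possible multi-field layout is sketched below, assuming a title field with an english sub-field and a keyword sub-field. The sub-field names and the helper search_paths are assumptions for illustration; a pinyin sub-field is left out because that analyzer is not built in and needs a plugin.

```python
import json

# The main field uses the default standard analyzer; sub-fields add variants.
title_field = {
    "type": "text",
    "fields": {
        "english": {"type": "text", "analyzer": "english"},      # stemming for English
        "keyword": {"type": "keyword", "ignore_above": 256},     # exact value
    },
}


def search_paths(field_name: str, field_cfg: dict) -> list:
    """Return the main field plus every text sub-field, for full-text queries."""
    paths = [field_name]
    for sub_name, sub_cfg in field_cfg.get("fields", {}).items():
        if sub_cfg["type"] == "text":
            paths.append(f"{field_name}.{sub_name}")
    return paths


query = {
    "query": {
        "multi_match": {
            "query": "searching",
            "fields": search_paths("title", title_field),
        }
    }
}
print(json.dumps(query, indent=2))
```

Querying the analyzed variants together lets a document match whether the user's term lines up with the standard tokens or with the stemmed English tokens, while title.keyword stays available for sorting and aggregation.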

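Field Types | Structured Data

Numeric types - choose the smallest type that fits; for example, use byte rather than long when the values allow it.
Enumerations - map them as keyword, even when the values are numeric, for better performance.
Others - date / boolean / geo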
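
The "smallest type that fits" rule for integers can be sketched as a small helper. The function name pick_integer_type is hypothetical; the ranges are those of the signed 8, 16, 32, and 64-bit integer types byte, short, integer, and long.

```python
# Integer field types ordered from smallest to largest, with their bit widths.
INTEGER_TYPES = [("byte", 8), ("short", 16), ("integer", 32), ("long", 64)]


def pick_integer_type(min_value: int, max_value: int) -> str:
    """Return the narrowest integer field type that covers [min_value, max_value]."""
    for name, bits in INTEGER_TYPES:
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a 64-bit signed integer")


print(pick_integer_type(0, 100))       # an age-like field
print(pick_integer_type(0, 70_000))    # a page-count-like field
```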

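Does the Field Need Searching and Tokenization?

If the field needs no search, sorting, or aggregation, set enabled to false (note that enabled is only accepted on object fields and on the top-level mapping).
If the field does not need to be searched, set index to false.
For fields that are searched, index_options and norms control how much detail is stored in the index; when normalization data for scoring is not needed, norms can be turned off and index_options reduced.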
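
The three switches above can be sketched in one mapping. The field names summary and raw_payload and the helper searchable_fields are hypothetical; the parameters index, norms, index_options, and enabled are the ones the text names.

```python
import json

properties = {
    # Never searched, but can still be aggregated because doc values remain.
    "cover_url": {"type": "keyword", "index": False},
    # Searched, but without length normalization or position data.
    # With "freqs", positions are not indexed, so phrase queries will not work.
    "summary": {"type": "text", "norms": False, "index_options": "freqs"},
    # Kept only in _source: not parsed, not searchable, not aggregatable.
    "raw_payload": {"type": "object", "enabled": False},
}


def searchable_fields(props: dict) -> list:
    """List the fields that remain searchable under these settings."""
    return [
        name
        for name, cfg in props.items()
        if cfg.get("enabled", True) and cfg.get("index", True)
    ]


print(json.dumps({"mappings": {"properties": properties}}, indent=2))
print(searchable_fields(properties))
```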

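Does the Field Need Aggregation and Sorting?

If the field needs no search, sorting, or aggregation, set enabled to false.
If the field needs no sorting or aggregation, set doc_values / fielddata to false.
For keyword fields that are updated frequently and aggregated frequently, setting eager_global_ordinals to true is recommended.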
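
A sketch of the sorting and aggregation switches follows, assuming two hypothetical keyword fields: trace_id, which is only looked up, and category, which is aggregated often.

```python
import json

properties = {
    # Only used for exact lookups, so the columnar doc values are skipped.
    "trace_id": {"type": "keyword", "doc_values": False},
    # Aggregated often on a frequently refreshed index, so global ordinals
    # are built at refresh time instead of on the first aggregation.
    "category": {"type": "keyword", "eager_global_ordinals": True},
}


def can_sort_or_aggregate(field: str) -> bool:
    """A keyword field supports sorting and aggregation unless doc_values is off."""
    return properties[field].get("doc_values", True)


print(json.dumps({"mappings": {"properties": properties}}, indent=2))
print(can_sort_or_aggregate("trace_id"), can_sort_or_aggregate("category"))
```

The trade-off with eager_global_ordinals is that the cost moves from the first aggregation after a refresh to the refresh itself, which is why it suits fields that are aggregated frequently.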

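Does the Field Need Extra Storage?

If extra storage is needed, set store to true to keep the field's original content; this is usually combined with setting enabled to false on _source.
Disabling _source saves disk space and suits metrics-style data. Disabling it casually is generally not recommended; consider raising the compression ratio first, because once _source is disabled, Reindex and Update no longer work.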
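
Raising the compression ratio while keeping _source could look like the sketch below, which uses the index.codec setting with the value best_compression. The helper name create_body is hypothetical, and the actual space savings depend on the data.

```python
import json


def create_body(properties: dict, best_compression: bool = True) -> dict:
    """Build a create-index body that keeps _source but compresses stored data harder."""
    settings = {"index": {"codec": "best_compression"}} if best_compression else {}
    return {"settings": settings, "mappings": {"properties": properties}}


body = create_body({"content": {"type": "text"}})
print(json.dumps(body, indent=2))
```

Because _source stays enabled here, Reindex and Update keep working; the cost is somewhat slower reads of stored fields.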

Data Modeling | An Example

Index a book document

PUT books/_doc/1
{
  "title":"Mastering ElasticSearch 5.0",
  "description":"Master the searching, indexing, and aggregation features in ElasticSearch Improve users’ search experience with Elasticsearch’s functionalities and develop your own Elasticsearch plugins",
  "author":"Bharvi Dixit",
  "public_date":"2017",
  "cover_url":"https://images-na.ssl-images-amazon.com/images/I/51OeaMFxcML.jpg"
}

Check the Dynamic Mapping Result

Every field was inferred as text, with a keyword sub-field added.

GET books/_mapping

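The inferred mapping is not ideal; delete the index and optimize it by hand

cover_url is mapped directly as keyword with index set to false, which means the field cannot be searched but still supports Terms aggregation; by contrast, setting enabled to false (an option for object fields) rules out both search and aggregation.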

DELETE books

PUT books
{
  "mappings" : {
    "properties" : {
      "author" : {"type" : "keyword"},
      "cover_url" : {"type" : "keyword", "index" : false},
      "description" : {"type" : "text"},
      "public_date" : {"type" : "date"},
      "title" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 100
          }
        }
      }
    }
  }
}

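New Requirement

Add a content field that can be searched and supports highlighting.
The new requirement makes _source very large. Source filtering only trims the result when it is sent to the client; internally, during Query-Then-Fetch, Elasticsearch still transfers the data in _source.
Solution: set enabled to false on _source and set store to true on every field, which avoids the performance problem caused by oversized fields.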

DELETE books

# Add the content field. Its data is large, so _source is disabled.
PUT books
{
  "mappings" : {
    "_source" : {"enabled" : false},
    "properties" : {
      "author" : {"type" : "keyword", "store" : true},
      "cover_url" : {"type" : "keyword", "index" : false, "store" : true},
      "description" : {"type" : "text", "store" : true},
      "content" : {"type" : "text", "store" : true},
      "public_date" : {"type" : "date", "store" : true},
      "title" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 100
          }
        },
        "store" : true
      }
    }
  }
}

PUT books/_doc/1
{
  "title":"Mastering ElasticSearch 5.0",
  "description":"Master the searching, indexing, and aggregation features in ElasticSearch Improve users’ search experience with Elasticsearch’s functionalities and develop your own Elasticsearch plugins",
  "content":"The content of the book......Indexing data, aggregation, searching.    something else. something in the way............",
  "author":"Bharvi Dixit",
  "public_date":"2017",
  "cover_url":"https://images-na.ssl-images-amazon.com/images/I/51OeaMFxcML.jpg"
}

Search results no longer include _source data

POST books/_search
{}

Search, returning data through stored fields and highlighting matches in content

POST books/_search
{
  "stored_fields": ["title","author","public_date"],
  "query": {
    "match": {
      "content": "searching"
    }
  },
  "highlight": {
    "fields": {
      "content":{}
    }
  }
}
