Elasticsearch 6.x Mapping Settings

2018-08-16, by 小旋锋 (Jianshu)

Mapping

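A mapping is similar to a table schema definition in a database. It defines the names of the fields in an index, the data type of each field, and how each field is indexed (its inverted-index settings).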

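Note that defining too many fields in an index can cause a mapping explosion, leading to out-of-memory errors and situations that are hard to recover from. Elasticsearch 6.x provides index settings that limit this, including index.mapping.total_fields.limit (maximum number of fields, default 1000), index.mapping.depth.limit (maximum field depth, default 20) and index.mapping.nested_fields.limit (maximum number of nested fields, default 50).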
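
As a sketch of how such a limit can be changed: index.mapping.total_fields.limit is a dynamic index setting, so it can be raised on an existing index. The index name my_index and the value 2000 are assumptions chosen for illustration.

```
# Assumes an index named my_index already exists
PUT my_index/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```

Raising the limit is usually a last resort; reducing the number of mapped fields is generally the safer fix.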

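Data types

Core data types

The core types are text and keyword for strings; long, integer, short, byte, double, float, half_float and scaled_float for numbers; date; boolean; binary; and the range types integer_range, float_range, long_range, double_range and date_range. The example below uses two of the range types.

# Create an index with range fields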
PUT range_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "expected_attendees": {
          "type": "integer_range"
        },
        "time_frame": {
          "type": "date_range", 
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}

# Index a document
PUT range_index/_doc/1
{
  "expected_attendees" : { 
    "gte" : 10,
    "lte" : 20
  },
  "time_frame" : { 
    "gte" : "2015-10-31 12:00:00", 
    "lte" : "2015-11-05"
  }
}

# 12 falls within the range 10 to 20, so document 1 is found
GET range_index/_search
{
  "query" : {
    "term" : {
      "expected_attendees" : {
        "value": 12
      }
    }
  }
}

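# relation accepts intersects (the default), within and contains.
# Per the Elasticsearch documentation, within matches documents whose range lies entirely inside the query range,
# and contains matches documents whose range entirely covers the query range.
# Change the dates and the relation below to compare the three.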
GET range_index/_search
{
  "query" : {
    "range" : {
      "time_frame" : { 
        "gte" : "2015-11-02",
        "lte" : "2015-11-03",
        "relation" : "within" 
      }
    }
  }
}
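
For comparison, here is a sketch of the same search with relation set to contains. Going by the documented semantics (contains matches when the document's range entirely covers the query range), document 1 should be returned, because its time_frame runs from 2015-10-31 12:00:00 to 2015-11-05. This is an expectation derived from the documentation, not a recorded result.

```
GET range_index/_search
{
  "query" : {
    "range" : {
      "time_frame" : {
        "gte" : "2015-11-02",
        "lte" : "2015-11-03",
        "relation" : "contains"
      }
    }
  }
}
```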

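Complex data types

Complex types cover arrays, object (a single JSON object) and nested (an array of JSON objects whose entries are indexed independently).

# tags is an array of strings; lists is an array of objects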
PUT my_index/_doc/1
{
  "message": "some arrays in this document...",
  "tags":  [ "elasticsearch", "wow" ], 
  "lists": [ 
    {
      "name": "prog_list",
      "description": "programming list"
    },
    {
      "name": "cool_list",
      "description": "cool stuff list"
    }
  ]
}

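Nested type versus object type

An example makes the difference clear:

  1. Index a document without defining a mapping; the user field is dynamically mapped as an array of objects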
DELETE my_index

PUT my_index/_doc/1
{
  "group" : "fans",
  "user" : [ 
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

  2. Search for documents where user.first is Alice and user.last is Smith; ideally nothing should match
  3. The search nevertheless returns document 1. Why?
GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" }},
        { "match": { "user.last":  "Smith" }}
      ]
    }
  }
}
  4. Because the object type is flattened internally into a document like this:
{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}
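  5. user.first and user.last are flattened into multi-value fields, so the association between alice and white is lost. As a result the document incorrectly matches a query for alice and smith

  6. What if user is mapped as a nested field from the start?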

DELETE my_index
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "user": {
          "type": "nested" 
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "group": "fans",
  "user": [
    {
      "first": "John",
      "last": "Smith"
    },
    {
      "first": "Alice",
      "last": "White"
    }
  ]
}
  7. Run the searches again: the first query below returns nothing and the second returns document 1, as expected
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "Smith" }} 
          ]
        }
      }
    }
  }
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "White" }} 
          ]
        }
      },
      "inner_hits": { 
        "highlight": {
          "fields": {
            "user.first": {}
          }
        }
      }
    }
  }
}
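  8. The nested type indexes each object in the array as a separate hidden document, which means each nested object can be searched independently

  9. Note that nested fields must be queried with the nested query (as above), aggregated with the nested and reverse_nested aggregations, and retrieved or highlighted with nested inner hits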

Geo data types

geo_point stores latitude/longitude points, and geo_shape stores shapes such as polygons.
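
A minimal sketch of a geo_point field, assuming a hypothetical index my_geo_index and field location (neither appears in the original article). It maps the field, indexes one point, and searches for documents within 20 km of a coordinate using a geo_distance query.

```
# Hypothetical index and field names, for illustration only
PUT my_geo_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}

PUT my_geo_index/_doc/1
{
  "location": { "lat": 41.12, "lon": -71.34 }
}

# Filter on distance from a reference point
GET my_geo_index/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "20km",
          "location": { "lat": 41.0, "lon": -71.3 }
        }
      }
    }
  }
}
```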

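Specialized data types

Specialized types include ip, completion, token_count and percolator, among others.

# ip type: stores IP addresses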
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "ip_addr": {
          "type": "ip"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "ip_addr": "192.168.1.1"
}

GET my_index/_search
{
  "query": {
    "term": {
      "ip_addr": "192.168.0.0/16"
    }
  }
}

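Multi-fields

The same field can be indexed in more than one way through the fields parameter, which is covered in the fields section further down.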

Setting a mapping
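
The request that created this mapping was shown as an image in the original post and did not survive. A request along these lines would produce the result shown below; the field names and types are taken from that result, but the exact original request is an assumption.

```
# Reconstructed from the GET _mapping result; not the author's original request
PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title":   { "type": "text" },
        "name":    { "type": "text" },
        "age":     { "type": "integer" },
        "created": { "type": "date" }
      }
    }
  }
}
```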

GET my_index/_mapping

# Result
{
  "my_index": {
    "mappings": {
      "doc": {
        "properties": {
          "age": {
            "type": "integer"
          },
          "created": {
            "type": "date"
          },
          "name": {
            "type": "text"
          },
          "title": {
            "type": "text"
          }
        }
      }
    }
  }
}

Mapping parameters

analyzer
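
The analyzer parameter selects the analyzer used for a text field at index time (and at search time, unless search_analyzer is also set). A minimal sketch, assuming a hypothetical index analyzer_demo with a content field; english is one of the built-in language analyzers.

```
# Hypothetical index and field names, for illustration only
PUT analyzer_demo
{
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
  }
}

# Inspect the tokens produced by the analyzer configured for the field
GET analyzer_demo/_analyze
{
  "field": "content",
  "text": "Running foxes"
}
```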

boost

dynamic

PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": false, 
      "properties": {
        "user": { 
          "properties": {
            "name": {
              "type": "text"
            },
            "social_networks": { 
              "dynamic": true,
              "properties": {}
            }
          }
        }
      }
    }
  }
}

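With this mapping, new fields cannot be added dynamically at the top level of my_index (or directly under user, which inherits the setting), but new sub-fields can be added dynamically under user.social_networks.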
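
To observe this behaviour, one could index a document that carries an unmapped top-level field and an unmapped field under user.social_networks, then read the mapping back. The field names nickname and twitter are made up for the example.

```
PUT my_index/_doc/1
{
  "nickname": "js",
  "user": {
    "name": "John Smith",
    "social_networks": {
      "twitter": "@john"
    }
  }
}

# nickname is kept in _source but should not appear in the mapping;
# user.social_networks.twitter should have been added dynamically
GET my_index/_mapping
```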

copy_to

DELETE my_index
PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET my_index/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

index
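
The index parameter controls whether a field's values are indexed; it defaults to true, and fields with index set to false cannot be searched. A minimal sketch, assuming a hypothetical index index_demo and field cookie:

```
# Hypothetical index and field names, for illustration only
PUT index_demo
{
  "mappings": {
    "_doc": {
      "properties": {
        "cookie": {
          "type": "text",
          "index": false
        }
      }
    }
  }
}

PUT index_demo/_doc/1
{
  "cookie": "name=alfred"
}

# Expected to be rejected with an error, because cookie is not indexed
GET index_demo/_search
{
  "query": {
    "match": {
      "cookie": "name"
    }
  }
}
```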

index_options

fielddata

eager_global_ordinals

doc_values
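
doc_values is the on-disk columnar structure used for sorting, aggregations and script access. It is enabled by default on field types that support it (text fields do not). If a field will never be sorted or aggregated on, disabling it saves disk space. A minimal sketch, assuming a hypothetical index doc_values_demo and field session_id:

```
# Hypothetical index and field names, for illustration only
PUT doc_values_demo
{
  "mappings": {
    "_doc": {
      "properties": {
        "session_id": {
          "type": "keyword",
          "doc_values": false
        }
      }
    }
  }
}
```

The field can still be searched; only sorting, aggregations and script access on it are affected.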

fields

# Define the mapping
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { 
              "type":  "keyword"
            }
          }
        }
      }
    }
  }
}

# Index two documents
PUT my_index/_doc/1
{
  "city": "New York"
}

PUT my_index/_doc/2
{
  "city": "York"
}

# Search: city is used for full-text match queries, city.raw for sorting and aggregations
GET my_index/_search
{
  "query": {
    "match": {
      "city": "york" 
    }
  },
  "sort": {
    "city.raw": "asc" 
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw" 
      }
    }
  }
}

format

Name                                Format
epoch_millis                        timestamp in milliseconds since the epoch
epoch_second                        timestamp in seconds since the epoch
date_optional_time                  ISO 8601 date with an optional time part
basic_date                          yyyyMMdd
basic_date_time                     yyyyMMdd'T'HHmmss.SSSZ
basic_date_time_no_millis           yyyyMMdd'T'HHmmssZ
basic_ordinal_date                  yyyyDDD
basic_ordinal_date_time             yyyyDDD'T'HHmmss.SSSZ
basic_ordinal_date_time_no_millis   yyyyDDD'T'HHmmssZ
basic_time                          HHmmss.SSSZ
basic_time_no_millis                HHmmssZ
basic_t_time                        'T'HHmmss.SSSZ
basic_t_time_no_millis              'T'HHmmssZ

properties

PUT my_index
{
  "mappings": {
    "_doc": { 
      "properties": {
        "manager": { 
          "properties": {
            "age":  { "type": "integer" },
            "name": { "type": "text"  }
          }
        },
        "employees": { 
          "type": "nested",
          "properties": {
            "age":  { "type": "integer" },
            "name": { "type": "text"  }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1 
{
  "region": "US",
  "manager": {
    "name": "Alice White",
    "age": 30
  },
  "employees": [
    {
      "name": "John Smith",
      "age": 34
    },
    {
      "name": "Peter Brown",
      "age": 26
    }
  ]
}
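
Inner fields are addressed with dot notation in queries. The sketch below builds on the mapping and document above: the object field manager is queried directly, while the nested field employees has to go through a nested query. Document 1 would be expected to match, since its manager is Alice White and one employee is 34.

```
GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "manager.name": "Alice White" }},
        {
          "nested": {
            "path": "employees",
            "query": {
              "range": { "employees.age": { "gte": 30 } }
            }
          }
        }
      ]
    }
  }
}
```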

normalizer

PUT test_index_4
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["uppercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

# Index some documents
POST test_index_4/_doc/1
{
  "foo": "hello world"
}

POST test_index_4/_doc/2
{
  "foo": "Hello World"
}

POST test_index_4/_doc/3
{
  "foo": "hello elasticsearch"
}

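# Searching for hello returns nothing rather than 3 documents: the query text is normalized to HELLO, and no document has exactly that value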
GET test_index_4/_search
{
  "query": {
    "match": {
      "foo": "hello"
    }
  }
}

# Searching for hello world returns 2 documents, 1 and 2
GET test_index_4/_search
{
  "query": {
    "match": {
      "foo": "hello world"
    }
  }
}

Other fields

Dynamic Mapping

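Elasticsearch infers a field's type from the JSON type of its value in the document. The supported mappings are:

JSON type        Elasticsearch type
null             ignored (no field is added)
boolean          boolean
floating point   float
integer          long
object           object
array            determined by the type of the first non-null value
string           date if it matches a date format (date detection is on by default);
                 float or long if it matches a number (numeric detection is off by default);
                 otherwise text with a keyword sub-field

An example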

POST my_index/doc
{
  "username":"whirly",
  "age":22,
  "birthday":"1995-01-01"
}
GET my_index/_mapping

# Result
{
  "my_index": {
    "mappings": {
      "doc": {
        "properties": {
          "age": {
            "type": "long"
          },
          "birthday": {
            "type": "date"
          },
          "username": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

Date detection

# Customize the date formats used for detection
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic_date_formats": ["MM/dd/yyyy"]
    }
  }
}

# Disable date detection
PUT my_index
{
  "mappings": {
    "_doc": {
      "date_detection": false
    }
  }
}

Numeric detection
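
Numeric detection is disabled by default, so strings that look like numbers are mapped as text. It can be switched on per mapping type with numeric_detection. The sketch below follows the example in the Elasticsearch documentation; the field names my_float and my_integer are illustrative.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "numeric_detection": true
    }
  }
}

# With numeric detection on, my_float is mapped as float and my_integer as long
PUT my_index/_doc/1
{
  "my_float":   "1.0",
  "my_integer": "1"
}
```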

Dynamic templates

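Dynamic templates let you set field mappings dynamically, based on the data type Elasticsearch detects, the field name, and similar criteria. For example, every detected string can be mapped as keyword, or fields whose names start with a given prefix can be mapped to a specific type.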

Dynamic templates API

"dynamic_templates": [
    {
      "my_template_name": { 
        ...  match conditions ... 
        "mapping": { ... } 
      }
    },
    ...
]

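Match conditions are usually expressed with the following parameters: match_mapping_type (the data type detected by Elasticsearch), match and unmatch (patterns on the field name), and path_match and path_unmatch (patterns on the full dotted path of the field).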

# Map fields detected as double to float to save space
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic_templates": [
        {
          "doubles_as_floats": {
            "match_mapping_type": "double",
            "mapping": {
              "type": "float"
            }
          }
        }
      ]
    }
  }
}
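
Another common use, sketched here under the same assumptions as the example above, is to map every dynamically detected string as keyword instead of the default text with a keyword sub-field. The template name strings_as_keywords is arbitrary.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}
```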
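
Suggested workflow for a custom mapping
  1. Index one document into a temporary index and retrieve the mapping Elasticsearch generates
  2. Edit the mapping from step 1 and customize the relevant settings
  3. Create the real index with the mapping from step 2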
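
The three steps might look like the following. The index names tmp_index and real_index, the sample fields, and the chosen types are all assumptions for illustration.

```
# Step 1: index a sample document into a temporary index and read the generated mapping
POST tmp_index/doc
{
  "username": "whirly",
  "age": 22
}

GET tmp_index/_mapping

# Steps 2 and 3: adjust the generated mapping and create the real index with it
PUT real_index
{
  "mappings": {
    "doc": {
      "properties": {
        "username": { "type": "keyword" },
        "age":      { "type": "integer" }
      }
    }
  }
}

# Clean up the temporary index
DELETE tmp_index
```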

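Index templates

An index template is applied automatically when a new index is created whose name matches one of its index_patterns. When several templates match, settings from a template with a higher order override those from a template with a lower order.

# Create an index template that matches indices whose names start with test-index-map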
PUT _template/template_1
{
  "index_patterns": ["test-index-map*"],
  "order": 2,
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "doc": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "yyyy/MM/dd HH:mm:ss"
        }
      }
    }
  }
}

# Index a document
POST test-index-map_1/doc
{
  "name" : "小旋锋",
  "created_at": "2018/08/16 20:11:11"
}

# Retrieve the index; its settings and mappings match those defined in the template
GET test-index-map_1

# Retrieve the template
GET /_template/template_1

# Delete the template
DELETE /_template/template_1

For more, visit my personal site: http://laijianfeng.org
References:

  1. Elasticsearch official documentation
  2. imooc course "Elastic Stack从入门到实践" (Elastic Stack: From Beginner to Practice)