Elasticsearch Series (4): Mapping Field Types

2020-09-02  正义的杰克船长

1. Introduction

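A mapping defines the fields a document contains, the data type of each field, and the rules for how those fields are stored and indexed. For example, a mapping can specify which string fields are treated as full-text fields, which fields hold numbers, dates, or geolocations, and the format of date values.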

A typical case is an index of articles, whose mapping declares a type for each article field.
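
A minimal sketch of what such an article mapping might look like, built as a Python dict and serialized to the JSON body of a create-index request. The index name articles and every field name (title, author, content, publish_date, views) are assumptions chosen for illustration, not taken from the original article.

```python
import json

# Hypothetical article mapping; every field name here is an assumption.
article_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},        # analyzed for full-text search
            "author": {"type": "keyword"},    # exact value, for filtering and aggregations
            "content": {"type": "text"},
            "publish_date": {"type": "date", "format": "yyyy-MM-dd"},
            "views": {"type": "long"},
        }
    }
}

# JSON body that could accompany a request such as: PUT articles
request_body = json.dumps(article_mapping, indent=2)
print(request_body)
```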

A mapping can be generated dynamically when documents are first indexed, or defined explicitly in advance.

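A mapping definition contains two kinds of fields: metadata fields (such as _index, _id, and _source) and the document's own fields, also called properties.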

2. Field data types

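Every field in an indexed document has a data type. It can be a simple type (such as text, keyword, date, or long), a type that supports the hierarchical nature of JSON (such as object or nested), or a specialized type (such as geo_point).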

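The same value means different things under different field types. A string, for example, can be mapped as a text field, which is analyzed and indexed for full-text search, or as a keyword field, which is indexed as-is for sorting and aggregations.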
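
The difference can be sketched in a few lines of Python. The analyze_text function below is only a rough stand-in for the standard analyzer (lowercase, then split on anything that is not an ASCII letter or digit); the real analyzer handles Unicode and much more. Both function names are hypothetical.

```python
import re

def analyze_text(value: str) -> list[str]:
    """Rough approximation of a text field: lowercase and split into terms."""
    return [term for term in re.split(r"[^0-9a-z]+", value.lower()) if term]

def analyze_keyword(value: str) -> list[str]:
    """A keyword field indexes the whole value as one unmodified term."""
    return [value]

print(analyze_text("Small Ball"))
print(analyze_keyword("Small Ball"))
```

A search for "ball" can find the text version because "ball" is one of its terms, while the keyword version only matches the complete string.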

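Field types are grouped into families. Types in the same family support the same search functionality but may differ in performance or storage characteristics. The sections below introduce several field types.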

2.1 Common types

Binary
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "store": false
      },
      "blob": {
        "type": "binary"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "name": "Some binary blob",
  "blob": "U29tZSBiaW5hcnkgYmxvYg=="
}

GET my-index-000001/_doc/1
{
}
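
A binary field takes its value as a Base64-encoded string. The following sketch shows how a client could produce the blob value used above and recover the original bytes; only the Python standard library is involved.

```python
import base64

raw = b"Some binary blob"

# Value to place in the "blob" field of the document
encoded = base64.b64encode(raw).decode("ascii")

# What a client gets back after decoding the value read from _source
decoded = base64.b64decode(encoded)

print(encoded)
```
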
Boolean

(1) Create the index and index two documents:

PUT my-index-000002
{"mappings":{"properties":{"is_published":{"type":"boolean"}}}}

POST my-index-000002/_doc/1
{"is_published":true}

POST my-index-000002/_doc/2
{"is_published":"false"}

(2) Aggregate on the is_published field:

GET my-index-000002/_search
{"aggs":{"publish_state":{"terms":{"field":"is_published"}}}}

The relevant part of the aggregation response:

"aggregations" : {
    "publish_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 0,
          "key_as_string" : "false",
          "doc_count" : 1
        },
        {
          "key" : 1,
          "key_as_string" : "true",
          "doc_count" : 1
        }
      ]
    }
  }
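
The response above shows that the string "false" was accepted as a boolean and that the buckets use 0 and 1 as keys. A minimal sketch of that coercion, assuming the documented accepted values (true, false, "true", "false", and the empty string as false); parse_boolean is a hypothetical helper, not an Elasticsearch API.

```python
from collections import Counter

def parse_boolean(value):
    """Coerce a JSON value the way a boolean field is documented to accept it."""
    if isinstance(value, bool):
        return value
    if value == "true":
        return True
    if value in ("false", ""):
        return False
    raise ValueError(f"not a boolean: {value!r}")

docs = [{"is_published": True}, {"is_published": "false"}]

# Terms aggregation buckets are keyed 1 for true and 0 for false
buckets = Counter(int(parse_boolean(doc["is_published"])) for doc in docs)
print(buckets)
```
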
Keyword
PUT my-index-000003
{"mappings":{"properties":{"tags":{"type":"keyword"}}}}

POST my-index-000003/_doc/1
{"tags":"big ball"}

POST my-index-000003/_doc/2
{"tags":"small ball"}

## Search by exact value: keyword fields are not analyzed, so the whole string must match
GET my-index-000003/_search
{"query":{"match":{"tags":"small ball"}}}
Numbers
PUT my-index-000004
{"mappings":{"properties":{"number_of_bytes":{"type":"integer"},"time_in_seconds":{"type":"float"},"price":{"type":"scaled_float","scaling_factor":100}}}}

POST my-index-000004/_doc/1
{"number_of_bytes":100,"time_in_seconds":1,"price":100}

GET my-index-000004/_search
{}
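
A scaled_float is stored as a long: the value is multiplied by scaling_factor and rounded. The sketch below illustrates that arithmetic and the precision it implies; the helper names are hypothetical, and Python's round() breaks exact ties differently from Java's Math.round, which does not matter for the values shown.

```python
def encode_scaled_float(value: float, scaling_factor: float) -> int:
    """Integer that would be stored for a scaled_float value."""
    return round(value * scaling_factor)

def decode_scaled_float(stored: int, scaling_factor: float) -> float:
    """Value seen by searches and aggregations."""
    return stored / scaling_factor

stored = encode_scaled_float(19.99, 100)
print(stored, decode_scaled_float(stored, 100))

# With scaling_factor 100 a third decimal place cannot be represented
print(decode_scaled_float(encode_scaled_float(19.999, 100), 100))
```
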
Dates
PUT my-index-000005
{"mappings":{"properties":{"date":{"type":"date","store":true}}}}

PUT my-index-000005/_doc/1
{"date":"2015-01-01"} 

PUT my-index-000005/_doc/2
{ "date": "2015-01-01T12:10:30Z" } 

PUT my-index-000005/_doc/3
{"date":1420070400001} 

GET my-index-000005/_search
{"sort":{"date":"asc"}}
// Default format: a date with an optional time, or milliseconds since the epoch
"strict_date_optional_time||epoch_millis"
PUT my-index-000005_01
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
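
Formats separated by || are tried in order until one parses. A minimal sketch of that behaviour for the custom format above, assuming values without a time zone are treated as UTC. The strptime patterns are Python equivalents of the Java patterns, and to_epoch_millis is a hypothetical helper that only treats integers as epoch_millis.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Python equivalents of: yyyy-MM-dd HH:mm:ss || yyyy-MM-dd
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]

def to_epoch_millis(value) -> int:
    """Try each format in turn and return milliseconds since the epoch."""
    if isinstance(value, int):          # epoch_millis
        return value
    for pattern in PATTERNS:
        try:
            parsed = datetime.strptime(value, pattern)
        except ValueError:
            continue                    # fall through to the next format
        parsed = parsed.replace(tzinfo=timezone.utc)
        return (parsed - EPOCH) // timedelta(milliseconds=1)
    raise ValueError(f"no format matches {value!r}")

for sample in ["2015-01-01", "2015-01-01 12:10:30", 1420070400001]:
    print(sample, "->", to_epoch_millis(sample))
```
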
Alias
PUT my-index-000006
{
  "mappings": {
    "properties": {
      "distance": {
        "type": "long"
      },
      "route_length_miles": {
        "type": "alias",
        "path": "distance"
      }
    }
  }
}
PUT my-index-000006/_doc/1
{"distance":50} 
PUT my-index-000006/_doc/2
{"distance":10} 
GET my-index-000006/_search
{"query":{"range":{"route_length_miles":{"gte":39}}}}

Note: the path parameter must be the full path to the target field, including any parent objects, for example object1.object2.field.

2.2 Object and relational types

Object
PUT my-index-000007
{
  "mappings": {
    "properties": {
      "region": {
        "type": "keyword"
      },
      "manager": {
        "properties": {
          "age": {
            "type": "integer"
          },
          "name": {
            "properties": {
              "first": {
                "type": "text"
              },
              "last": {
                "type": "text"
              }
            }
          }
        }
      }
    }
  }
}

When indexing, an object can be supplied either as nested JSON or as flat key-value pairs with dotted field names.

PUT my-index-000007/_doc/1
{
  "region": "US",
  "manager": {
    "age": 30,
    "name": {
      "first": "John",
      "last": "Smith"
    }
  }
}
PUT my-index-000007/_doc/2
{
  "region": "China",
  "manager.age": 100,
  "manager.name.first": "SanFeng",
  "manager.name.last": "Zhang"
}
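
Both document shapes work because an object is indexed as a flat list of key-value pairs with dotted paths. A small sketch of that flattening (flatten is a hypothetical helper):

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Turn nested objects into a flat dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat

doc = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}
print(flatten(doc))
```
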
Flattened
PUT my-index-000008
{"mappings":{"properties":{"title":{"type":"text"},"labels":{"type":"flattened"}}}}
POST my-index-000008/_doc/1
{"title":"Results are not sorted correctly.","labels":{"priority":"urgent","release":["v1.2.5","v1.3.0"],"timestamp":{"created":1541458026,"closed":1541457010}}}

Flattened fields support basic queries, addressed with dot notation:

GET my-index-000008/_search
{"query":{"match":{"labels.release":"v1.2.5"}}}
Nested
PUT my-index-000009
{"mappings":{"properties":{"user":{"type":"nested"}}}}

PUT my-index-000009/_doc/1
{"group":"fans","user":[{"first":"John","last":"Smith"},{"first":"Alice","last":"White"}]}

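# Matches: Alice and White belong to the same nested object. The nested query needs a nested field; against a plain object mapping it is rejected instead of returning hits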
GET my-index-000009/_search
{"query":{"nested":{"path":"user","query":{"bool":{"must":[{"match":{"user.first":"Alice"}},{"match":{"user.last":"White"}}]}}}}}

# Same query, also highlighting the first field of each matching nested object
GET my-index-000009/_search
{"query":{"nested":{"path":"user","query":{"bool":{"must":[{"match":{"user.first":"Alice"}},{"match":{"user.last":"White"}}]}},"inner_hits":{"highlight":{"fields":{"user.first":{}}}}}}}
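
Why nested matters can be sketched without Elasticsearch. With a plain object mapping, an array of objects is flattened into one list of values per field, so the link between a first name and its last name is lost; nested keeps each object separate. The two functions are hypothetical and ignore text analysis.

```python
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]

def object_match(users, first, last):
    """Object mapping: values are pooled per field across all array items."""
    firsts = {user["first"] for user in users}
    lasts = {user["last"] for user in users}
    return first in firsts and last in lasts

def nested_match(users, first, last):
    """Nested mapping: both conditions must hold inside the same object."""
    return any(u["first"] == first and u["last"] == last for u in users)

print(object_match(users, "Alice", "Smith"))   # cross-object pair
print(nested_match(users, "Alice", "Smith"))
print(nested_match(users, "Alice", "White"))
```
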
Join
# The my_join_field field defines a simple relation: question is the parent of answer.
PUT my-index-000010
{
  "mappings": {
    "properties": {
      "my_id": {
        "type": "keyword"
      },
      "my_join_field": { 
        "type": "join",
        "relations": {
          "question": "answer" 
        }
      }
    }
  }
}

Here my_join_field has the join field type and defines a single relation in which question is the parent of answer.
(2) Index the documents whose relation name is question:

# Index the question documents
PUT my-index-000010/_doc/1?refresh
{
  "my_id": "1",
  "text": "This is a question",
  "my_join_field": {
    "name": "question" 
  }
}

PUT my-index-000010/_doc/2?refresh
{
  "my_id": "2",
  "text": "This is another question",
  "my_join_field": {
    "name": "question"
  }
}

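(3) Next, index the documents whose relation name is answer. Setting parent to 1 makes document 1 their parent, and routing=1 keeps each child on the same shard as its parent: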

PUT my-index-000010/_doc/3?routing=1&refresh 
{
  "my_id": "3",
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", 
    "parent": "1" 
  }
}

PUT my-index-000010/_doc/4?routing=1&refresh
{
  "my_id": "4",
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}

(4) Query and aggregate: select the answer documents whose parent id is 1, then aggregate on the parent.

GET my-index-000010/_search
{
  "query": {
    "parent_id": { 
      "type": "answer",
      "id": "1"
    }
  },
  "aggs": {
    "parents": {
      "terms": {
        "field": "my_join_field#question", 
        "size": 10
      }
    }
  }
}

(5) The relevant part of the response:

"aggregations" : {
    "parents" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 2
        }
      ]
    }
  }
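
The logic of steps (2) to (5) can be mimicked in plain Python: select the children of a given parent, then count them per parent. This is only an illustration of the result above; the docs dict and children_of are hypothetical and say nothing about how Elasticsearch stores the relation.

```python
from collections import Counter

# Join field values of the four documents, keyed by document id
docs = {
    "1": {"name": "question"},
    "2": {"name": "question"},
    "3": {"name": "answer", "parent": "1"},
    "4": {"name": "answer", "parent": "1"},
}

def children_of(docs, child_type, parent_id):
    """Ids of documents of child_type whose parent is parent_id."""
    return [
        doc_id for doc_id, join in docs.items()
        if join["name"] == child_type and join.get("parent") == parent_id
    ]

hits = children_of(docs, "answer", "1")
buckets = Counter(docs[doc_id]["parent"] for doc_id in hits)
print(hits, dict(buckets))
```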

2.3 Structured data types

Range
PUT my-index-000011
{
  "mappings": {
    "properties": {
      "expected_attendees": {
        "type": "integer_range"
      },
      "time_frame": {
        "type": "date_range",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "ip_allowlist": {
        "type": "ip_range"
      }
    }
  }
}

PUT my-index-000011/_doc/1?refresh
{
  "expected_attendees": {
    "gte": 10,
    "lte": 20
  },
  "time_frame": {
    "gte": "2015-10-31 12:00:00",
    "lte": "2015-11-01"
  },
  "ip_allowlist": "192.168.0.0/16"
}
# term query
GET my-index-000011/_search
{"query":{"term":{"expected_attendees":{"value":12}}}}
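# Range query on time_frame; relation within matches documents whose stored range lies entirely inside the queried range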
GET my-index-000011/_search
{"query":{"range":{"time_frame":{"gte":"2015-10-31","lte":"2015-11-01","relation":"within"}}}}
# Term query on the IP range field; 192.124.1.100 lies outside 192.168.0.0/16, so this returns no hits
GET my-index-000011/_search
{"query":{"term":{"ip_allowlist":{"value":"192.124.1.100"}}}}
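
A term query on a range field matches when the queried value falls inside the stored range. The sketch below reproduces that check for the integer and IP ranges of document 1, using the standard ipaddress module; both helper names are hypothetical.

```python
import ipaddress

def int_range_contains(stored: dict, value: int) -> bool:
    """True if value lies within an inclusive gte/lte range."""
    return stored["gte"] <= value <= stored["lte"]

def ip_range_contains(cidr: str, address: str) -> bool:
    """True if address belongs to the network written in CIDR notation."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)

print(int_range_contains({"gte": 10, "lte": 20}, 12))
print(ip_range_contains("192.168.0.0/16", "192.124.1.100"))
print(ip_range_contains("192.168.0.0/16", "192.168.1.1"))
```
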
IP
# Create an index whose mapping contains an ip field
PUT my-index-000012
{
  "mappings": {
    "properties": {
      "ip_addr": {
        "type": "ip"
      }
    }
  }
}
# Index a document
PUT my-index-000012/_doc/1
{"ip_addr":"192.168.1.1"}
# Query by IP; a term query on an ip field accepts CIDR notation
GET my-index-000012/_search
{"query":{"term":{"ip_addr":"192.168.0.0/16"}}}

2.4 Aggregate data types

Histogram
# my_histogram: a histogram field that stores percentile data
# my_text: a keyword field that stores the title of the histogram
PUT my-index-000013
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}
# Store pre-aggregated data for histogram_1 and histogram_2
PUT my-index-000013/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 0.5], 
      "counts" : [3, 7, 23, 12, 6] 
   }
}
PUT my-index-000013/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
      "values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5], 
      "counts" : [8, 17, 8, 7, 6, 2] 
   }
}
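
Each entry in values is paired with the entry at the same position in counts, so metrics over a histogram field are weighted by the counts. A sketch of how sum, value_count, and avg work out for histogram_1 (histogram_stats is a hypothetical helper):

```python
def histogram_stats(histogram: dict) -> dict:
    """Weighted metrics over a pre-aggregated histogram."""
    count = sum(histogram["counts"])
    total = sum(v * c for v, c in zip(histogram["values"], histogram["counts"]))
    return {"value_count": count, "sum": total, "avg": total / count}

histogram_1 = {
    "values": [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts": [3, 7, 23, 12, 6],
}
print(histogram_stats(histogram_1))
```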

2.5 Text search types

Text
PUT my-index-000014
{
  "mappings": {
    "properties": {
      "full_name": {
        "type":  "text"
      }
    }
  }
}

PUT my-index-000014/_doc/1
{"full_name":"Johnny Lu"}
completion
PUT my-index-000015
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion"
      },
      "title": {
        "type": "keyword"
      }
    }
  }
}
search_as_you_type
PUT my-index-000016
{"mappings":{"properties":{"my_field":{"type":"search_as_you_type"}}}}
PUT my-index-000016/_doc/1?refresh
{"my_field":"quick brown fox jump lazy dog"}
GET my-index-000016/_search
{
  "query": {
    "multi_match": {
      "query": "brown f",
      "type": "bool_prefix",
      "fields": [
        "my_field",
        "my_field._2gram",
        "my_field._3gram"
      ]
    }
  }
}
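
The _2gram and _3gram subfields hold shingles, that is, runs of adjacent terms, and a bool_prefix query treats the last query term as a prefix. The sketch below approximates both ideas on a pre-split token list; it ignores analysis details, and both function names are hypothetical.

```python
def shingles(tokens: list[str], size: int) -> list[str]:
    """All runs of `size` adjacent tokens, joined by a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

def bool_prefix_match(query: str, tokens: list[str]) -> bool:
    """Every term but the last is matched exactly, the last as a prefix (OR)."""
    *terms, prefix = query.lower().split()
    exact = any(term in tokens for term in terms)
    partial = any(token.startswith(prefix) for token in tokens)
    return exact or partial

tokens = "quick brown fox jump lazy dog".split()
print(shingles(tokens, 2))
print(shingles(tokens, 3))
print(bool_prefix_match("brown f", tokens))
```
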
Token count
PUT my-index-000017
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
PUT my-index-000017/_doc/1
{ "name": "John Smith" }
PUT my-index-000017/_doc/2
{ "name": "Rachel Alice Williams" }
GET my-index-000017/_search
{"query":{"term":{"name.length":3}}}
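
A token_count field stores the number of tokens the analyzer produces. For simple names like the ones above, counting runs of word characters gives the same numbers; this regular expression is only an approximation of the standard analyzer, and token_count is a hypothetical helper.

```python
import re

def token_count(value: str) -> int:
    """Approximate token count: number of runs of word characters."""
    return len(re.findall(r"\w+", value))

for name in ["John Smith", "Rachel Alice Williams"]:
    print(name, "->", token_count(name))
```
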

2.6 Document ranking types

Dense vector
PUT my-index-000018
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3  
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}
PUT my-index-000018/_doc/1
{"my_text":"text1","my_vector":[0.5,10,6]}
PUT my-index-000018/_doc/2
{"my_text":"text2","my_vector":[-0.5,10,10]}
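
Dense vectors are typically used to rank documents by similarity to a query vector. A minimal sketch of cosine similarity over the two documents above; the query vector is an assumption made up for this example.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

vectors = {"text1": [0.5, 10, 6], "text2": [-0.5, 10, 10]}
query_vector = [4, 3.4, -0.2]  # hypothetical query vector

# Most similar document first
ranked = sorted(vectors, key=lambda name: cosine(query_vector, vectors[name]), reverse=True)
print(ranked)
```
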
Rank feature
PUT my-index-000019
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature" 
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false 
      }
    }
  }
}
PUT my-index-000019/_doc/1
{"pagerank":8,"url_length":22}
PUT my-index-000019/_doc/2
{"pagerank":9,"url_length":23}
# rank_feature query: documents score higher as the pagerank value increases
GET my-index-000019/_search
{"query":{"rank_feature":{"field":"pagerank"}}}
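
One of the scoring functions the rank_feature query offers is saturation, S / (S + pivot), which grows with the feature value but never exceeds 1. A sketch for the two pagerank values above; the pivot of 8 is an assumption for illustration (Elasticsearch derives a default pivot when none is given).

```python
def saturation(value: float, pivot: float) -> float:
    """Saturation score: equals 0.5 when value == pivot and approaches 1 above it."""
    return value / (value + pivot)

pivot = 8  # hypothetical pivot
for pagerank in (8, 9):
    print(pagerank, "->", saturation(pagerank, pivot))
```
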
Rank features
PUT my-index-000020
{
  "mappings": {
    "properties": {
      "topics": {
        "type": "rank_features" 
      }
    }
  }
}
PUT my-index-000020/_doc/1
{"topics":{"politics":20,"economics":50.8}}
PUT my-index-000020/_doc/2
{"topics":{"politics":5.2,"sports":80.1}}
GET my-index-000020/_search
{"query":{"rank_feature":{"field":"topics.politics"}}}

2.7 Spatial data types

Geo point
PUT my-index-000021
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}
# As an object with lat and lon keys:
PUT my-index-000021/_doc/1
{"text":"Geo-point as an object","location":{"lat":41.12,"lon":-71.34}}
# As a string in the form "lat,lon":
PUT my-index-000021/_doc/2
{"text":"Geo-point as a string","location":"41.12,-71.34"}
# As a geohash string:
PUT my-index-000021/_doc/3
{"text":"Geo-point as a geohash","location":"drm3btev3e86"}
# As an array in the form [lon, lat]:
PUT my-index-000021/_doc/4
{"text":"Geo-point as an array","location":[-71.34,41.12]}
# As a Well-Known Text point in the form "POINT (lon lat)":
PUT my-index-000021/_doc/5
{"text":"Geo-point as a WKT POINT primitive","location":"POINT (-71.34 41.12)"}
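
The formats differ in coordinate order: the object and string forms put latitude first, while the array and WKT forms put longitude first. The sketch below normalizes four of the five forms to a (lat, lon) tuple; geohash decoding is left out, and parse_geo_point is a hypothetical helper.

```python
import re

WKT_POINT = re.compile(r"\s*POINT\s*\(\s*([-+\d.]+)\s+([-+\d.]+)\s*\)\s*")

def parse_geo_point(value) -> tuple[float, float]:
    """Return (lat, lon) for the object, array, WKT, and "lat,lon" forms."""
    if isinstance(value, dict):
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, (list, tuple)):
        lon, lat = value                      # arrays are [lon, lat]
        return float(lat), float(lon)
    wkt = WKT_POINT.fullmatch(value)
    if wkt:
        lon, lat = wkt.groups()               # WKT is POINT (lon lat)
        return float(lat), float(lon)
    lat, lon = value.split(",")               # strings are "lat,lon"
    return float(lat), float(lon)

samples = [
    {"lat": 41.12, "lon": -71.34},
    "41.12,-71.34",
    [-71.34, 41.12],
    "POINT (-71.34 41.12)",
]
print([parse_geo_point(sample) for sample in samples])
```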

GET my-index-000021/_search
{
  "query": {
    "geo_bounding_box": { 
      "location": {
        "top_left": {
          "lat": 42,
          "lon": -72
        },
        "bottom_right": {
          "lat": 40,
          "lon": -74
        }
      }
    }
  }
}

Point
PUT my-index-000022
{
  "mappings": {
    "properties": {
      "location": {
        "type": "point"
      }
    }
  }
}
# As an object with x and y keys:
PUT my-index-000022/_doc/1
{"text":"Point as an object","location":{"x":41.12,"y":-71.34}}
# As a string in the form "x,y":
PUT my-index-000022/_doc/2
{"text":"Point as a string","location":"41.12,-71.34"}
# As an array in the form [x, y]:
PUT my-index-000022/_doc/3
{"text":"Point as an array","location":[41.12,-71.34]}
# As a Well-Known Text point in the form "POINT (x y)":
PUT my-index-000022/_doc/4
{"text":"Point as a WKT POINT primitive","location":"POINT (41.12 -71.34)"}

2.8 Other types

Percolator
# Configure a field of type percolator
PUT my-index-01
{
  "mappings": {
    "properties": {
      "query" : {
        "type" : "percolator"
      },
      "body" : {
        "type": "text"
      }
    }
  }
}
# Define an alias for the index so that the query index name need not change if the system or application is reindexed
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "my-index-01",
        "alias": "queries" 
      }
    }
  ]
}
# Parse and index a native query; it matches documents whose body field contains any of the terms in "quick brown fox"
PUT queries/_doc/1?refresh
{
  "query" : {
    "match" : {
      "body" : "quick brown fox"
    }
  }
}
# Percolate query: field names in the supplied document must match those used in the stored queries (body here)
GET /queries/_search
{
  "query": {
    "percolate" : {
      "field" : "query",
      "document" : {
        "body" : "fox jumps over the lazy dog"
      }
    }
  }
}
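
Percolation inverts the usual search: queries are stored, and a document is checked against them. A toy sketch of the idea, assuming each stored query is a match query with the default OR semantics and using a crude word-splitting tokenizer; the names are hypothetical.

```python
import re

def terms(text: str) -> set[str]:
    """Crude tokenizer: lowercase runs of word characters."""
    return set(re.findall(r"\w+", text.lower()))

# Stored match queries on the body field, keyed by query document id
stored_queries = {"1": "quick brown fox"}

def percolate(document_body: str, queries: dict) -> list[str]:
    """Ids of stored queries sharing at least one term with the document."""
    doc_terms = terms(document_body)
    return [query_id for query_id, text in queries.items() if terms(text) & doc_terms]

print(percolate("fox jumps over the lazy dog", stored_queries))
```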

3. Conclusion

This article has only briefly introduced some of Elasticsearch's field types. See the official Elasticsearch documentation for the remaining data types.
