Elasticsearch API:Search
URI Search
curl -XGET 'http://10.213.10.30:10920/megacorp/employee/_search?pretty&q=last_name:Smith'
$ curl -XGET 'http://10.213.10.30:10920/megacorp/employee/_search?pretty&q=last_name:Smith'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
]
}
}
]
}
}
Request Body Search
curl -XGET 'http://10.213.10.30:10920/megacorp/employee/_search?pretty' -d '
{
"query": {
"bool": {
"must": [
{ "match" : { "last_name" : "Smith" } }
],
"filter": [
{ "range" : { "age" : { "gt" : 10} } }
]
}
}
}
'
######Response
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
]
}
}
]
}
}
>* 分页 (Pagination)
The **from** parameter defines the **offset** from the first result you want to fetch. The **size** parameter allows you to configure the **maximum amount** of hits to be returned.
Though **from** and **size** can be set as request parameters, they can also be set within the search body. **from** defaults to 0, and **size** defaults to 10.
curl -XGET 'http://10.213.10.30:10920/megacorp/employee/_search?pretty' -d '
{
"from" : 0, "size" : 1,
"query": {
"bool": {
"must": [
{ "match" : { "last_name" : "Smith" } }
],
"filter": [
{ "range" : { "age" : { "gt" : 10} } }
]
}
}
}
'
######Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
}
]
}
}
>* 排序 (Sort)
The **from** parameter defines the **offset** from the first result you want to fetch. The **size** parameter allows you to configure the **maximum amount** of hits to be returned.
Though **from** and **size** can be set as request parameters, they can also be set within the search body. **from** defaults to 0, and **size** defaults to 10.
[查看更多用法](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html)
curl -XGET 'http://10.213.10.30:10920/megacorp/employee/_search?pretty' -d '
{
"sort" : [
{ "age" : {"order" : "asc"}},
"_score"
],
"query": {
"bool": {
"must": [
{ "match" : {
"last_name" : "Smith"
}}
],
"filter": [
{ "range" : {
"age" : { "gt" : 10 }
}
}
]
}
}
}
'
##全文搜索
搜索所有喜欢 rock climbing 的员工:
>
curl -XGET 'http://10.213.10.30:10920/megacorp/employee/_search?pretty' -d '
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
'
你会发现我们同样使用了 match 查询来搜索 about 字段中的 rock climbing。我们会得到两个匹配的文档:
######Response
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.53484553,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.53484553,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.26742277,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
]
}
}
]
}
}
通常情况下,Elasticsearch 会通过相关性来排列顺序,第一个结果中,John Smith 的 about 字段中明确地写到 rock climbing。而在 Jane Smith 的 about 字段中,提及到了 rock,但是并没有提及到 climbing,所以后者的 _score 就要比前者的低。
这个例子很好地解释了 Elasticsearch 是如何执行全文搜索的。对于 Elasticsearch 来说,相关性的概念是很重要的,而这也是它与传统数据库在返回匹配数据时最大的不同之处。
* 段落搜索
能够找出每个字段中的独立单词固然很好,但是有的时候你可能还需要去匹配精确的短语或者 段落。例如,我们只需要查询到 about 字段只包含 rock climbing 的短语的员工。
为了实现这个效果,我们将对 match 查询变为 match_phrase 查询:
curl -XGET 'http://10.213.10.30:10920/megacorp/employee/_search?pretty' -d '
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
'
######Response
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.53484553,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.53484553,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
}
]
}
}
* 高亮我们的搜索
很多程序希望能在搜索结果中 高亮 匹配到的关键字来告诉用户这个文档是 如何 匹配他们的搜索的。在 Elasticsearch 中找到高亮片段是非常容易的。
让我们回到之前的查询,但是添加一个 highlight 参数:
curl -XGET 'http://10.213.10.30:10920/megacorp/employee/_search?pretty' -d '
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
'
当我们运行这个查询后,相同的命中结果会被返回,但是我们会得到一个新的名叫 highlight 的部分。在这里包含了 about 字段中的匹配单词,并且会被 <em></em> HTML字符包裹住:
######Response
{
"took" : 30,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.53484553,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.53484553,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
},
"highlight" : {
"about" : [
"I love to go <em>rock</em> <em>climbing</em>"
]
}
}
]
}
}
##统计
Elasticsearch 把这项功能称作 汇总 (aggregations),通过这个功能,我们可以针对你的数据进行复杂的统计。这个功能有些类似于 SQL 中的 GROUP BY,但是要比它更加强大。
例如,让我们找一下员工中最受欢迎的兴趣是什么:
curl -XGET 'http://10.213.10.30:10920/megacorp/employee/_search?pretty' -d '
{
"aggs": {
"all_interests": {
"terms": { "field": "interests" }
}
}
}
'
可能会出现如下错误:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "megacorp",
"node" : "qm6aUUoUScO_S16Sod_7Bw",
"reason" : {
"type" : "illegal_argument_exception",
"reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
}
],
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
},
"status" : 400
}
解决这个问题需要在该字段上设置Fileddata=true,默认是禁用的;
[解决方案](https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html#_fielddata_is_disabled_on_literal_text_literal_fields_by_default)
curl -XPUT 'http://10.213.10.30:10920/megacorp/_mapping/employee?pretty' -d'
{
"properties": {
"interests": {
"type": "text",
"fielddata": true
}
}
}
'
#####Response:
{
"acknowledged" : true
}
然后,让我们重新找一下员工中最受欢迎的兴趣是什么:
curl -XGET 'http://10.213.10.30:10920/megacorp/employee/_search?pretty' -d '
{
"aggs": {
"all_interests": {
"terms": { "field": "interests" }
}
}
}
{
"took" : 19,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about" : "I like to build cabinets",
"interests" : [
"forestry"
]
}
}
]
},
"aggregations" : {
"all_interests" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "music",
"doc_count" : 2
},
{
"key" : "forestry",
"doc_count" : 1
},
{
"key" : "sports",
"doc_count" : 1
}
]
}
}
}
'
文档:
http://www.cnblogs.com/muniaofeiyu/p/5616316.html
http://blog.csdn.net/ty_0930/article/details/52266611