elasticsearch Study Notes (3) - elasticsear

2021-07-05  Shawn_Shawn

elasticsearch Search APIs

URL Search API

Syntax:

GET <index_name>/_search

POST <index_name>/_search
{

}

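Notes:

q: the query string; df: the default field searched when q does not name a field; sort: the sort order, e.g. year:desc; from and size: the pagination offset and page size; timeout: the search timeout.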

Query String Syntax

demo:

# Movies from 2012: search the year field (df) for 2012, newest first
GET movies/_search?q=2012&df=year&sort=year:desc&from=0&size=10&timeout=1s
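All URL Search options are ordinary query-string parameters. As a minimal sketch, assuming a local node at http://localhost:9200 (the helper build_search_url is hypothetical), the demo URL can be assembled with the Python standard library:

```python
from urllib.parse import urlencode


def build_search_url(index, q, df=None, sort=None, frm=0, size=10, timeout=None,
                     base="http://localhost:9200"):
    """Build a URL Search request; base is an assumed local node address."""
    params = {"q": q}
    if df is not None:
        params["df"] = df          # default field used when q names no field
    if sort is not None:
        params["sort"] = sort      # e.g. "year:desc"
    params["from"] = frm           # pagination offset
    params["size"] = size          # page size
    if timeout is not None:
        params["timeout"] = timeout
    return f"{base}/{index}/_search?{urlencode(params)}"


url = build_search_url("movies", "2012", df="year", sort="year:desc", timeout="1s")
print(url)
```

urlencode percent-encodes the colon in year:desc as %3A; the server decodes it back before parsing the parameter.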

Query Domain Specific Language (DSL)

Example:

# Movies released in 2005
GET movies/_search?q=year:2005

POST movies/_search
{
  "query":{
    "match": {"year": 2005}
  }
}

Term-Level Queries

Example:

Create a products index and insert 3 documents:

DELETE products
PUT products
{
  "settings": {
    "number_of_shards": 1
  }
}


POST /products/_bulk
{ "index": { "_id": 1 }}
{ "productID" : "XHDK-A-1293-#fJ3","desc":"iPhone" }
{ "index": { "_id": 2 }}
{ "productID" : "KDKE-B-9947-#kL5","desc":"iPad" }
{ "index": { "_id": 3 }}
{ "productID" : "JODL-X-1937-#pV7","desc":"MBP" }

Term Query

Use a term query to find documents whose desc is iPhone:

POST /products/_search
{
  "query": {
    "term": {
      "desc": {
        "value":"iPhone"
      }
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

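Question: one document clearly has desc set to iPhone, so why does the query return nothing?

Answer:

When a document is indexed, its text fields are analyzed, by default with the Standard Analyzer, which lowercases every token, so the index contains the term iphone. A term query does not analyze its input, so the uppercase P is not converted to a lowercase p and the query looks for the term iPhone, which is not in the index. Querying for iphone returns the document: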

POST /products/_search
{
  "query": {
    "term": {
      "desc": {
        "value":"iphone"
      }
    }
  }
}
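To make the analysis step concrete, here is a minimal Python sketch. The function names are hypothetical, and simple_standard_analyzer only approximates the real Standard Analyzer (which uses Unicode text segmentation) by lowercasing runs of word characters:

```python
import re
from collections import defaultdict


def simple_standard_analyzer(text):
    """Rough stand-in for the Standard Analyzer: split into word tokens, lowercase them."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def build_inverted_index(docs, field):
    # Map every analyzed term to the ids of the documents containing it
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in simple_standard_analyzer(doc[field]):
            index[term].add(doc_id)
    return index


def term_query(index, value):
    # A term query looks the value up verbatim, without analyzing it
    return sorted(index.get(value, set()))


docs = {
    1: {"productID": "XHDK-A-1293-#fJ3", "desc": "iPhone"},
    2: {"productID": "KDKE-B-9947-#kL5", "desc": "iPad"},
    3: {"productID": "JODL-X-1937-#pV7", "desc": "MBP"},
}
desc_index = build_inverted_index(docs, "desc")
print(term_query(desc_index, "iPhone"))  # []
print(term_query(desc_index, "iphone"))  # [1]
```

The same approximation splits a productID such as XHDK-A-1293-#fJ3 into xhdk, a, 1293 and fj3, which is why exact matches on productID later in these notes use the productID.keyword subfield.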

Structured Search

Boolean

Data preparation:

DELETE products
POST /products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10,"avaliable":true,"date":"2018-01-01", "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20,"avaliable":true,"date":"2019-01-01", "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30,"avaliable":true, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30,"avaliable":false, "productID" : "QQPX-R-3956-#aD8" }

GET products/_mapping

Examples:

# Term query on a boolean field: relevance scores are computed
POST products/_search
{
  "profile": "true",
  "explain": true,
  "query": {
    "term": {
      "avaliable": true
    }
  }
}

# Wrap the boolean term query in constant_score to run it as a filter: no scoring
POST products/_search
{
  "profile": "true",
  "explain": true,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "avaliable": true
        }
      }
    }
  }
}
Numeric Range
# Term query on a numeric field
POST products/_search
{
  "profile": "true",
  "explain": true,
  "query": {
    "term": {
      "price": 30
    }
  }
}

# Terms query on a numeric field
POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "price": [
            "20",
            "30"
          ]
        }
      }
    }
  }
}

# Range query on a numeric field
GET products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "range" : {
                    "price" : {
                        "gte" : 20,
                        "lte"  : 30
                    }
                }
            }
        }
    }
}
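Filter context only answers yes or no. As a rough Python model (all names are hypothetical; the string values "20" and "30" from the terms example are written as numbers here, since Elasticsearch coerces them for the numeric price field), constant_score gives every matching document the same score, its boost, which defaults to 1.0:

```python
# The four documents indexed into products above
PRODUCTS = [
    {"_id": 1, "price": 10, "avaliable": True, "date": "2018-01-01", "productID": "XHDK-A-1293-#fJ3"},
    {"_id": 2, "price": 20, "avaliable": True, "date": "2019-01-01", "productID": "KDKE-B-9947-#kL5"},
    {"_id": 3, "price": 30, "avaliable": True, "productID": "JODL-X-1937-#pV7"},
    {"_id": 4, "price": 30, "avaliable": False, "productID": "QQPX-R-3956-#aD8"},
]


def term(field, value):
    return lambda doc: doc.get(field) == value


def terms(field, values):
    return lambda doc: doc.get(field) in values


def range_(field, gte=None, lte=None):
    def match(doc):
        v = doc.get(field)
        if v is None:
            return False
        return (gte is None or v >= gte) and (lte is None or v <= lte)
    return match


def constant_score(filter_fn, docs, boost=1.0):
    # Filter context: match or no match; every hit gets the same score (boost)
    return [(doc["_id"], boost) for doc in docs if filter_fn(doc)]


print(constant_score(term("price", 30), PRODUCTS))
print(constant_score(terms("price", [20, 30]), PRODUCTS))
print(constant_score(range_("price", gte=20, lte=30), PRODUCTS))
```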
Date Range
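Unit    Meaning
y       Years
M       Months
w       Weeks
d       Days
h       Hours
H       Hours
m       Minutes
s       Seconds

Assume now is 2021-07-04 12:00:00:

Expression           Result
now+1h               2021-07-04 13:00:00
now-1h               2021-07-04 11:00:00
2021.07.04||+1M/d    2021-08-04 00:00:00

|| separates an explicit anchor date from the date math that follows it, and /d rounds to the start of the day.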
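Elasticsearch evaluates these expressions itself; the Python sketch below only re-implements a subset to show the arithmetic. It is hypothetical and simplified: it supports a now or yyyy.MM.dd|| anchor, +N/-N steps and /unit rounding, and it always rounds down, whereas Elasticsearch range queries round up for lte and gt.

```python
import calendar
import re
from datetime import datetime, timedelta

STEP_RE = re.compile(r"([+-])(\d+)([yMwdhHms])|/([yMwdhHms])")
SECONDS = {"w": 604800, "d": 86400, "h": 3600, "H": 3600, "m": 60, "s": 1}


def add_months(dt, months):
    # Clamp the day so that e.g. Jan 31 + 1M lands on the last day of February
    total = dt.month - 1 + months
    year, month = dt.year + total // 12, total % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def round_down(dt, unit):
    if unit == "y":
        return dt.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "M":
        return dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "w":
        start = dt - timedelta(days=dt.weekday())  # assume weeks start on Monday
        return start.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "d":
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit in "hH":
        return dt.replace(minute=0, second=0, microsecond=0)
    if unit == "m":
        return dt.replace(second=0, microsecond=0)
    return dt.replace(microsecond=0)  # "s"


def eval_date_math(expr, now):
    """Evaluate an anchor followed by +N/-N unit steps and /unit rounding."""
    if expr.startswith("now"):
        dt, rest = now, expr[3:]
    else:
        anchor, _, rest = expr.partition("||")
        dt = datetime.strptime(anchor, "%Y.%m.%d")
    pos = 0
    while pos < len(rest):
        m = STEP_RE.match(rest, pos)
        if m is None:
            raise ValueError(f"unsupported date math: {expr!r}")
        sign, amount, unit, round_unit = m.groups()
        if round_unit:
            dt = round_down(dt, round_unit)
        else:
            n = int(amount) * (1 if sign == "+" else -1)
            if unit == "y":
                dt = add_months(dt, 12 * n)
            elif unit == "M":
                dt = add_months(dt, n)
            else:
                dt = dt + timedelta(seconds=n * SECONDS[unit])
        pos = m.end()
    return dt


now = datetime(2021, 7, 4, 12, 0, 0)
for expr in ["now+1h", "now-1h", "2021.07.04||+1M/d"]:
    print(expr, eval_date_math(expr, now))
```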

Example:

# Products whose date is within the last 5 years
POST products/_search
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "range" : {
          "date" : {
            "gte" : "now-5y"
          }
        }
      }
    }
  }
}
Exists

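The exists query returns documents that contain an indexed value for a field. A document does not match if, for example, the field is missing, its value is null or [], the mapping sets "index": false, the value is longer than ignore_above, or the value is malformed and ignore_malformed is enabled.

# Documents that have a value in date (documents 1 and 2)
POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "exists": {
          "field": "date"
        }
      }
    }
  }
}

# Documents without a value in date (documents 3 and 4)
POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must_not": {
            "exists": {
              "field": "date"
            }
          }
        }
      }
    }
  }
}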
Terms

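Finds documents that contain any of several exact values. Note that the field only has to contain the value, not equal it: an array field matches if any of its elements matches.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "color": { "type": "keyword" }
    }
  }
}

PUT my-index-000001/_bulk
{"index": {"_id": 1}}
{"color": ["blue", "green"]}
{"index": {"_id": 2}}
{"color": "blue"}

# Terms lookup: use the color values of document 2 as the query terms
GET my-index-000001/_search?pretty
{
  "query": {
    "terms": {
      "color" : {
        "index" : "my-index-000001",
        "id" : "2",
        "path" : "color"
      }
    }
  }
}

# Movies whose genre contains Comedy (exact value on the keyword subfield)
POST movies/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "genre.keyword": "Comedy"
        }
      }
    }
  }
}

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "productID.keyword": [
            "QQPX-R-3956-#aD8",
            "JODL-X-1937-#pV7"
          ]
        }
      }
    }
  }
}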
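The "contains, not equals" rule can be modelled in a few lines of Python (a hypothetical sketch of the matching rule only, using the two my-index-000001 documents):

```python
def terms_match(doc_value, query_values):
    """A terms query matches when ANY value of the field equals ANY query value."""
    values = doc_value if isinstance(doc_value, list) else [doc_value]
    return any(v in query_values for v in values)


docs = {1: {"color": ["blue", "green"]}, 2: {"color": "blue"}}

# Terms lookup: fetch the query values from the color field of document 2
lookup = docs[2]["color"]
query_values = lookup if isinstance(lookup, list) else [lookup]

hits = [doc_id for doc_id, doc in docs.items() if terms_match(doc["color"], query_values)]
print(hits)  # [1, 2]
```

Document 1 matches even though its color is not exactly "blue"; the genre_count approach in the Boolean Query section addresses this.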

Full Text Query

Query String Query

Similar to the URL Search API above: it uses the same query string syntax, but the query is sent in the request body.

Simple Query String Query

GET /movies/_search
{
  "profile": true,
  "query": {
    "simple_query_string": {
      "query": "Beautiful +mind",
      "fields": ["title"]
    }
  }
}

Match Query

# Movies whose title contains Beautiful OR Mind
POST movies/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Beautiful Mind"
      }
    }
  }
}

# Movies whose title contains Beautiful AND Mind
POST movies/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Beautiful Mind",
        "operator": "AND"
      }
    }
  }
}
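A minimal Python sketch of how operator changes the match rule (scoring ignored; analyze is a simplified Standard Analyzer stand-in and the titles are hypothetical sample data):

```python
import re


def analyze(text):
    # Simplified Standard Analyzer stand-in: lowercase word tokens
    return [t.lower() for t in re.findall(r"\w+", text)]


def match_query(field_text, query, operator="OR"):
    doc_terms = set(analyze(field_text))
    found = [t in doc_terms for t in analyze(query)]
    # OR: any query term is enough; AND: every query term must be present
    return all(found) if operator.upper() == "AND" else any(found)


titles = ["A Beautiful Mind", "Beautiful Girls", "Mind Game"]  # hypothetical titles
print([t for t in titles if match_query(t, "Beautiful Mind")])
print([t for t in titles if match_query(t, "Beautiful Mind", operator="AND")])
```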

Match Phrase Query

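Unlike the match query, match_phrase does not treat the query terms independently: the query text is still analyzed, but all of its terms must appear in the field in the same order and at adjacent positions, so the text is matched as one complete phrase.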

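# one, I and love must appear next to each other, in this order
POST movies/_search
{
  "query": {
    "match_phrase": {
      "title":{
        "query": "one I love"
      }
    }
  }
}

# slop 1 allows one position of slack, so "one love" also matches "one I love"
POST movies/_search
{
  "query": {
    "match_phrase": {
      "title":{
        "query": "one love",
        "slop": 1
      }
    }
  }
}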

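This exact matching is too strict in most cases. Sometimes we also want a document containing "I like swimming and riding!" to match "I like riding". The slop parameter controls how flexible the phrase match is.

slop tells match_phrase how far apart the terms may be while the document still counts as a match. "How far apart" means: how many times do you need to move a term so that the query lines up with the document?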
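The Python sketch below computes the smallest slop at which a phrase would match. It is a simplified model of Lucene's sloppy phrase matching (function names hypothetical): for query term i found at document position p_i it takes p_i - i, and the match length is the spread between the largest and smallest of those values.

```python
import re
from itertools import product


def analyze(text):
    return [t.lower() for t in re.findall(r"\w+", text)]


def required_slop(query, text):
    """Smallest slop at which match_phrase would match, or None if a term is missing."""
    q_terms, d_terms = analyze(query), analyze(text)
    positions = [[p for p, t in enumerate(d_terms) if t == q] for q in q_terms]
    if any(not ps for ps in positions):
        return None
    best = None
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # two query terms cannot occupy the same position
        offsets = [p - i for i, p in enumerate(combo)]
        spread = max(offsets) - min(offsets)
        best = spread if best is None else min(best, spread)
    return best


print(required_slop("one love", "one I love"))                     # 1
print(required_slop("I like riding", "I like swimming and riding!"))  # 2
print(required_slop("love one", "one love"))                       # 2 (transposed)
```

The last case shows why swapping two terms costs a slop of 2 rather than 1.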

Multi Match Query

The multi_match query builds on the match query; above all, it lets you query several fields at once.

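best_fields: finds documents that match any of the fields, but uses the _score of the best-matching field. Use it when the fields compete with each other yet are related; the score comes from the best-matching field.

most_fields: for cases where several fields contain the same text; the scores of all matching fields are combined. For English content, a common technique is to analyze the main field with the English analyzer, which stems words so that more documents match, and to index the same text in a subfield with the Standard analyzer for more precise matching. The other fields act as signals that raise the relevance of a matching document: the more fields match, the better. operator cannot require the terms to be spread across fields (it applies per field); copy_to can work around this, at the cost of extra storage.

cross_fields: first analyzes the query string into a list of terms, then searches for each term in all of the fields; a term counts as matched as soon as any field contains it. Suited to entities such as person names, addresses, or book information, where the information is spread over several fields, each field holds only part of it, and you want to find as many terms as possible in any of the listed fields. Supports operator. Compared with copy_to, it can boost individual fields at search time.

phrase: match_phrase on each field, combined like best_fields.

phrase_prefix: match_phrase_prefix on each field, combined like best_fields.

bool_prefix: match_bool_prefix on each field, combined like most_fields.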
POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body":  "Quick pets" }}
      ],
      "tie_breaker": 0.2
    }
  }
}

# best_fields via multi_match
POST blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Quick pets",
      "fields": ["title","body"],
      "tie_breaker": 0.2,
      "minimum_should_match": "20%"
    }
  }
}

# Field names can use wildcards
POST books/_search
{
  "query": {
    "multi_match": {
      "query":  "Quick brown fox",
      "fields": "*_title"
    }
  }
}

# ...and per-field boosts (^2)
POST books/_search
{
  "query": {
    "multi_match": {
      "query":  "Quick brown fox",
      "fields": [ "*_title", "chapter_title^2" ]
    }
  }
}

# English analyzer on title, Standard analyzer on the title.std subfield
DELETE /titles
PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {"std": {"type": "text","analyzer": "standard"}}
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }

# most_fields: combine the scores of title and title.std
GET /titles/_search
{
  "query": {
    "multi_match": {
      "query":  "barking dogs",
      "type":   "most_fields",
      "fields": [ "title", "title.std" ]
    }
  }
}

# Boost the stemmed field
GET /titles/_search
{
  "query": {
    "multi_match": {
      "query":  "barking dogs",
      "type":   "most_fields",
      "fields": [ "title^10", "title.std" ]
    }
  }
}
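The difference between the field-centric types (best_fields, most_fields) and the term-centric cross_fields shows most clearly with operator and. A minimal Python sketch of the matching rule, ignoring scoring (the person document and function names are hypothetical):

```python
import re


def analyze(text):
    return [t.lower() for t in re.findall(r"\w+", text)]


def field_centric_and(doc, fields, query):
    """best_fields/most_fields with operator and: some single field must contain every term."""
    terms = analyze(query)
    return any(all(t in analyze(doc[f]) for t in terms) for f in fields)


def cross_fields_and(doc, fields, query):
    """cross_fields with operator and: every term must appear in at least one field."""
    terms = analyze(query)
    return all(any(t in analyze(doc[f]) for f in fields) for t in terms)


person = {"first_name": "Will", "last_name": "Smith"}  # hypothetical document
fields = ["first_name", "last_name"]
print(field_centric_and(person, fields, "Will Smith"))  # False
print(cross_fields_and(person, fields, "Will Smith"))   # True
```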

Compound queries

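Query Context & Filter Context

Query context computes a relevance score (_score) for each matching document; filter context only decides whether a document matches, skips scoring, and its results can be cached.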

Boolean Query

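Characteristics:

A bool query combines one or more query clauses of four kinds:
must: must match; contributes to the score.
should: optional; matching clauses add to the score.
must_not: must not match; runs in filter context, so it does not affect the score.
filter: must match; runs in filter context, so it does not affect the score.

Syntax: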

POST /products/_search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "price" : "30" }
      },
      "filter": {
        "term" : { "avaliable" : "true" }
      },
      "must_not" : {
        "range" : {
          "price" : { "lte" : 10 }
        }
      },
      "should" : [
        { "term" : { "productID.keyword" : "JODL-X-1937-#pV7" } },
        { "term" : { "productID.keyword" : "XHDK-A-1293-#fJ3" } }
      ],
      "minimum_should_match" :1
    }
  }
}
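The match logic of this bool query (scoring ignored) can be sketched in Python with clauses as predicates. The names are hypothetical; the default for minimum_should_match follows the Elasticsearch rule: 1 when there are should clauses but no must or filter clauses, otherwise 0.

```python
PRODUCTS = [
    {"_id": 1, "price": 10, "avaliable": True, "productID": "XHDK-A-1293-#fJ3"},
    {"_id": 2, "price": 20, "avaliable": True, "productID": "KDKE-B-9947-#kL5"},
    {"_id": 3, "price": 30, "avaliable": True, "productID": "JODL-X-1937-#pV7"},
    {"_id": 4, "price": 30, "avaliable": False, "productID": "QQPX-R-3956-#aD8"},
]


def bool_matches(doc, must=(), filter_=(), must_not=(), should=(), minimum_should_match=None):
    if minimum_should_match is None:
        minimum_should_match = 1 if should and not must and not filter_ else 0
    return (all(c(doc) for c in must)
            and all(c(doc) for c in filter_)
            and not any(c(doc) for c in must_not)
            and sum(1 for c in should if c(doc)) >= minimum_should_match)


def search(**clauses):
    return [d["_id"] for d in PRODUCTS if bool_matches(d, **clauses)]


clauses = dict(
    must=[lambda d: d["price"] == 30],
    filter_=[lambda d: d["avaliable"] is True],
    must_not=[lambda d: d["price"] <= 10],
    should=[lambda d: d["productID"] == "JODL-X-1937-#pV7",
            lambda d: d["productID"] == "XHDK-A-1293-#fJ3"],
)
print(search(**clauses, minimum_should_match=1))  # [3]
print(search(**clauses, minimum_should_match=2))  # []
```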

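How do we solve the problem left over from the terms query, where matching means "contains" rather than "equals"?

Add a count field to each document and check it in a bool query: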

# Change the data model: add a genre_count field so that an array match
# can require "equals" instead of just "contains"
POST /newmovies/_bulk
{ "index": { "_id": 1 }}
{ "title" : "Father of the Bridge Part II","year":1995, "genre":"Comedy","genre_count":1 }
{ "index": { "_id": 2 }}
{ "title" : "Dave","year":1993,"genre":["Comedy","Romance"],"genre_count":2 }

# must: scored
POST /newmovies/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"genre.keyword": {"value": "Comedy"}}},
        {"term": {"genre_count": {"value": 1}}}
      ]
    }
  }
}

# filter: not scored, every hit has _score 0
POST /newmovies/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"genre.keyword": {"value": "Comedy"}}},
        {"term": {"genre_count": {"value": 1}}}
      ]
    }
  }
}

# Query context
POST /products/_search
{
  "query": {
    "bool": {
      "should": [
        {"term": {"productID.keyword": {"value": "JODL-X-1937-#pV7"}}},
        {"term": {"avaliable": {"value": true}}}
      ]
    }
  }
}

# Nested bool implementing "should not"
POST /products/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "price": "30"
        }
      },
      "should": [
        {
          "bool": {
            "must_not": {
              "term": {
                "avaliable": "false"
              }
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

# Control the precision: minimum_should_match 2 requires both should clauses
POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "price" : "30" }
      },
      "filter": {
        "term" : { "avaliable" : "true" }
      },
      "must_not" : {
        "range" : {
          "price" : { "lte" : 10 }
        }
      },
      "should" : [
        { "term" : { "productID.keyword" : "JODL-X-1937-#pV7" } },
        { "term" : { "productID.keyword" : "XHDK-A-1293-#fJ3" } }
      ],
      "minimum_should_match" :2
    }
  }
}

Boosting Query

Example:

DELETE blogs
POST /blogs/_bulk
{ "index": { "_id": 1 }}
{"title":"Apple iPad", "content":"Apple iPad,Apple iPad" }
{ "index": { "_id": 2 }}
{"title":"Apple iPad,Apple iPad", "content":"Apple iPad" }

# Per-field boost: weight content more than title
POST blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {
          "title": {
            "query": "apple,ipad",
            "boost": 1.1
          }
        }},
        {"match": {
          "content": {
            "query": "apple,ipad",
            "boost": 2
          }
        }}
      ]
    }
  }
}

DELETE news
POST /news/_bulk
{ "index": { "_id": 1 }}
{ "content":"Apple Mac" }
{ "index": { "_id": 2 }}
{ "content":"Apple iPad" }
{ "index": { "_id": 3 }}
{ "content":"Apple employee like Apple Pie and Apple Juice" }

# Every document mentioning apple
POST news/_search
{
  "query": {
    "bool": {
      "must": {
        "match":{"content":"apple"}
      }
    }
  }
}

# must_not removes documents that mention pie
POST news/_search
{
  "query": {
    "bool": {
      "must": {
        "match":{"content":"apple"}
      },
      "must_not": {
        "match":{"content":"pie"}
      }
    }
  }
}

# boosting keeps them but lowers their score by negative_boost
POST news/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "content": "apple"
        }
      },
      "negative": {
        "match": {
          "content": "pie"
        }
      },
      "negative_boost": 0.5
    }
  }
}
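A minimal Python sketch of the boosting rule: documents matching the negative query keep their place in the result set, but their positive score is multiplied by negative_boost. The positive-query scores below are hypothetical, not real Elasticsearch output:

```python
import re


def analyze(text):
    return [t.lower() for t in re.findall(r"\w+", text)]


def boosting_score(positive_score, text, negative_term, negative_boost):
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    # Demote, rather than exclude, documents matching the negative query
    return positive_score * negative_boost if negative_term in analyze(text) else positive_score


# The three news documents with hypothetical scores for the positive query "apple"
news = {
    1: ("Apple Mac", 0.3),
    2: ("Apple iPad", 0.3),
    3: ("Apple employee like Apple Pie and Apple Juice", 0.5),
}
scores = {i: boosting_score(s, text, "pie", 0.5) for i, (text, s) in news.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {1: 0.3, 2: 0.3, 3: 0.25}
print(ranking)  # [1, 2, 3]
```

Without the negative clause, document 3 would rank first here; with must_not it would disappear entirely.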

Constant Score Query

Disjunction Max Query

Example: one query string searched across several fields

PUT /blogs/_doc/1
{
  "title": "Quick brown rabbits",
  "body":  "Brown rabbits are commonly seen."
}

PUT /blogs/_doc/2
{
  "title": "Keeping pets healthy",
  "body":  "My quick brown fox eats rabbits on a regular basis."
}

POST /blogs/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body":  "Brown fox" }}
      ]
    }
  }
}

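Expected:

title: Brown appears in document 1.

body: Brown appears in document 1, while document 2 contains brown fox in the same order as the query, so intuitively document 2 should get the highest relevance score.

Result:

Document 1 scores higher than document 2.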

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.90425634,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.90425634,
        "_source" : {
          "title" : "Quick brown rabbits",
          "body" : "Brown rabbits are commonly seen."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "title" : "Keeping pets healthy",
          "body" : "My quick brown fox eats rabbits on a regular basis."
        }
      }
    ]
  }
}

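Scoring process:

Set "explain": true to see how each score was computed. The should clauses of a bool query add up the scores of every matching clause. Document 1 contains brown in both title and body, so both clauses contribute to its score, while document 2 matches only in body.

title and body compete with each other, so their scores should not simply be added; the score of the single best-matching field should be used instead. The Disjunction Max Query returns any document that matches at least one of its queries and uses the score of the best-matching query as the document's final score.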

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body":  "Brown fox" }}
      ]
    }
  }
}

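This time the results match the expectation.

tie_breaker parameter: by default dis_max uses only the score of the best-matching query. With tie_breaker (a float between 0 and 1.0, default 0.0), the scores of the other matching queries are multiplied by tie_breaker and added to the best score, so documents that match several fields still get some credit for it.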
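The two combination rules side by side, as a minimal Python sketch. The clause scores are hypothetical numbers chosen to mirror the example: document 1 matches moderately in two fields, document 2 strongly in one.

```python
def bool_should_score(clause_scores):
    """bool/should: the scores of all matching clauses are added up."""
    return sum(clause_scores)


def dis_max_score(clause_scores, tie_breaker=0.0):
    """dis_max: best clause score plus tie_breaker times each other matching clause."""
    if not 0.0 <= tie_breaker <= 1.0:
        raise ValueError("tie_breaker must be between 0 and 1.0")
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


doc1 = [0.5, 0.4]  # hypothetical title and body scores
doc2 = [0.8]       # hypothetical body score only
print(bool_should_score(doc1), bool_should_score(doc2))  # doc1 wins under bool
print(dis_max_score(doc1), dis_max_score(doc2))          # doc2 wins under dis_max
print(dis_max_score(doc1, tie_breaker=0.2))              # about 0.58
```

tie_breaker 0 gives the pure maximum, and tie_breaker 1.0 makes dis_max add up the clause scores just like bool should.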

Function Score Query

Scoring and sorting

Function Score Query

# Three posts with identical text but different vote counts
DELETE blogs
PUT /blogs/_doc/1
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   0
}

PUT /blogs/_doc/2
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   100
}

PUT /blogs/_doc/3
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   1000000
}

# Final score = relevance score * votes
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes"
      }
    }
  }
}

# Smooth with log1p: relevance score * log10(1 + votes)
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p"
      }
    }
  }
}

# factor scales the votes first: relevance score * log10(1 + 0.1 * votes)
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p" ,
        "factor": 0.1
      }
    }
  }
}

# boost_mode sum adds the function value to the score; max_boost caps the function value at 3
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p" ,
        "factor": 0.1
      },
      "boost_mode": "sum",
      "max_boost": 3
    }
  }
}

# Random scores with a seed
POST /blogs/_search
{
  "query": {
    "function_score": {
      "random_score": {
        "seed": 911119
      }
    }
  }
}
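The arithmetic behind these requests, as a small Python sketch. Only a few modifiers and boost modes are modelled, the function names are hypothetical, and the relevance score of 0.5 is an assumed value (the three posts have identical text, so they share one relevance score):

```python
import math

# A subset of the field_value_factor modifiers; log1p uses the common (base-10) log
MODIFIERS = {
    "none": lambda v: v,
    "log1p": lambda v: math.log10(1 + v),
    "sqrt": math.sqrt,
}


def field_value_factor(value, factor=1.0, modifier="none"):
    # factor is applied to the field value first, then the modifier
    return MODIFIERS[modifier](factor * value)


def combine(query_score, func_score, boost_mode="multiply", max_boost=float("inf")):
    func_score = min(func_score, max_boost)  # max_boost caps the function result
    if boost_mode == "multiply":
        return query_score * func_score
    if boost_mode == "sum":
        return query_score + func_score
    raise ValueError(f"boost_mode not covered by this sketch: {boost_mode}")


query_score = 0.5  # assumed shared relevance score
for votes in (0, 100, 1000000):
    print(votes,
          combine(query_score, field_value_factor(votes)),
          combine(query_score, field_value_factor(votes, modifier="log1p")),
          combine(query_score, field_value_factor(votes, 0.1, "log1p"), "sum", 3))
```

With the default multiply mode, the post with 0 votes ends up with a score of 0, and raw votes let the 1,000,000-vote post dwarf everything else; log1p and factor flatten that curve, and max_boost puts a ceiling on it.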

Search Template

GET _search/template
{
  "source" : {
    "query": { "match" : { "{{my_field}}" : "{{my_value}}" } },
    "size" : "{{my_size}}"
  },
  "params" : {
    "my_field" : "message",
    "my_value" : "foo",
    "my_size" : 5
  }
}
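Elasticsearch renders search templates with Mustache, and it also offers a render API (_render/template) for previewing the result. To show only what plain {{name}} substitution does, here is a tiny Python stand-in (hypothetical; it supports variable substitution and nothing else from Mustache):

```python
import json
import re


def render_template(source, params):
    """Replace {{name}} placeholders inside a JSON template with parameter values."""
    text = json.dumps(source)

    def substitute(m):
        # Escape the value so quotes or backslashes cannot break the JSON string
        return json.dumps(str(params[m.group(1)]))[1:-1]

    return json.loads(re.sub(r"\{\{(\w+)\}\}", substitute, text))


source = {
    "query": {"match": {"{{my_field}}": "{{my_value}}"}},
    "size": "{{my_size}}",
}
params = {"my_field": "message", "my_value": "foo", "my_size": 5}
rendered = render_template(source, params)
print(rendered)
```

Because the template puts {{my_size}} inside quotes, it renders as the string "5".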

Suggester API

Term Suggester && Phrase Suggester

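The Term Suggester first analyzes the search text into terms, then compares each term with the terms in the specified index field, computes their edit distances, and returns suggestions.

Edit distance: this uses the Levenshtein edit distance, the number of single-character edits (insertions, deletions, substitutions) needed to turn one word into another. For example, to get elasticsearch from elasticseach, one letter, r, must be inserted, which is a single edit, so the edit distance between the two words is 1.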
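The textbook dynamic-programming form of the Levenshtein distance, as a Python sketch. Elasticsearch's term suggester uses its own optimized string_distance implementation by default, so treat this as an illustration of the definition only:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = cur
    return prev[-1]


print(levenshtein("elasticseach", "elasticsearch"))  # 1
print(levenshtein("lucen", "lucene"))                # 1
print(levenshtein("hocks", "rocks"))                 # 1
```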

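The Phrase Suggester adds extra logic on top of the Term Suggester: it selects entire corrected phrases instead of individual terms, weighting candidates with an n-gram language model.

Common Phrase Suggester parameters: max_errors is the maximum number of terms (or, for a value below 1, the fraction of terms) that may be treated as misspelled; confidence is a threshold relative to the score of the input phrase, and only candidates scoring above it are returned (default 1.0; 0 returns the top candidates regardless).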

DELETE articles

POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{ "body": "elasticsearch is rock solid"}

POST _analyze
{
  "analyzer": "standard",
  "text": ["Elk stack  rocks rock"]
}

# suggest_mode missing: only suggest for input terms that are not in the index
POST /articles/_search
{
  "size": 1,
  "query": {
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}

# suggest_mode popular: only suggest terms that occur in more documents than the input term
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}

# suggest_mode always: suggest whether or not the input term exists
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "always",
        "field": "body"
      }
    }
  }
}

# prefix_length 0 also allows corrections of the first character; sort by frequency
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length":0,
        "sort": "frequency"
      }
    }
  }
}

# Phrase suggester
POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world ",
      "phrase": {
        "field": "body",
        "max_errors":2,
        "confidence":0,
        "direct_generator":[{
          "field":"body",
          "suggest_mode":"always"
        }],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

Completion Suggester

Context Suggester

DELETE articles
PUT articles
{
  "mappings": {
    "properties": {
      "title_completion":{
        "type": "completion"
      }
    }
  }
}

POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}

# Prefix suggestions from the completion field
POST articles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk ",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}

# Context suggester: a completion field with a category context
DELETE comments
PUT comments
PUT comments/_mapping
{
  "properties": {
    "comment_autocomplete":{
      "type": "completion",
      "contexts":[{
        "type":"category",
        "name":"comment_category"
      }]
    }
  }
}

POST comments/_doc
{
  "comment":"I love the star war movies",
  "comment_autocomplete":{
    "input":["star wars"],
    "contexts":{
      "comment_category":"movies"
    }
  }
}

POST comments/_doc
{
  "comment":"Where can I find a Starbucks",
  "comment_autocomplete":{
    "input":["starbucks"],
    "contexts":{
      "comment_category":"coffee"
    }
  }
}

# Only suggestions from the coffee category
POST comments/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "prefix": "sta",
      "completion":{
        "field":"comment_autocomplete",
        "contexts":{
          "comment_category":"coffee"
        }
      }
    }
  }
}
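The real completion suggester is built on an in-memory finite-state structure and also supports weights; the Python sketch below is a hypothetical, simplified model of only the matching rule: the normalized input must start with the normalized prefix, optionally restricted by context. normalize roughly imitates the simple analyzer (lowercase, non-letters become separators).

```python
import re


def normalize(text):
    # Rough simple-analyzer stand-in: lowercase, runs of non-letters become one space
    return re.sub(r"[^a-z]+", " ", text.lower())


def complete(entries, prefix, contexts=None):
    """Return inputs whose normalized form starts with the normalized prefix,
    keeping only entries that carry every requested context value."""
    wanted = normalize(prefix)
    results = []
    for entry in entries:
        ctx = entry.get("contexts", {})
        if contexts and any(value not in ctx.get(name, []) for name, value in contexts.items()):
            continue
        results.extend(t for t in entry["input"] if normalize(t).startswith(wanted))
    return results


articles = [{"input": [t]} for t in [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
]]
comments = [
    {"input": ["star wars"], "contexts": {"comment_category": ["movies"]}},
    {"input": ["starbucks"], "contexts": {"comment_category": ["coffee"]}},
]
print(complete(articles, "elk "))
print(complete(comments, "sta"))
print(complete(comments, "sta", {"comment_category": "coffee"}))
```

Note that matching is anchored at the start of each input, so "elastic is the company behind ELK stack" is not suggested for "elk ".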

Cross Cluster Search

Pain points of horizontal scaling:

Cross Cluster Search

Example:

# Start 3 single-node clusters (one per terminal)
bin/elasticsearch -E node.name=cluster0node -E cluster.name=cluster0 -E path.data=cluster0_data -E discovery.type=single-node -E http.port=9200 -E transport.port=9300
bin/elasticsearch -E node.name=cluster1node -E cluster.name=cluster1 -E path.data=cluster1_data -E discovery.type=single-node -E http.port=9201 -E transport.port=9301
bin/elasticsearch -E node.name=cluster2node -E cluster.name=cluster2 -E path.data=cluster2_data -E discovery.type=single-node -E http.port=9202 -E transport.port=9302

# Apply the dynamic remote-cluster settings on every cluster
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster0": {
          "seeds": [
            "127.0.0.1:9300"
          ],
          "transport.ping_schedule": "30s"
        },
        "cluster1": {
          "seeds": [
            "127.0.0.1:9301"
          ],
          "transport.compress": true,
          "skip_unavailable": true
        },
        "cluster2": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}

# The same settings with cURL
curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'{"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'
curl -XPUT "http://localhost:9201/_cluster/settings" -H 'Content-Type: application/json' -d'{"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'
curl -XPUT "http://localhost:9202/_cluster/settings" -H 'Content-Type: application/json' -d'{"persistent":{"cluster":{"remote":{"cluster0":{"seeds":["127.0.0.1:9300"],"transport.ping_schedule":"30s"},"cluster1":{"seeds":["127.0.0.1:9301"],"transport.compress":true,"skip_unavailable":true},"cluster2":{"seeds":["127.0.0.1:9302"]}}}}}'

# Create test data
curl -XPOST "http://localhost:9200/users/_doc" -H 'Content-Type: application/json' -d'{"name":"user1","age":10}'
curl -XPOST "http://localhost:9201/users/_doc" -H 'Content-Type: application/json' -d'{"name":"user2","age":20}'
curl -XPOST "http://localhost:9202/users/_doc" -H 'Content-Type: application/json' -d'{"name":"user3","age":30}'

# Query the local users index together with users on cluster1 and cluster2
GET /users,cluster1:users,cluster2:users/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 40
      }
    }
  }
}
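Instead of repeating the curl command per cluster, the same settings body can be generated once and sent to each HTTP port. A minimal Python sketch using only the standard library (function names are hypothetical; send_all needs the three clusters above to be running, so it is defined but not called):

```python
import json
from urllib.request import Request, urlopen

# Remote-cluster definitions copied from the settings above
REMOTES = {
    "cluster0": {"seeds": ["127.0.0.1:9300"], "transport.ping_schedule": "30s"},
    "cluster1": {"seeds": ["127.0.0.1:9301"], "transport.compress": True, "skip_unavailable": True},
    "cluster2": {"seeds": ["127.0.0.1:9302"]},
}


def remote_settings_body(remotes):
    return {"persistent": {"cluster": {"remote": remotes}}}


def settings_requests(http_ports, remotes, host="localhost"):
    body = json.dumps(remote_settings_body(remotes)).encode("utf-8")
    return [Request(f"http://{host}:{port}/_cluster/settings", data=body,
                    headers={"Content-Type": "application/json"}, method="PUT")
            for port in http_ports]


def cross_cluster_indices(index, clusters, include_local=True):
    # Builds a target such as users,cluster1:users,cluster2:users
    parts = [index] if include_local else []
    parts += [f"{cluster}:{index}" for cluster in clusters]
    return ",".join(parts)


def send_all(requests):
    # Only works while the clusters are running
    for req in requests:
        with urlopen(req) as resp:
            print(req.full_url, resp.status)


reqs = settings_requests([9200, 9201, 9202], REMOTES)
print([r.full_url for r in reqs])
print(cross_cluster_indices("users", ["cluster1", "cluster2"]))
```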

