VII. Manually Controlling the Precision of Full-Text Search Results in Elasticsearch
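1. Prepare the data
Documents 1 to 5 already exist in the forum/article index from earlier sections. The bulk request below uses partial updates to add a title field to each of them.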
POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"title" : "this is java and elasticsearch blog"} }
{ "update": { "_id": "2"} }
{ "doc" : {"title" : "this is java blog"} }
{ "update": { "_id": "3"} }
{ "doc" : {"title" : "this is elasticsearch blog"} }
{ "update": { "_id": "4"} }
{ "doc" : {"title" : "this is java, elasticsearch, hadoop blog"} }
{ "update": { "_id": "5"} }
{ "doc" : {"title" : "this is spark blog"} }
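The _bulk body is newline-delimited JSON. Each action line (here {"update": {"_id": ...}}) is followed by its payload line, and the whole body must end with a newline. The minimal Python sketch below builds the same body as a string, assuming you send it yourself with an HTTP client of your choice to /forum/article/_bulk with Content-Type: application/x-ndjson. The helper name build_bulk_update_body is hypothetical.

```python
import json

# Titles to add to the existing documents 1-5 (same data as the request above)
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def build_bulk_update_body(titles):
    """Build an NDJSON _bulk body of partial updates (hypothetical helper)."""
    lines = []
    for doc_id, title in titles.items():
        # Action line: which document to update
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        # Payload line: the partial document merged into the existing _source
        lines.append(json.dumps({"doc": {"title": title}}))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


body = build_bulk_update_body(TITLES)
print(body)
```

Generating the body with json.dumps instead of hand-writing it avoids quoting mistakes and guarantees one JSON object per line.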
2. Search for documents whose title contains java or Elasticsearch
SQL:
select * from tab where title like '%java%' or title like '%Elasticsearch%'
ES:
GET /forum/article/_search
{
  "query": {
    "match": {
      "title": "java Elasticsearch"
    }
  }
}
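This is different from the term query used earlier. Instead of searching for an exact value, it performs a full-text search: the match query is the query responsible for full-text search. The query text is analyzed into the terms java and elasticsearch (assuming title uses the default standard analyzer, which lowercases tokens), and by default a document matches if its title contains either term. Of course, if the field being searched is not_analyzed, a match query behaves like a term query.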
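To make the analysis step concrete, here is a toy Python model of the match query under the default or operator. It assumes a crude stand-in for the standard analyzer (lowercase, then split on non-alphanumeric characters); the real analyzer does more, and Elasticsearch looks terms up in an inverted index rather than scanning titles. TITLES is copied from the data prepared above.

```python
import re

# Titles from the data-preparation step
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_or(query_text, titles):
    """Return IDs of documents whose title contains at least one query term."""
    query_terms = set(analyze(query_text))
    return sorted(doc_id for doc_id, title in titles.items()
                  if query_terms & set(analyze(title)))


print(match_or("java Elasticsearch", TITLES))  # ['1', '2', '3', '4']
```

Because the query text is lowercased during analysis, "Elasticsearch" still matches "elasticsearch" in the titles, which a plain SQL LIKE comparison would not guarantee.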
3. Search for documents whose title contains both java and Elasticsearch
The title must contain java as well as Elasticsearch.
GET /forum/article/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch",
        "operator": "and"
      }
    }
  }
}
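operator accepts and or or. With and, every term must match; with or, at least one term must match. Since or is the default, using or returns the same results as query 2.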
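The difference between the two operators can be modelled in a few lines of Python. This is a toy sketch under the same assumptions as before (crude lowercase-and-split analyzer, titles copied from the data above, linear scan instead of an inverted index); the function name match is hypothetical.

```python
import re

# Titles from the data-preparation step
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(query_text, titles, operator="or"):
    """Toy match query: 'or' needs any query term, 'and' needs every query term."""
    query_terms = set(analyze(query_text))
    hits = []
    for doc_id, title in titles.items():
        doc_terms = set(analyze(title))
        if operator == "and":
            ok = query_terms <= doc_terms        # all query terms present
        elif operator == "or":
            ok = bool(query_terms & doc_terms)   # at least one query term present
        else:
            raise ValueError("operator must be 'and' or 'or'")
        if ok:
            hits.append(doc_id)
    return sorted(hits)


print(match("java elasticsearch", TITLES, operator="and"))  # ['1', '4']
print(match("java elasticsearch", TITLES, operator="or"))   # ['1', '2', '3', '4']
```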
4. Search for documents that contain at least three of the four keywords java, Elasticsearch, spark, and hadoop
GET /forum/article/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch spark hadoop",
        "minimum_should_match": "75%"
      }
    }
  }
}
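minimum_should_match: "75%" means that 75% of the four keywords, that is three out of four, must match. In other words, a document must match at least three of the keywords.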
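Elasticsearch turns a percentage into a clause count by multiplying it by the number of optional clauses and rounding down, so 75% of 4 terms is 3 (and 75% of 3 terms would be 2). The Python sketch below models this under the same crude-analyzer assumption as the earlier examples. It only handles positive percentages and plain integers, not the other forms minimum_should_match accepts, and it does not model edge cases such as a computed value of 0; the helper names are hypothetical.

```python
import re

# Titles from the data-preparation step
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def required_matches(minimum_should_match, clause_count):
    """Convert '75%' or 3 into a required number of matching terms, rounding down."""
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        return clause_count * percent // 100  # integer floor division rounds down
    return int(minimum_should_match)


def match_min_should(query_text, titles, minimum_should_match):
    """Toy match query that keeps documents matching enough query terms."""
    query_terms = set(analyze(query_text))
    needed = required_matches(minimum_should_match, len(query_terms))
    return sorted(doc_id for doc_id, title in titles.items()
                  if len(query_terms & set(analyze(title))) >= needed)


print(required_matches("75%", 4))  # 3
print(match_min_should("java elasticsearch spark hadoop", TITLES, "75%"))  # ['4']
```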
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.3375794,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 1.3375794,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog"
        }
      }
    ]
  }
}
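Only one document matches: document 4, whose title contains java, elasticsearch, and hadoop (three of the four keywords).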
5. Search for documents that must contain java and must not contain spark, and may or may not contain hadoop or Elasticsearch
GET /forum/article/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "java"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "title": "spark"
          }
        }
      ],
      "should": [
        {
          "match": {
            "title": "hadoop"
          }
        },
        {
          "match": {
            "title": "elasticsearch"
          }
        }
      ]
    }
  }
}
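Because this query has a must clause, the should clauses do not filter anything: they only raise the relevance of documents that match them. The toy Python model below shows that split. It uses the same crude analyzer and titles as the earlier sketches, treats each clause as a single term, and ranks by the number of matching should clauses. That count is a crude stand-in for scoring, not Lucene's actual relevance formula. The function name bool_search is hypothetical.

```python
import re

# Titles from the data-preparation step
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bool_search(titles, must=(), must_not=(), should=()):
    """Toy bool query over single-term clauses on title.

    must and must_not decide which documents are returned; should clauses are
    optional here and only affect the ranking.
    """
    results = []
    for doc_id, title in titles.items():
        terms = set(analyze(title))
        if not all(t in terms for t in must):
            continue  # a required term is missing
        if any(t in terms for t in must_not):
            continue  # a forbidden term is present
        should_hits = sum(t in terms for t in should)
        results.append((doc_id, should_hits))
    # Rank by number of matching should clauses, highest first
    results.sort(key=lambda r: r[1], reverse=True)
    return results


print(bool_search(TITLES, must=["java"], must_not=["spark"],
                  should=["hadoop", "elasticsearch"]))
# [('4', 2), ('1', 1), ('2', 0)]
```

Document 2 is still returned even though it matches neither should clause, which is exactly the "may or may not contain" requirement.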
6. Use bool to search for java, hadoop, spark, and Elasticsearch, requiring at least three of the keywords
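By default, a document does not have to match any of the should clauses, with one exception: if the bool query has no must (or filter) clause, at least one should clause must match.
You can, however, use minimum_should_match to control precisely how many of the should clauses a document must match to be returned.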
GET /forum/article/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "java" }},
        { "match": { "title": "elasticsearch" }},
        { "match": { "title": "hadoop" }},
        { "match": { "title": "spark" }}
      ],
      "minimum_should_match": 3
    }
  }
}
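The effect of minimum_should_match on a bool query that has only should clauses can be sketched in Python as follows. This is a toy model under the same assumptions as the earlier examples (crude analyzer, titles copied from the data above, single-term clauses). It covers only the no-must case used in the query above, where the default requirement is one matching should clause. The function name bool_should_only is hypothetical.

```python
import re

# Titles from the data-preparation step
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}
KEYWORDS = ["java", "elasticsearch", "hadoop", "spark"]


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bool_should_only(titles, should, minimum_should_match=1):
    """Toy bool query with only should clauses (no must) on the title field.

    With no must clause, at least one should clause has to match by default,
    hence minimum_should_match defaults to 1.
    """
    hits = []
    for doc_id, title in titles.items():
        terms = set(analyze(title))
        if sum(t in terms for t in should) >= minimum_should_match:
            hits.append(doc_id)
    return sorted(hits)


print(bool_should_only(TITLES, KEYWORDS))     # ['1', '2', '3', '4', '5']
print(bool_should_only(TITLES, KEYWORDS, 3))  # ['4']
```

With minimum_should_match set to 3 the result is the same single document returned by the match query with "75%" in query 4.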
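7. Summary of what we covered
1. In full-text search, there are two ways to search for multiple values: a match query, or a bool query with should clauses.
2. To control the precision of search results, use operator (and / or) and minimum_should_match.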
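The two approaches in point 1 are closely related: Elasticsearch analyzes the match query text and builds a bool query from the resulting terms, with should clauses for or (passing minimum_should_match through) and must clauses for and. The Python sketch below builds such an equivalent bool request body as a plain dict. It is a conceptual illustration that assumes a crude lowercase-and-split analyzer, not the exact internal rewrite, and the function name match_to_bool is hypothetical.

```python
import json
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_to_bool(field, query_text, operator="or", minimum_should_match=None):
    """Build a bool query body roughly equivalent to a match query (hypothetical helper)."""
    clauses = [{"term": {field: term}} for term in analyze(query_text)]
    if operator == "and":
        # Every analyzed term is required
        return {"query": {"bool": {"must": clauses}}}
    bool_body = {"should": clauses}
    if minimum_should_match is not None:
        # Passed through unchanged, e.g. "75%" or 3
        bool_body["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": bool_body}}


print(json.dumps(match_to_bool("title", "java elasticsearch spark hadoop",
                               minimum_should_match="75%"), indent=2))
```

Writing the bool form by hand gives finer control (for example, different boosts per clause), while the match query is the shorter way to express the same intent.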