19. Approximate matching in Elasticsearch with the slop parameter
2017-07-17
编程界的小学生
1. Basic syntax
GET /forum/article/_search
{
"query": {
"match_phrase": {
"title": {
"query": "java spark",
"slop" : 1
}
}
}
}
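If you issue this request from code rather than from the console, the same body can be built as a plain dictionary. This is only a minimal sketch: the helper name `match_phrase_body` is hypothetical, and how the body is sent (HTTP client, host, port) is left out on purpose.

```python
import json


def match_phrase_body(field, text, slop=0):
    """Build a match_phrase request body (hypothetical helper, not a library API)."""
    return {
        "query": {
            "match_phrase": {
                field: {
                    "query": text,
                    "slop": slop,
                }
            }
        }
    }


# Same body as the console request above: title field, "java spark", slop 1
body = match_phrase_body("title", "java spark", slop=1)
print(json.dumps(body, indent=2))
```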
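2. What slop means
The terms of the query string may have to be moved some number of positions before they line up with a document. That number of moves is the slop.
3. A slop example
Work out how many moves the query string needs before it matches a document, then set slop accordingly.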
doc:hello world, java is very good, spark is also very good.
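A plain match_phrase search for java spark finds nothing here, because the two terms are not adjacent in the doc.
If we specify slop, the terms of java spark are allowed to move in an attempt to match the doc.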
java | is | very | good | spark | is |
---|---|---|---|---|---|
java | spark | | | | |
java | → | spark | | | |
java | → | → | spark | | |
java | → | → | → | spark | |
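The table shows that the first move shifts spark by one position, onto very, and that after three moves it lands exactly on spark in the doc. So the slop here is 3: in the phrase java spark, spark has to move 3 times before the phrase matches the doc.
slop is not only the number of moves that the query string terms need in order to match a doc; it is the maximum number of moves they are allowed to make while trying to match. Here any slop greater than or equal to 3 works.
A plain match_phrase search cannot find the doc, so how do we find it?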
GET /forum/article/_search
{
"query": {
"match_phrase": {
"title": {
"query": "java spark",
"slop": 3
}
}
}
}
Setting slop to any number greater than or equal to 3 is enough, for the reason shown in the table.
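The counting in the table can be sketched in Python. This is a simplified model, not Elasticsearch's actual implementation: it assumes plain lowercase word tokenization, subtracts each term's offset in the query from its position in the doc, and takes the spread between the largest and smallest result. The function names are hypothetical.

```python
import re
from itertools import product


def tokenize(text):
    # Assumption: lowercase word tokens, roughly like a simple analyzer
    return re.findall(r"\w+", text.lower())


def required_slop(query, doc):
    """Smallest slop with which `query` matches `doc` in this simplified model.

    Returns None if some query term does not occur in the doc.
    """
    q_terms = tokenize(query)
    d_terms = tokenize(doc)
    candidates = []
    for term in q_terms:
        positions = [i for i, t in enumerate(d_terms) if t == term]
        if not positions:
            return None
        candidates.append(positions)
    best = None
    # Try every combination of occurrences and keep the cheapest one
    for combo in product(*candidates):
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        cost = max(shifted) - min(shifted)
        if best is None or cost < best:
            best = cost
    return best


doc = "hello world, java is very good, spark is also very good."
print(required_slop("java spark", doc))  # 3
```

In this model, terms that appear in reverse order cost more: `required_slop("spark java", "java spark")` gives 2, since the two terms have to pass each other.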
4. With slop, the closer the terms are to each other, the higher the relevance score
GET /forum/article/_search
{
"query": {
"match_phrase": {
"content": {
"query": "java best",
"slop": 15
}
}
}
}
Result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.65380025,
"hits": [
{
"_index": "forum",
"_type": "article",
"_id": "2",
"_score": 0.65380025,
"_source": {
"articleID": "KDKE-B-9947-#kL5",
"userID": 1,
"hidden": false,
"postDate": "2017-01-02",
"tag": [
"java"
],
"tag_cnt": 1,
"view_cnt": 50,
"title": "this is java blog",
"content": "i think java is the best programming language",
"sub_title": "learned a lot of course",
"author_first_name": "Smith",
"author_last_name": "Williams",
"new_author_last_name": "Williams",
"new_author_first_name": "Smith"
}
},
{
"_index": "forum",
"_type": "article",
"_id": "5",
"_score": 0.07111243,
"_source": {
"articleID": "DHJK-B-1395-#Ky5",
"userID": 3,
"hidden": false,
"postDate": "2017-03-01",
"tag": [
"elasticsearch"
],
"tag_cnt": 1,
"view_cnt": 10,
"title": "this is spark blog",
"content": "spark is best big data solution based on scala ,an programming language similar to java spark",
"sub_title": "haha, hello world",
"author_first_name": "Tonny",
"author_last_name": "Peter Smith",
"new_author_last_name": "Peter Smith",
"new_author_first_name": "Tonny"
}
}
]
}
}
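The ranking can be reasoned about with a small sketch. It assumes a proximity weight of 1 / (1 + distance), which is how Lucene's sloppy phrase scoring weights a match as far as I know. The real `_score` also depends on other factors such as term statistics and field length, so the numbers below will not reproduce the scores in the response. The function names are hypothetical, and each term is assumed to occur once in the content.

```python
import re


def tokenize(text):
    # Assumption: lowercase word tokens
    return re.findall(r"\w+", text.lower())


def phrase_distance(query, doc):
    """Moves needed to line up the query terms with the doc (each term occurs once)."""
    d_terms = tokenize(doc)
    shifted = [d_terms.index(term) - offset
               for offset, term in enumerate(tokenize(query))]
    return max(shifted) - min(shifted)


def sloppy_weight(distance):
    # Assumed proximity weight: closer matches weigh more
    return 1.0 / (1 + distance)


doc2 = "i think java is the best programming language"
doc5 = ("spark is best big data solution based on scala ,"
        "an programming language similar to java spark")

d2 = phrase_distance("java best", doc2)
d5 = phrase_distance("java best", doc5)
print(d2, d5)  # 2 13
print(sloppy_weight(d2) > sloppy_weight(d5))  # True
```

Both distances are within the slop of 15, so both documents match, but document 2 needs far fewer moves and therefore ranks first.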