Elasticsearch

Indexing files with Elasticsearch and searching them

2018-09-20  LI木水

1. Indexing files in Elasticsearch requires a plugin

ES version          Plugin               Reference
Before ES 5.0       mapper-attachments   https://qbox.io/blog/index-attachments-files-elasticsearch-mapper
ES 5.0 and later    ingest-attachment    https://qbox.io/blog/how-to-index-attachments-and-files-to-elasticsearch-5-0-using-ingest-api
                                         https://www.elastic.co/guide/en/elasticsearch/plugins/5.6/using-ingest-attachment.html

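The existing ES cluster ran version 2.3.5, so I first tried to install mapper-attachments for 2.3.5. The installation failed because the downloaded plugin reported that it targets ES 2.0. It seems the plugin may install successfully on a 2.4 cluster, but I have not verified this, so please test it yourself. Since I also wanted to upgrade ES to 5.x, I settled on ES 5.6.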

2. Install the plugin

bin/elasticsearch-plugin install ingest-attachment

3. Create an attachment pipeline

Note: the properties array controls which fields are extracted. The values used here are "content", "title", "author", "keywords", "date", "content_length", "content_type".

curl -XPUT 'http://localhost:19200/_ingest/pipeline/attachment?pretty' -H 'Content-Type: application/json' -d '{
 "description" : "Extract attachment information encoded in Base64 with UTF-8 charset",
 "processors" : [
   {
     "attachment" : {
       "field" : "data",
       "properties": [ "content", "title", "author", "keywords", "date", "content_length", "content_type" ]
     }
   }
 ]
}'
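
The pipeline body above can also be assembled programmatically, which makes it easy to trim the properties list. The following is a minimal sketch, not the plugin's own API: the helper names build_attachment_pipeline and build_simulate_request are hypothetical, and the allowed property list is simply the one from the note above. The second helper builds a body for the ingest simulate endpoint (POST _ingest/pipeline/attachment/_simulate), assuming a single test document whose data field holds Base64 text.

```python
import base64
import json

# Property names taken from the note above; this sketch rejects anything else.
ALLOWED_PROPERTIES = ("content", "title", "author", "keywords", "date",
                      "content_length", "content_type")


def build_attachment_pipeline(field="data", properties=ALLOWED_PROPERTIES):
    """Return the body for PUT _ingest/pipeline/<name> (hypothetical helper)."""
    unknown = [p for p in properties if p not in ALLOWED_PROPERTIES]
    if unknown:
        raise ValueError("unsupported attachment properties: %s" % unknown)
    return {
        "description": "Extract attachment information",
        "processors": [
            {"attachment": {"field": field, "properties": list(properties)}}
        ],
    }


def build_simulate_request(raw_bytes, field="data"):
    """Return a body for POST _ingest/pipeline/<name>/_simulate (hypothetical helper)."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {"docs": [{"_source": {field: encoded}}]}


if __name__ == "__main__":
    # Print both bodies so they can be pasted after curl -d.
    print(json.dumps(build_attachment_pipeline(properties=["content", "title"]), indent=2))
    print(json.dumps(build_simulate_request(b"hello"), indent=2))
```

Simulating the pipeline before indexing real files is a cheap way to check that the extracted fields look as expected.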

4. Create the index test

curl -XPUT 'http://localhost:19200/test/' -H 'Content-Type: application/json' -d '{
  "settings":{
      "index":{
          "number_of_shards":1,
          "number_of_replicas":1
      }
  }
}'

5. Create the mapping

curl -XPUT 'http://localhost:19200/test/_mapping/document/' -H 'Content-Type: application/json' -d '
{
  "document": {
    "_source": {
      "excludes": [
        "data",
        "attachment.content"
      ]
    },
    "properties": {
      "filename": {
        "type": "text"
      },
      "attachment": {
        "properties": {
          "date": {
            "type": "date"
          },
          "content_type": {
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          },
          "author": {
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          },
          "content": {
            "type": "text"
          },
          "content_length": {
            "type": "long"
          }
        }
      },
      "data": {
        "type": "binary",
        "store": false
      },
      "filePath": {
        "type": "keyword"
      },
      "downloadTimes": {
        "type": "long"
      },
      "source": {
        "type": "keyword"
      },
      "type": {
        "type": "keyword"
      },
      "uploadTime": {
        "type": "date"
      },
      "viewTimes": {
        "type": "long"
      },
      "fileType": {
        "type": "keyword"
      }
    }
  }
}'

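Notes: the content field should be indexed but not stored; otherwise every query against large files would pull back the full content. To achieve this, exclude it from _source. Setting store: false alone has no effect.

Reference: http://blog.csdn.net/napoay/article/details/62233031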

    "_source": {
      "excludes": [
        "data",
        "attachment.content"
      ]
    },
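
To make the effect of the excludes concrete, here is a small illustrative model of what ends up in the stored _source. It is a simplification and not Elasticsearch's implementation: the function apply_source_excludes is hypothetical, it handles exact dotted paths only (no wildcard patterns), and it says nothing about the inverted index, where the excluded fields remain searchable.

```python
import copy


def apply_source_excludes(document, excludes):
    """Return a copy of `document` without the dotted paths in `excludes`.

    Simplified model: exact dotted paths only, no wildcard support.
    """
    result = copy.deepcopy(document)
    for path in excludes:
        parts = path.split(".")
        node = result
        # Walk down to the parent object of the field to drop.
        for key in parts[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            node.pop(parts[-1], None)
    return result


if __name__ == "__main__":
    doc = {
        "filename": "a.txt",
        "data": "aGVsbG8=",
        "attachment": {"content": "hello", "content_type": "text/plain"},
    }
    # Only filename and attachment.content_type survive in the stored copy.
    print(apply_source_excludes(doc, ["data", "attachment.content"]))
```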

type: "keyword" means the value is not analyzed and is matched exactly

 "source": {
        "type": "keyword"
      }

ES 5 replaced the string type with text (analyzed full-text) and keyword (exact values); content uses text

          "content": {
            "type": "text"
          }

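data holds the Base64 encoding of the original document. It is mapped as binary, does not need to be visible, and is therefore also excluded from _source.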

      "data": {
        "type": "binary",
        "store": false
      }

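6. Index data

Note: data is the Base64 encoding of the original file. When indexing through the Java API, read the file content, encode it as a Base64 string, and put that string in the data field.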

curl -XPUT 'http://localhost:19200/test/document/test_id2?pipeline=attachment&pretty' -H 'Content-Type: application/json' -d '{
"source":"测试",
"filename":"测试文档",
 "data": "UWJveCBlbmFibGVzIGxhdW5jaGluZyBzdXBwb3J0ZWQsIGZ1bGx5LW1hbmFnZWQsIFJFU1RmdWwgRWxhc3RpY3NlYXJjaCBTZXJ2aWNlIGluc3RhbnRseS4g"
}'
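
The same request can be prepared from a script that reads the file and encodes it itself. The sketch below uses only the Python standard library rather than the Java API mentioned in the note; the helper names encode_bytes and build_index_request are hypothetical, and the host, index, type and pipeline names are the ones used in this article.

```python
import base64
import json
import urllib.parse
import urllib.request


def encode_bytes(raw_bytes):
    """Base64-encode raw file bytes into the ASCII string expected in `data`."""
    return base64.b64encode(raw_bytes).decode("ascii")


def build_index_request(base_url, index, doc_type, doc_id, fields, raw_bytes,
                        pipeline="attachment"):
    """Build (but do not send) the PUT request that indexes one file."""
    url = "%s/%s/%s/%s?pipeline=%s" % (
        base_url.rstrip("/"),
        index,
        doc_type,
        urllib.parse.quote(doc_id, safe=""),
        urllib.parse.quote(pipeline, safe=""),
    )
    body = dict(fields)
    body["data"] = encode_bytes(raw_bytes)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Usage sketch; "report.pdf" is a hypothetical file name:
# with open("report.pdf", "rb") as handle:
#     request = build_index_request("http://localhost:19200", "test", "document",
#                                   "test_id2", {"source": "测试"}, handle.read())
# urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the encoding step easy to check without a running cluster.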

7. Search

curl -XPOST 'http://localhost:19200/test/document/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "attachment.content": "Qbox"
          }
        },
        {
          "term": {
            "source": "测试"
          }
        }
      ]
    }
  }
}'
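
For callers that build this query in code, a minimal sketch follows. The helper names build_search_body and extract_sources are hypothetical; the first reproduces the bool query above (a phrase match on the extracted text plus an exact term on the keyword field source), and the second assumes the usual search response layout with hits under hits.hits.

```python
import json


def build_search_body(phrase, source):
    """Combine a phrase match on attachment.content with an exact term on source."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"attachment.content": phrase}},
                    {"term": {"source": source}},
                ]
            }
        }
    }


def extract_sources(response):
    """Return the _source of every hit in a parsed search response."""
    return [hit.get("_source", {}) for hit in response.get("hits", {}).get("hits", [])]


if __name__ == "__main__":
    # ensure_ascii=False keeps non-ASCII values readable in the printed body.
    print(json.dumps(build_search_body("Qbox", "测试"), ensure_ascii=False, indent=2))
```

Because data and attachment.content are excluded from _source in the mapping, the returned sources contain only the remaining fields.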

Other references:

https://www.elastic.co/guide/en/elasticsearch/plugins/5.6/using-ingest-attachment.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/binary.html
