Full-Text Search Engine: Elasticsearch

2018-12-25, by 意识流丶

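Introduction to Elasticsearch

Elasticsearch is an open-source, distributed full-text search engine built on Apache Lucene(TM) and exposed through a RESTful interface. Lucene is widely regarded as one of the most advanced, best-performing, and most feature-complete search engine libraries available. Elasticsearch is written in Java and uses Lucene at its core for all indexing and search functionality; it hides Lucene's complexity behind a simple RESTful API, which makes full-text search easy to use.
GitHub repository: https://github.com/elastic/elasticsearch
Official website: https://www.elastic.co/products/elasticsearch
English documentation: https://www.elastic.co/guide/en/elasticsearch/reference/index.html
Chinese community: https://elasticsearch.cn/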

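Elasticsearch features:

1. Speed: it can store, search, and analyze large volumes of data in a very short time. It is commonly used as the core engine behind applications with complex search requirements.
2. Scalability: Elasticsearch is also a distributed document store in which every field can be indexed and searched. It scales horizontally to hundreds of servers and can handle petabytes of data.
3. Resilience and high availability: Elasticsearch detects failures in order to keep the cluster (and your data) safe and available. With cross-cluster replication, a secondary cluster can run as a hot backup.
4. Flexibility: it works with many types of data.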

Downloading and Installing Elasticsearch

Download page: https://www.elastic.co/downloads/elasticsearch
Choose the package that matches your operating system.

After downloading, extract the archive.
On Windows, double-click elasticsearch.bat in the bin directory to start Elasticsearch.

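On startup, Elasticsearch listens on two ports, 9300 and 9200.
Port 9300 is the transport port, used by clients that connect over TCP.
Port 9200 is used to connect to Elasticsearch over HTTP.
The configuration is read from elasticsearch.yml in the config directory.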
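
To confirm that both ports are actually listening, a plain TCP connection attempt is enough. The following is a minimal sketch, assuming a local node on the default ports; the helper names `port_is_open` and `check_elasticsearch_ports` are made up for this illustration and are not part of any Elasticsearch client.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, and so on.
        return False


def check_elasticsearch_ports(host: str = "localhost") -> dict:
    """Probe the two default ports: 9200 (HTTP) and 9300 (TCP transport)."""
    return {port: port_is_open(host, port) for port in (9200, 9300)}


# Example call (needs a running node to report True):
# print(check_elasticsearch_ports())
```

A result with both values set to True only shows that something accepts connections on those ports; it does not prove the listener is Elasticsearch.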

Registering Elasticsearch as a system service is recommended. To do so, run the command elasticsearch-service.bat install

The config directory

elasticsearch.yml: the main configuration file
jvm.options: JVM parameter settings
log4j2.properties: logging configuration

Contents of the main configuration file `elasticsearch.yml`

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
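
Every setting in the template above is commented out, so a fresh install runs entirely on defaults. To see which settings a given file really overrides, the uncommented lines can be listed. This is a rough sketch under a strong assumption: it only understands flat, one-line `key: value` entries and is not a YAML parser. The function name `active_settings` and the sample text are invented for the example.

```python
def active_settings(yml_text: str) -> dict:
    """Collect flat `key: value` lines that are not commented out.

    Deliberately simplified: nested YAML, lists spread over several
    lines, and inline comments are not handled.
    """
    settings = {}
    for line in yml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = stripped.partition(":")
        if sep:  # only keep lines that actually contain a colon
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical file content: two settings enabled, one still commented out.
sample = """\
# ---------------------------------- Cluster -----------------------------------
cluster.name: my-application
#node.name: node-1
http.port: 9200
"""

print(active_settings(sample))
# {'cluster.name': 'my-application', 'http.port': '9200'}
```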

Cluster: cluster-level settings

cluster.name: the name of the cluster, used to tell clusters apart; the default is elasticsearch

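Node: settings for the individual node

node.name: the name of the node
node.attr.rack: a custom attribute attached to the node, here the rack it sits in (r1 in the template above); attributes like this can be used, for example, to make shard allocation rack-aware
node.master: whether the node may be elected master; defaults to true
node.data: whether the node stores data, that is, index shards; defaults to true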

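Paths: where data and logs are stored. These two settings matter a great deal: when data is kept separate from the program files, a version upgrade, for example, becomes much easier, and a program crash does not affect the data.

path.data: where the data is stored
path.logs: where the log files are written
path.work: location for temporary files (a setting from older releases; not in the template above)
path.plugins: where plugins are installed (a setting from older releases; not in the template above)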

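Memory: memory settings

bootstrap.memory_lock: whether to lock the process memory at startup so it cannot be swapped out, which improves Elasticsearch performance

Network: network settings. Everything that connects over HTTP to the RESTful interface, including curl, browsers, and Kibana, is governed by these settings.

network.host: the IP address to bind to, either IPv4 or IPv6; defaults to the local loopback address
http.port: the HTTP port exposed to clients; defaults to 9200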

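Discovery: how the nodes of a cluster find and connect to each other

discovery.zen.ping.unicast.hosts: the initial list of hosts to contact when discovering the nodes of the cluster
discovery.zen.minimum_master_nodes: the minimum number of master-eligible nodes that must be visible before a master can be elected. To prevent split brain, set it to a majority of the master-eligible nodes, that is (total number of master-eligible nodes / 2) + 1
discovery.zen.ping.timeout: the timeout used when pinging other nodes
discovery.zen.ping.multicast.enabled: whether multicast is used to discover other nodes; defaults to true (a setting from older releases; not in the template above)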
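
The majority rule for discovery.zen.minimum_master_nodes is easy to get wrong by one. A small sketch of the arithmetic, using integer division as the template comment implies; the function name is made up for the example:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority of master-eligible nodes: (N / 2) + 1 with integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3, 5):
    print(n, "->", minimum_master_nodes(n))
# 1 -> 1
# 2 -> 2
# 3 -> 2
# 5 -> 3
```

Note that with two master-eligible nodes the majority is 2, so losing either node leaves the cluster without a master; three master-eligible nodes is the smallest setup that tolerates one failure.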

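Gateway: the gateway persists the cluster state. Several gateway types have been supported: the local file system (the default), distributed file systems, Hadoop HDFS, and the Amazon S3 cloud storage service.

gateway.recover_after_nodes: start data recovery once N nodes of the cluster are up; defaults to 1
gateway.recover_after_time: how long to wait before starting recovery once recover_after_nodes is satisfied
gateway.expected_nodes: the number of nodes expected in the cluster; once that many are up, recovery starts immediately without waiting for recover_after_time
gateway.type: the gateway lets the cluster state survive a full cluster restart. Every change to the cluster state is saved, and when the cluster starts up it reads the state back from the gateway. The default (and recommended) type is local (a setting from older releases; not in the template above)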

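Various: other settings

action.destructive_requires_name: whether deleting an index requires its name to be given explicitly. When this is false, indices can also be deleted with wildcard patterns or _all
node.max_local_storage_nodes: the maximum number of nodes that may share the same data path on one machine; setting it to 1 prevents more than one node from starting on the same system

For details, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules.html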

Once Elasticsearch has started, open http://localhost:9200/ in a browser

If the response shows the Elasticsearch version information, the node has started successfully.
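
The same check can be scripted instead of done in a browser. The sketch below uses only the Python standard library and assumes a node on the default address http://localhost:9200/. The root endpoint answers with a JSON document whose version.number field holds the version; the sample dictionary is a trimmed, hypothetical response used purely for illustration, and the function names are invented for this example.

```python
import json
from urllib.request import urlopen


def fetch_root(base_url: str = "http://localhost:9200/") -> dict:
    """GET the root endpoint of a node and decode the JSON body."""
    with urlopen(base_url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def version_number(info: dict) -> str:
    """Extract version.number from the root endpoint's response."""
    return info["version"]["number"]


# Hypothetical, trimmed response; a real one contains more fields.
sample = {
    "name": "node-1",
    "cluster_name": "elasticsearch",
    "version": {"number": "6.5.4"},
}
print(version_number(sample))  # 6.5.4

# Against a running node:
# print(version_number(fetch_root()))
```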

Visual Interfaces for Elasticsearch

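Plugin option: the elasticsearch-head plugin is recommended

Prerequisite: Node.js is installed and npm is configured; see https://www.jianshu.com/p/96f2f01a4f3e
GitHub repository for the plugin's front-end code: https://github.com/mobz/elasticsearch-head
1. Clone the code from GitHub to your machine
2. In the elasticsearch-head directory, install its dependencies with npm: npm install
3. Install the grunt command-line package: npm install -g grunt-cli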

If grunt is reported as not being a recognized command, add the path of the node_global folder to the PATH environment variable

4. Add the following to the Elasticsearch configuration file elasticsearch.yml, then restart Elasticsearch so the change takes effect

http.cors.enabled: true
http.cors.allow-origin: "*"

5. Start the elasticsearch-head plugin with grunt: grunt server

Open http://localhost:9100/ in a browser

Note: if the page shows that it is not connected, click Connect

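Official tool: Kibana, the commonly used visual management tool for Elasticsearch

Overview:

Kibana is an open-source analytics and visualization platform for Elasticsearch

Features:

1. With Kibana you can visualize the data in Elasticsearch and navigate the Elastic Stack
2. A picture is worth a thousand log lines, and Kibana lets you choose freely how to present your data. The Kibana core ships with the classics: histograms, line graphs, pie charts, sunbursts, and more. You can also design your own visualizations with the Vega grammar. All of these build on the full aggregation capabilities of Elasticsearch.
3. Put geodata on any map: use Elastic Maps Service to visualize geospatial data, or get creative and visualize custom location data on a map.
4. Time series: perform advanced time series analysis on your Elasticsearch data with curated time series UIs. You can also describe queries, transformations, and visualizations with powerful, easy-to-learn expressions
5. Data analysis: combine the relevance capabilities of a search engine with Graph exploration to uncover the uncommonly common relationships in your Elasticsearch data
6. Anomaly analysis: use unsupervised Machine Learning features to detect anomalies hiding in your Elasticsearch data and explore the properties that significantly influence them.
7. Get creative with Canvas: Canvas lets you build freely on live data, and it also supports SQL
GitHub repository: https://github.com/elastic/kibana
Official website: https://www.elastic.co/products/kibana
Official documentation: https://www.elastic.co/guide/en/kibana/current/index.html

Download page: https://www.elastic.co/downloads/kibana
Choose the package that matches your operating system.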

After downloading, extract the archive and double-click kibana.bat in the bin directory to start Kibana

Kibana listens on port 5601; open http://localhost:5601 in a browser

Checking cluster health

This uses the _cat API: in the Kibana console, run the command GET /_cat/health?v

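The output shows that the cluster named elasticsearch is in the green state.
Whenever we query cluster health, the API returns one of three states: green, yellow, or red.
green means everything is fine (the cluster is fully functional).
yellow means all data is available, but some replica shards may not have been allocated (the cluster is still fully functional).
red means that, for some reason, some data is unavailable. Note that even when the cluster is red it can still partly function.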
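
For scripted monitoring, the three states can be mapped to an action. The sketch below is an illustration only: instead of the _cat endpoint used in the console it reads the status field from GET /_cluster/health, which returns JSON, and it assumes a node at http://localhost:9200. The names `STATUS_MEANING`, `fetch_status`, and `all_data_available` are invented for this example.

```python
import json
from urllib.request import urlopen

# Meanings as described in the text above.
STATUS_MEANING = {
    "green": "everything is fine, the cluster is fully functional",
    "yellow": "all data is available, but some replica shards are not allocated",
    "red": "some data is unavailable, the cluster is only partly functional",
}


def fetch_status(base_url: str = "http://localhost:9200") -> str:
    """Read the `status` field from GET /_cluster/health."""
    with urlopen(base_url + "/_cluster/health", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))["status"]


def all_data_available(status: str) -> bool:
    """True for green and yellow, False for red."""
    if status not in STATUS_MEANING:
        raise ValueError(f"unexpected status: {status!r}")
    return status != "red"


for s in ("green", "yellow", "red"):
    print(s, "->", STATUS_MEANING[s], "| data available:", all_data_available(s))

# Against a running node:
# print(all_data_available(fetch_status()))
```

On a single-node setup, an index created with replicas typically reports yellow, because a replica cannot be placed on the same node as its primary shard.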
