mysql数据同步到elasticsearch
环境
- centos7.2
- php5.5.7
- mysql5.7
安装elasticsearch
下载地址:https://www.elastic.co/downloads/elasticsearch
image安装2.4.6
wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.4.6/elasticsearch-2.4.6.rpm
yum -y install elasticsearch-2.4.6.rpm
环境:
- ubuntu 18.0.4
- php7.2.13
- mysql5.7
- java1.8+
- go1.9+
安装elasticsearch
下载地址:https://www.elastic.co/downloads/elasticsearch
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.0.deb
gdebi elasticsearch-6.5.0.deb
安装完成后,配置elasticsearch
vim /etc/elasticsearch/elasticsearch.yml
//配置内容
.
.
.
cluster.name: zpdx-search //集群名称
.
.
.
node.name: zpdx-1 //节点名称
.
.
.
path.data: /var/lib/elasticsearch //数据路径
.
.
.
path.log: /var/log/elasticsearch //日志路径
.
.
.
network.host: 127.0.0.1 //主机地址,默认
.
.
.
http:port: 9200 //端口,默认
开启elasticsearch
systemctl start elasticsearch.service
将elasticsearch
设置开机自启动
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable elasticsearch.service
安装mvn
sudo apt update
sudo apt install maven
安装 ik
github
下载地址: elasticsearch-analysis-ik
git clone https://github.com/medcl/elasticsearch-analysis-ik.git
cd lasticsearch-analysis-ik
git checkout tags/1.10.6
mvn clean
mvn compile
mvn package
在./elasticsearch-analysis-ik/target/releases
目录会下载了一个对应版本的插件包elasticsearch-analysis-ik-1.10.6.zip
将插件包复制到elaticsearch
的插件目录下
cp elasticsearch-analysis-ik-1.10.6.zip /usr/share/elasticsearch/plugins/
cd /usr/share/elasticsearch/plugins/
unzip elasticsearch-analysis-ik-1.10.6.zip
mkdir ik
mv ./* ik
重启elasticsearch
systemctl restart elasticsearch.service
测试,使用curl 'http://127.0.0.1:9200',或者用浏览器访问
http://127.0.0.1:9200```,显示
{
"name" : "zpdx-1",
"cluster_name" : "zpdxshop-search",
"cluster_uuid" : "g1wO4CR8TGOwfrDHY-vcGw",
"version" : {
"number" : "6.5.0",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "816e6f6",
"build_date" : "2018-11-09T18:58:36.352602Z",
"build_snapshot" : false,
"lucene_version" : "7.5.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
elasticsearch
安装完成并成功启动
安装go
sudo apt update
sudo apt install golang
查看go 版本
go version
查看go 环境
go env
设置go
环境,vim /etc/profile
//将go环境路径设置到/usr/local/go方便管理
//在最后添加
.
.
.
#go环境
export GOPATH=/usr/local/go
保存并退出,使/etc/profile
文件生效
source /etc/profile
安装go-mysql-elasticsearch
go-mysql-elasticsearch的基本原理是:如果是第一次启动该程序,首先使用mysqldump工具对源mysql数据库进行一次全量同步,通过elasticsearch client执行操作写入数据到ES;然后实现了一个mysql client,作为slave连接到源mysql,源mysql作为master会将所有数据的更新操作通过binlog event同步给slave, 通过解析binlog event就可以获取到数据的更新内容,之后写入到ES.
使用限制
1. mysql binlog必须是ROW模式
2. 要同步的mysql数据表必须包含主键,否则直接忽略,这是因为如果数据表没有主键,UPDATE和DELETE操作就会因为在ES中找不到对应的document而无法进行同步
3. 不支持程序运行过程中修改表结构
4. 要赋予用于连接mysql的账户RELOAD权限以及REPLICATION权限, SUPER权限:
GRANT REPLICATION SLAVE ON *.* TO 'admin'@'127.0.0.1';
GRANT RELOAD ON *.* TO 'admin'@'27.0.0.1';
UPDATE mysql.user SET Super_Priv='Y' WHERE user='admin' AND host='127.0.0.1';
查看mysql
的binlog
、server_id
,需要设置
//进入mysql
mysql> show global variables like '%binlog_format%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW |
+---------------+-------+
1 row in set (0.02 sec)
mysql> show variables like '%server_id%';
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| server_id | 100 |
| server_id_bits | 32 |
+----------------+-------+
2 rows in set (0.01 sec)
mysql>
设置mysql,编辑/etc/mysql/mysql.conf.d/mysqld.cnf
vim /etc/mysql/mysql.conf.d/mysqld.cnf
.
.
.
[mysql]
log-bin=mysql-bin
binlog_format="ROW"
server-id=100
重启mysql
systemctl restart mysql
安装go-mysql-elasticsearch
github下载地址:go-mysql-elasticsearch
go get github.com/siddontang/go-mysql-elasticsearch
cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch
make
go-elasticsearch
不能自动创建索引index
,需手动创建
vim create_index.json
//添加内容
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"content": {
"properties": {
"title": {
"type": "text",
"analyzer": "ik_smart",
"search_analyzer": "ik_smart"
},
"content": {
"type": "text",
"analyzer": "ik_smart",
"search_analyzer": "ik_smart"
}
}
}
}
}
curl
执行
chase@chase-MACH-WX9:~$ curl -XPUT 'localhost:9200/test?pretty' -H 'Content-Type:application/json' -d'@create_index.json'
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "test"
}
chase@chase-MACH-WX9:~$
创建成功,查看索引
chase@chase-MACH-WX9:~$ curl localhost:9200/test?pretty
{
"test" : {
"aliases" : { },
"mappings" : {
"content" : {
"properties" : {
"content" : {
"type" : "text",
"analyzer" : "ik_smart"
},
"title" : {
"type" : "text",
"analyzer" : "ik_smart"
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1548855659726",
"number_of_shards" : "1",
"number_of_replicas" : "0",
"uuid" : "F3ze_A2gQo6mE88aEQzKNA",
"version" : {
"created" : "6050099"
},
"provided_name" : "test"
}
}
}
}
chase@chase-MACH-WX9:~$
配置go-mysql-elasticsearch
,编辑$GOPATH/src/github.com/siddontang/go-mysql-elasticsearch/etc/river.toml
# MySQL address, user and password
# user must have replication privilege in MySQL.
my_addr = "127.0.0.1:3306"
my_user = "admin"
my_pass = "123456"
my_charset = "utf8"
# Set true when elasticsearch use https
#es_https = false
# Elasticsearch address
es_addr = "127.0.0.1:9200"
# Elasticsearch user and password, maybe set by shield, nginx, or x-pack
es_user = ""
es_pass = ""
# Path to store data, like master.info, if not set or empty,
# we must use this to support breakpoint resume syncing.
# TODO: support other storage, like etcd.
data_dir = "./var"
# Inner Http status address
stat_addr = "127.0.0.1:12800"
# pseudo server id like a slave
server_id = 100
# mysql or mariadb
flavor = "mysql"
# mysqldump execution path
# if not set or empty, ignore mysqldump.
mysqldump = "mysqldump"
# if we have no privilege to use mysqldump with --master-data,
# we must skip it.
#skip_master_data = false
# minimal items to be inserted in one bulk
bulk_size = 128
# force flush the pending requests if we don't have enough items >= bulk_size
flush_bulk_time = "200ms"
# Ignore table without primary key
skip_no_pk_table = false
# MySQL data source
[[source]]
schema = "test"
# Only below tables will be synced into Elasticsearch.
# "t_[0-9]{4}" is a wildcard table format, you can use it if you have many sub tables, like table_0000 - table_1023
# I don't think it is necessary to sync all tables in a database.
tables = ["node"]
# Below is for special rule mapping
# Very simple example
#
# desc t;
# +-------+--------------+------+-----+---------+-------+
# | Field | Type | Null | Key | Default | Extra |
# +-------+--------------+------+-----+---------+-------+
# | id | int(11) | NO | PRI | NULL | |
# | name | varchar(256) | YES | | NULL | |
# +-------+--------------+------+-----+---------+-------+
#
# The table `t` will be synced to ES index `test` and type `t`.
[[rule]]
schema = "test"
table = "node"
index = "test"
type = "content"
启动go-mysql-elasticsearch
cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch
./bin/go-mysql-elasticsearch -config=./etc/river.toml
同步数据信息
[2019/01/30 21:48:31] [info] binlogsyncer.go:111 create BinlogSyncer with config {100 mysql 127.0.0.1 3306 admin utf8 false false <nil> false false 0 0s 0s 0}
[2019/01/30 21:48:31] [info] dump.go:164 skip dump, use last binlog replication pos (mysql-bin.000001, 790) or GTID set %!s(<nil>)
[2019/01/30 21:48:31] [info] status.go:53 run status http server 127.0.0.1:12800
[2019/01/30 21:48:31] [info] binlogsyncer.go:323 begin to sync binlog from position (mysql-bin.000001, 790)
[2019/01/30 21:48:31] [info] binlogsyncer.go:172 register slave for master server 127.0.0.1:3306
[2019/01/30 21:48:31] [info] sync.go:31 start sync binlog at binlog file (mysql-bin.000001, 790)
[2019/01/30 21:48:31] [info] binlogsyncer.go:692 rotate to (mysql-bin.000001, 790)
[2019/01/30 21:48:31] [info] binlogsyncer.go:692 rotate to (mysql-bin.000002, 4)
[2019/01/30 21:48:31] [info] sync.go:73 rotate binlog to (mysql-bin.000001, 790)
[2019/01/30 21:48:31] [info] master.go:54 save position (mysql-bin.000001, 790)
[2019/01/30 21:48:31] [info] sync.go:73 rotate binlog to (mysql-bin.000002, 4)
[2019/01/30 21:48:31] [info] master.go:54 save position (mysql-bin.000002, 4)
自此,数据全量导入到elasticsearch
设置go-mysql-elasticsearch
开机自启动
ubuntu安装sysv-rc-conf
sudo apt update
sudo apt install sysv-rc-conf
如果报错:
sudo apt-get install sysv-rc-conf
正在读取软件包列表... 完成
正在分析软件包的依赖关系树
正在读取状态信息... 完成
E: 无法定位软件包 sysv-rc-conf
解决方法:添加镜像源
vim /etc/apt/sources.list
//添加如下内容
deb http://archive.ubuntu.com/ubuntu/ trusty main universe restricted multiverse
sudo apt update
sudo apt install sysv-rc-con
创建go-mysql-elasticsearch
脚本
vim /etc/init.d/go-mysql-elasticsearch
//添加内容如下:
#!/bin/bash
#go-mysql-elasticsearch start
$GOPATH/src/github.com/siddontang/go-mysql-elasticsearch/bin/go-mysql-elasticsearch -config=$GOPATH/src/github.com/siddontang/go-mysql-elasticsearch/etc/river.toml
执行
sysv-rc-conf go-mysql-elasticsearch on
service go-mysql-elasticsearch start
centos安装chkconfig
yum -y install chkconfig
安装elasticsearch-php
composer require elasticsearch/elasticsearch