Notes on Configuring a Multi-Node Presto Cluster

2019-05-17  gregocean

The cluster already has CDH and Hive installed, so this counts as a mixed deployment. I started out following various tutorials to configure everything from scratch and kept hitting errors, including:

  1. The discovery service could not be found:
io.airlift.discovery.client.CachingServiceSelector
Cannot connect to discovery server
  2. Newer Presto versions complain that the JDK 8 update version is too old:
Presto requires Java 8u151+

and so on, until I switched to this simple, no-frills configuration tutorial for version 0.177:
https://axsauze.github.io/hadoop-overview/section-7/7-8.html
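
Since the second error is about the JDK 8 update number, it can be worth checking the installed version on each node before choosing a newer Presto release; a quick check (assuming java is on the PATH):

# Print the installed JDK version; the update number after "1.8.0_" is what matters here
java -version 2>&1 | head -n 1
# e.g. "1.8.0_144" would be too old for releases that require 8u151+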

Server Installation and Configuration

Do the following on nodes 4, 5, and 6 of the test cluster; node 4 acts as the coordinator, the other two as workers.

cd /home/work/
# Download Presto
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.177/presto-server-0.177.tar.gz
# Extract it
tar -xvf presto-server-0.177.tar.gz
cd presto-server-0.177
# Download the default configuration files
wget http://media.sundog-soft.com/hadoop/presto-hdp-config.tgz
tar -xvf presto-hdp-config.tgz

Inspect the configuration files:
cat node.properties

node.environment=production # must be identical on every node
node.id=f7c4bf3c-dbb4-4807-baae-9b7e41807bc4 # this id must be unique per node
node.data-dir=/var/presto/data # create this directory if it does not exist
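
Because node.id must be unique per node, one quick way to generate a fresh value on each machine is a random UUID (a minimal sketch, assuming uuidgen is available):

# Generate a random UUID to paste in as this node's node.id
uuidgen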

cat log.properties

com.facebook.presto=WARN

cat jvm.config

# Adjust to your machine's resources
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p

cat config.properties

# Coordinator configuration
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8090
query.max-memory=10GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://10.1.249.55:8090 # the coordinator's internal network address

# Worker configuration
coordinator=false
# node-scheduler.include-coordinator=true
http-server.http.port=8090
query.max-memory=10GB
query.max-memory-per-node=1GB
#discovery-server.enabled=true
discovery.uri=http://10.1.249.55:8090 # the coordinator's internal network address
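
Rather than repeating the download on every node, the installation directory prepared on node 4 can be copied to the workers and only the per-node files adjusted afterwards (a sketch, assuming passwordless SSH between the test nodes; the worker hostname is a placeholder):

# Copy the installed tree from node 4 to a worker, then adjust config.properties and node.properties there
rsync -a /home/work/presto-server-0.177/ worker-host:/home/work/presto-server-0.177/
# On the worker: set coordinator=false, comment out discovery-server.enabled, and use a new node.id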

Edit the metastore URI in the Hive connector configuration:
cat catalog/hive.properties

connector.name=hive-hadoop2
hive.metastore.uri=thrift://bigdata-hadoop-test-datanode02.com:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
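
If the hive catalog later fails to show schemas or tables, a quick way to rule out connectivity issues is to check that the metastore thrift port answers from the Presto nodes (a sketch, assuming nc/netcat is installed):

# Verify the Hive metastore thrift port is reachable from this node
nc -zv bigdata-hadoop-test-datanode02.com 9083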

Starting the Server

Start the coordinator (which also runs the discovery service) first:

bin/launcher start

Then start the worker nodes with the same command.
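
The launcher also has status and stop subcommands, and the server log is written under the node.data-dir configured earlier, which is the first place to look when a node fails to come up (a sketch, paths as configured above):

# Check whether the Presto server process is running on this node
bin/launcher status
# Follow the server log (node.data-dir was set to /var/presto/data)
tail -f /var/presto/data/var/log/server.log
# Once the coordinator is up, its discovery URI should answer over HTTP
curl http://10.1.249.55:8090/v1/info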

Installing the Client

Install it on node 4:

cd bin
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.177/presto-cli-0.177-executable.jar
# Rename it
mv presto-cli-0.177-executable.jar presto
# Make it executable
chmod +x presto

Usage

bin/presto --server 127.0.0.1:8090 --catalog hive
# only 1 worker started so far
presto> select * from test.t_employee;
 id | emp_name |  dep_name  | salary | age
----+----------+------------+--------+-----
 12 | Matthew  | Management | 4500.0 |  20
 13 | Matthew  | Management | 4600.0 |  60
  1 | Matthew  | Management | 4500.0 |  55
  2 | Olivia   | Management | 4400.0 |  61
  3 | Grace    | Management | 4000.0 |  42
  4 | Jim      | Production | 3700.0 |  35
  5 | Alice    | Production | 3500.0 |  24
  6 | Michael  | Production | 3600.0 |  28
  7 | Tom      | Production | 3800.0 |  35
  8 | Kevin    | Production | 4000.0 |  52
  9 | Elvis    | Service    | 4100.0 |  40
 10 | Sophia   | Sales      | 4300.0 |  36
 11 | Samantha | Sales      | 4100.0 |  38
(13 rows)
Query 20190517_081300_00005_xwftv, FINISHED, 2 nodes
Splits: 19 total, 19 done (100.00%)
0:01 [13 rows, 377B] [10 rows/s, 305B/s]
# after starting the second worker and reconnecting the client
presto> select count(*) from peiyou4.t_employee;
 _col0
-------
    13
(1 row)

Query 20190517_083441_00006_xwftv, FINISHED, 3 nodes
Splits: 20 total, 20 done (100.00%)
0:01 [13 rows, 377B] [9 rows/s, 289B/s]
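
To confirm how many workers have actually registered with the coordinator (rather than inferring it from the "2 nodes"/"3 nodes" in the query summaries above), the built-in system catalog can be queried, for example:

# List the nodes currently known to the coordinator
bin/presto --server 127.0.0.1:8090 --catalog hive --execute 'SELECT * FROM system.runtime.nodes'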

Finally, a Python connection example:

import prestodb

# Connect to the coordinator through the DBAPI interface of the prestodb module
conn = prestodb.dbapi.connect(
    host='localhost',
    port=8090,
    user='me',
    catalog='hive',
    schema='test',
)
cur = conn.cursor()
cur.execute('SELECT * FROM test.t_employee')
rows = cur.fetchall()
for row in rows:
    print(row)
conn.close()

Done.
