Python+Spark+Hadoop


2018-03-06  懂事的观众


Installing Spark

Running pyspark locally

pyspark --master local[*]

Test commands:

sc.master

textFile = sc.textFile("file:/usr/local/spark/README.md")
textFile.count()

Setting up a Spark Standalone Cluster

Copy the template file

cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh

Edit the spark-env.sh file

export SPARK_MASTER_IP=master          # master hostname/IP
export SPARK_WORKER_CORES=1            # CPU cores per worker
export SPARK_WORKER_MEMORY=512m        # memory per worker
export SPARK_WORKER_INSTANCES=1        # worker instances per node

Log in to data1 and create the Spark directory:

ssh data1

sudo mkdir /usr/local/spark

sudo chown hduser:hduser /usr/local/spark

exit

sudo scp -r /usr/local/spark hduser@data1:/usr/local

Repeat the same configuration for data2 and data3.
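
The per-node steps above can be scripted in one loop. A sketch, assuming passwordless ssh as hduser to each worker and passwordless sudo on the workers:

```shell
#!/bin/sh
# Create the target directory on each worker, fix ownership, then copy Spark over.
for host in data1 data2 data3; do
  ssh hduser@"$host" "sudo mkdir -p /usr/local/spark && sudo chown hduser:hduser /usr/local/spark"
  scp -r /usr/local/spark/* hduser@"$host":/usr/local/spark/
done
```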

Edit the slaves file

sudo nano /usr/local/spark/conf/slaves

data1
data2
data3

Start the cluster (start-all.sh starts both the master and the workers):

/usr/local/spark/sbin/start-all.sh

Alternatively, start them separately:

/usr/local/spark/sbin/start-master.sh
/usr/local/spark/sbin/start-slaves.sh

Run pyspark against the Standalone cluster:

pyspark --master spark://master:7077 --num-executors 1 --total-executor-cores 3 --executor-memory 512m
sc.master

textFile = sc.textFile("file:/usr/local/spark/README.md")
textFile.count()

Check the cluster state in the Spark web UI: http://master:8080/

Running Python Spark in IPython Notebook

Install Jupyter

sudo pip3 install jupyter

Configure Jupyter for remote access

Running pyspark in Jupyter Notebook under different modes
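
One common way to wire pyspark to Jupyter is through pyspark's driver environment variables; a sketch (the resource values mirror the cluster launch above and are illustrative):

```shell
# Launch the pyspark driver inside Jupyter Notebook instead of the plain REPL
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

# Local mode
pyspark --master local[*]

# Spark Standalone cluster mode
pyspark --master spark://master:7077 --total-executor-cores 3 --executor-memory 512m
```

With these variables set, every `pyspark ...` invocation opens a notebook whose kernel already has `sc` bound to the chosen master.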
