
Hive Action Usage Example in an Oozie Workflow

2018-04-03  明明德撩码

Official documentation

http://archive.cloudera.com/cdh5/cdh/5/oozie-4.0.0-cdh5.3.6/DG_HiveActionExtension.html

Copy the sample app, rename it, and then modify it for Hive

cp -r examples/apps/hive oozie-apps/
mv oozie-apps/hive oozie-apps/hive-select
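Before editing, it is worth confirming what was copied; in the stock Oozie 4.0.0 examples the hive app contains job.properties, workflow.xml and a sample Hive script (script.q), which are the templates modified in the steps below:

ls oozie-apps/hive-select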

Modify job.properties in hive-select

nameNode=hdfs://hadoop-senior.beifeng.com:8020
jobTracker=hadoop-senior.beifeng.com:8032
queueName=default
examplesRoot=examples
oozieAppsRoot=user/beifeng/oozie-apps
oozieDataRoot=user/beifeng/oozie/datas

oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/${oozieAppsRoot}/hive-select/workflow.xml

inputDir=hive-select/input
outputDir=hive-select/output

oozie.use.system.libpath=true means the job uses the Oozie share library stored under the beifeng user's directory on HDFS.
Note: make sure the port numbers are correct: NameNode (hdfs) 8020, JobTracker/ResourceManager 8032.
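If in doubt, the share library that oozie.use.system.libpath points at can be checked directly on HDFS (this assumes the sharelib really was installed under the beifeng user, as stated above, and that the command is run from the Hadoop installation directory):

bin/hdfs dfs -ls /user/beifeng/share/lib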

Test whether the Hive API in use is the new or the old version

Create the dept table in Hive

CREATE TABLE IF NOT EXISTS default.dept
(
dept_no string COMMENT 'id',
dept_name string,
dept_url string
)
COMMENT 'dept'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/dept';

Write the Hive SQL script (dept-select.sql, the script referenced in the workflow below)

load data local inpath '/opt/datas/dept.txt' overwrite into table dept;
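For reference, the load statement expects /opt/datas/dept.txt to be a tab-separated file whose three columns match the table schema above. The rows below are purely illustrative (not the author's data) and only show how such a file could be created:

printf '10\tACCOUNTING\thttp://dept.example.com/accounting\n20\tRESEARCH\thttp://dept.example.com/research\n' > /opt/datas/dept.txt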

Write the workflow XML file

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="hive-wf">
    <start to="hive-node"/>

    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/${oozieAppsRoot}/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>dept-select.sql</script>
            <param>OUTPUT=${nameNode}/${oozieAppsRoot}/${outputDir}</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Note: pay attention to the workflow-app and hive-action schema versions; follow the official Cloudera Oozie documentation (linked above) as the authoritative reference.
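The workflow above passes an OUTPUT parameter to dept-select.sql and clears the output directory in the <prepare> block, mirroring the stock Oozie hive example (whose script.q ends with an INSERT OVERWRITE DIRECTORY '${OUTPUT}' query). The following line is a sketch of my own, not the author's script, of how that parameter could be used; it appends a query that writes the table contents to the output directory:

cat >> oozie-apps/hive-select/dept-select.sql <<'EOF'
insert overwrite directory '${OUTPUT}' select * from default.dept;
EOF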

Create the oozie-apps directory on HDFS

bin/hdfs dfs -mkdir -p  /user/beifeng/oozie-apps

Upload the hive-select workflow application from the Oozie directory to HDFS

../hadoop-2.5.0-cdh5.3.6/bin/hdfs dfs -put oozie-apps/hive-select /user/beifeng/oozie-apps/

Copy the Hive configuration file and modify the workflow file: the hive action typically needs a <job-xml> element pointing to the uploaded hive-site.xml so that it can reach the Hive metastore.

cp ../hive-0.13.1-cdh5.3.6/conf/hive-site.xml oozie-apps/hive-select/

Create a lib directory for the Hive dependency jars (to be uploaded in the next step)

mkdir -p oozie-apps/hive-select/lib
cp ../hive-0.13.1-cdh5.3.6/lib/mysql-connector-java-5.1.27-bin.jar oozie-apps/hive-select/lib

Re-upload hive-select to HDFS. The directory was already uploaded above, and a plain -put will not overwrite files that already exist there, so either remove the old copy first or upload just the newly added files.

bin/hdfs dfs -put ../oozie-4.0.0-cdh5.3.6/oozie-apps/hive-select/* /user/beifeng/oozie-apps/hive-select/
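As a sanity check, the application directory on HDFS should now contain workflow.xml, job.properties, the SQL script, hive-site.xml and lib/mysql-connector-java-5.1.27-bin.jar:

bin/hdfs dfs -ls -R /user/beifeng/oozie-apps/hive-select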

Set the Oozie server URL

export OOZIE_URL=http://hadoop-senior.beifeng.com:11000/oozie
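With OOZIE_URL set, connectivity to the server can be verified before submitting anything (run from the Oozie installation directory, with the Oozie server already started):

bin/oozie admin -status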

Run the job

bin/oozie job -config oozie-apps/hive-select/job.properties -run

Check the job status

bin/oozie job -info 0000001-180315133250705-oozie-beif-W
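If the workflow ends in ERROR or KILLED, the launcher log usually shows the underlying Hive error; replace the ID with the one printed by the -run command:

bin/oozie job -log 0000001-180315133250705-oozie-beif-W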

Check whether the dept table now contains data. (The loaded rows live in the Hive table on HDFS; MySQL only stores the metastore metadata.)
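A quick check from the Hive CLI (the relative path assumes the directory layout used earlier in this article):

../hive-0.13.1-cdh5.3.6/bin/hive -e "select * from default.dept limit 10;"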
