A detailed walkthrough of developing a Hive UDF with Maven in IDEA
2017-12-07 · 解脱了
1. Create a Maven project

File > New > Project

2. Add the dependency jars to pom.xml (the first import may take a while):
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>scc</groupId>
    <artifactId>UDF</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>1.4</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/spring.handlers</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.neu.hive.UDF.ToUpperCaseUDF</mainClass>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/spring.schemas</resource>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
After the import finishes, a long list of jars appears in the left-hand panel, and the red error underlines in the editor disappear.
3. Start developing

Under the java source directory, create a new package (New > Package) and add the UDF class to it.
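The article does not show the UDF source itself, so here is a minimal sketch matching the mainClass configured in the shade plugin (com.neu.hive.UDF.ToUpperCaseUDF). It uses the classic org.apache.hadoop.hive.ql.exec.UDF base class that ships with hive-exec 1.2.1; the exact body is an assumption, since the original only names the class.

```java
package com.neu.hive.UDF;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hive locates the evaluate() method by reflection; one evaluate()
// per input signature you want to support.
public class ToUpperCaseUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // propagate SQL NULLs, like built-in functions do
        }
        return new Text(input.toString().toUpperCase());
    }
}
```

Run mvn package afterwards; the shade plugin bundles the dependencies into a single jar under target/ that you can ship to the server.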
4. Upload the jar to the server, add it to the Hive session, and create a temporary function
add jar /usr/local/usrJars/dulm/hiveUDF-0.0.1-SNAPSHOT-all.jar;
create temporary function my_uppercase as 'com.neu.hive.UDF.ToUpperCaseUDF';
This creates a temporary function named my_uppercase; the string after AS is the fully qualified class name (package + class).
select my_uppercase(datasource) from tenmindata limit 10;
Test it: this selects the datasource column, converts every value to upper case, and shows the first 10 rows.
Another example: the lookup (code) table is a txt file, the table it is applied to can be any storage format, and both input and output are of type string.
add jar /root/yl/udf_province4.jar;
create temporary function split_province_txt as 'hive_udf_province.UDF_province_name_txt';
select split_province_txt(province_id) from yl.dim_province;
select split_province_txt(province_id) from yl.dim_province_orc;
5. For an online service, the function must be created as a permanent function in each database that uses it.

Create a permanent function:
CREATE FUNCTION dws.bss_city_code AS 'com.ysten.bigdata.hive.udf.GetCityBYBssCity';
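Note that a permanent function is only usable from every Hive session if its jar stays reachable; a common pattern is to put the jar on HDFS and reference it in the DDL with USING JAR. The HDFS path below is a placeholder of mine, not from the original article:

```sql
-- hypothetical jar location; adjust to your environment
CREATE FUNCTION dws.bss_city_code
  AS 'com.ysten.bigdata.hive.udf.GetCityBYBssCity'
  USING JAR 'hdfs:///user/hive/udf/your-udf.jar';
```

Without USING JAR, the command only works if the jar has already been added to the classpath of the session (or to Hive's aux-jars path) on every node.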