Writing a Hive UDF

2023-07-02  后知不觉1

1. Introduction

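Hive user-defined functions fall into three categories:
(1) UDF (User-Defined Function): one row in, one value out
(2) UDAF (User-Defined Aggregation Function): an aggregate function, many rows in, one value out, similar to count/max/min
(3) UDTF (User-Defined Table-Generating Function): one row in, many rows out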
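
To make the three shapes concrete, here is a minimal plain-Java sketch of their input/output cardinality. It deliberately uses no Hive classes: `UdfShapes` and its method names are hypothetical and only mirror what each kind of function does to rows.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: plain Java, not the Hive API.
final class UdfShapes {
    // UDF shape: one input value per row, one output value per row.
    static String upper(String value) {
        return value == null ? null : value.toUpperCase(Locale.ROOT);
    }

    // UDAF shape: many input rows, a single aggregated value (like max).
    static int max(List<Integer> rows) {
        int best = Integer.MIN_VALUE;
        for (int v : rows) {
            best = Math.max(best, v);
        }
        return best;
    }

    // UDTF shape: one input value, several output rows.
    static List<String> split(String value) {
        return Arrays.asList(value.split(","));
    }

    public static void main(String[] args) {
        System.out.println(upper("hive"));
        System.out.println(max(Arrays.asList(3, 9, 4)));
        System.out.println(split("a,b,c"));
    }
}
```

In real Hive code each shape has its own base class or interface; the rest of this post only covers the first one.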

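To write a UDF, extend the UDF class and add an evaluate method. The parameters of evaluate are the arguments passed in the SQL call, and its return value is the value the query gets back.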
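
Hive finds evaluate by name through reflection rather than through an interface method, which is why the method can declare whatever parameter types the SQL call needs. The sketch below imitates that lookup in plain Java. `SampleUdf`, `EvaluateLookup` and `call` are hypothetical stand-ins, and the exact-type matching shown here is a simplification: Hive's real resolver also performs type conversion, which is left out.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a UDF class; it does not extend Hive's UDF.
final class SampleUdf {
    public boolean evaluate(String a) {
        return true;
    }

    public int evaluate(Integer a, Integer b) {
        return a + b;
    }
}

final class EvaluateLookup {
    // Look up a public method named "evaluate" whose parameter types equal the
    // runtime classes of the arguments, then invoke it. Null arguments are not
    // handled in this sketch.
    static Object call(Object udf, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        try {
            Method m = udf.getClass().getMethod("evaluate", types);
            return m.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no matching evaluate method", e);
        }
    }

    public static void main(String[] args) {
        SampleUdf udf = new SampleUdf();
        System.out.println(call(udf, "some value")); // picks evaluate(String)
        System.out.println(call(udf, 1, 2));         // picks evaluate(Integer, Integer)
    }
}
```

Because the lookup is by name and argument types, one class can carry several evaluate overloads, one per call signature it wants to support.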

2. Code

pom.xml

<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>2.1.1</version>
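  <!-- The jar deployed to production must not contain hive-exec, but local debugging needs it:
       keep this scope when packaging and remove it when debugging locally. -->
  <scope>provided</scope>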
  <exclusions>
    <exclusion>
      <artifactId>log4j-core</artifactId>
      <groupId>org.apache.logging.log4j</groupId>
    </exclusion>
    <exclusion>
      <artifactId>calcite-avatica</artifactId>
      <groupId>org.apache.calcite</groupId>
    </exclusion>
    <exclusion>
      <artifactId>calcite-core</artifactId>
      <groupId>org.apache.calcite</groupId>
    </exclusion>
    <exclusion>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-1.2-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-slf4j-impl</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.17</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>1.7.25</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.7.25</version>
</dependency>

log4j.properties

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n

TestUdf.java

package com.tianzehao;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

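/** Minimal UDF: logs one line per call and always returns true. */
public class TestUdf extends UDF {
    private static final Logger LOG = LoggerFactory.getLogger(TestUdf.class);

    // Hive calls evaluate with the arguments of the SQL call; the argument is ignored here.
    public boolean evaluate(String a) {
        LOG.info("tianzehao dqc test");
        return true;
    }
}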

3. Testing

[Figure: image.png]

Troubleshooting

Symptom

The log output shows the UDF being executed multiple times. The cause is that the call used no column data at all: every argument was a constant.


[Figure: image.png]

Solution

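When writing the UDF, have it accept a column as a parameter. It does not need to do anything with that value and can keep returning a constant. For example, with the UDF above: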

    select test(id) from test;