How to Create UDFs and UDAFs in Hive

2019-04-25  cyangssrs

Introduction to UDFs and UDAFs

UDF

A UDF is a Hive function that takes one or more fields from a single row and returns a single value.
For example:

SELECT lower(COLUMN_NAME) FROM TABLE_NAME

UDAF

A UDAF is a function that performs an aggregation over multiple rows.
For example:

select sum(column_name) from table_name
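
To make the aggregation idea concrete, here is a minimal standalone sketch of how a SUM-style aggregate can keep partial state and combine it. The class name SumSketch is hypothetical, and the class deliberately does not extend any Hive class so that it compiles without Hive on the classpath. The method names (init, iterate, terminatePartial, merge, terminate) follow the convention of Hive's older UDAF evaluator style; treat the whole block as an illustration of the idea rather than a drop-in Hive UDAF.

```java
// Standalone sketch of the partial-aggregation steps behind a SUM-style UDAF.
// SumSketch is a hypothetical name and is not tied to any Hive class.
class SumSketch {
    private long partialSum;  // running state for one group of rows
    private boolean empty;    // true until a non-null value has been seen

    // Reset the state before a new aggregation starts.
    void init() {
        partialSum = 0L;
        empty = true;
    }

    // Fold one input row into the running state; null inputs are skipped.
    void iterate(Long value) {
        if (value != null) {
            partialSum += value;
            empty = false;
        }
    }

    // Hand the partial result to another aggregator.
    Long terminatePartial() {
        return empty ? null : partialSum;
    }

    // Combine a partial result produced elsewhere into this state.
    void merge(Long otherPartial) {
        if (otherPartial != null) {
            partialSum += otherPartial;
            empty = false;
        }
    }

    // Produce the final value; null if no non-null row was seen.
    Long terminate() {
        return empty ? null : partialSum;
    }
}

class SumSketchDemo {
    public static void main(String[] args) {
        SumSketch a = new SumSketch();
        a.init();
        a.iterate(1L);
        a.iterate(2L);

        SumSketch b = new SumSketch();
        b.init();
        b.iterate(null);
        b.iterate(4L);

        // Combine the two partial sums, as would happen across tasks.
        a.merge(b.terminatePartial());
        System.out.println(a.terminate()); // prints 7
    }
}
```

The key difference from a UDF is visible here: the result depends on state accumulated across many rows, and partial results from separate workers have to be mergeable.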

How to Create Custom UDFs and UDAFs

Simple

The simple approach is to extend the UDF class directly:

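    package com.mycompany.hive.udf;

    import org.apache.hadoop.hive.ql.exec.UDF;

    /** A simple UDF to convert Fahrenheit to Celsius. */
    public class ConvertToCelcius extends UDF {
      // Takes a temperature in Fahrenheit and returns it in Celsius
      public double evaluate(double value) {
        return (value - 32) / 1.8;
      }
    }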

Once the class is packaged into a jar, you can call it like this:

hive> ADD JAR my-udf.jar;
hive> CREATE TEMPORARY FUNCTION fahrenheit_to_celcius AS 'com.mycompany.hive.udf.ConvertToCelcius';
hive> SELECT fahrenheit_to_celcius(temp_fahrenheit) FROM temperature_data;

In short, creating a simple UDF takes only two steps:

  1. Extend the org.apache.hadoop.hive.ql.exec.UDF class
  2. Implement an evaluate method

A simple UDF can work with a wide range of data types: not only Java primitive types, but also the Hadoop IO types. The Hive types map to Java types as follows:

Hive type | Java types
string | java.lang.String, org.apache.hadoop.io.Text
int | int, java.lang.Integer, org.apache.hadoop.io.IntWritable
boolean | boolean, java.lang.Boolean, org.apache.hadoop.io.BooleanWritable
array<type> | java.util.List<Java type>
map<ktype, vtype> | java.util.Map<Java type for K, Java type for V>
struct | Don't use Simple UDF, use GenericUDF
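
One practical reason to pick an object type over a primitive is null handling. The sketch below is a variant of the temperature conversion that uses java.lang.Double, on the assumption that a SQL NULL reaches evaluate as a Java null reference. The class name NullSafeFahrenheitToCelsius is hypothetical. In a real Hive project the class would extend org.apache.hadoop.hive.ql.exec.UDF; the superclass is left out here so the example compiles without Hive on the classpath.

```java
// Sketch only: in a Hive project this class would extend
// org.apache.hadoop.hive.ql.exec.UDF (omitted so the code compiles standalone).
class NullSafeFahrenheitToCelsius {
    // Boxed Double instead of primitive double, so a null input can be
    // accepted and a null result returned.
    public Double evaluate(Double fahrenheit) {
        if (fahrenheit == null) {
            return null;
        }
        return (fahrenheit - 32) / 1.8;
    }
}

class NullSafeDemo {
    public static void main(String[] args) {
        NullSafeFahrenheitToCelsius udf = new NullSafeFahrenheitToCelsius();
        System.out.println(udf.evaluate(32.0)); // prints 0.0
        System.out.println(udf.evaluate(null)); // prints null
    }
}
```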

Simple vs Generic

Simple | Generic
Reduced performance due to use of reflection: each call of the evaluate method is reflective. Furthermore, all arguments are evaluated and parsed. | Optimal performance: no reflective call, and arguments are parsed lazily.
Limited handling of complex types. Arrays are handled but suffer from type erasure limitations. | All complex parameters are supported (even nested ones like array<array>).
Variable number of arguments are not supported. | Variable number of arguments are supported.
Very easy to write. | Not very difficult, but not well documented.
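
To illustrate what a reflective call means in the comparison above, the following standalone sketch looks up a method named evaluate at run time and invokes it through java.lang.reflect. The classes DoubleIt and ReflectiveCallDemo are hypothetical, and this is not Hive's actual resolution code; it only shows the general mechanism, in which arguments and results are boxed and the method is located by name instead of being called directly.

```java
import java.lang.reflect.Method;

// Hypothetical class exposing an evaluate method, similar in shape to a simple UDF.
class DoubleIt {
    public int evaluate(int value) {
        return value * 2;
    }
}

class ReflectiveCallDemo {
    // Finds a public method named "evaluate" that takes one int and invokes it.
    // The int argument and the int result are boxed on the way in and out.
    static Object callEvaluate(Object target, int arg) {
        try {
            Method m = target.getClass().getMethod("evaluate", int.class);
            return m.invoke(target, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable evaluate(int) method", e);
        }
    }

    public static void main(String[] args) {
        DoubleIt udf = new DoubleIt();
        System.out.println(udf.evaluate(21));       // direct call, prints 42
        System.out.println(callEvaluate(udf, 21));  // reflective call, prints 42
    }
}
```

Both calls return the same value; the reflective path simply does more work per invocation, which is the cost the comparison refers to.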

Generic

First, extend the GenericUDF class.
Then implement three methods:

    // Simplified outline: GenericUDF is an abstract class, not an interface
    public abstract class GenericUDF {
      public abstract ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException;
      public abstract Object evaluate(DeferredObject[] args) throws HiveException;
      public abstract String getDisplayString(String[] args);
    }

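  1. initialize
    public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException;
    This method receives one ObjectInspector for each argument passed to the function and returns an ObjectInspector that describes the return value. It is also the place to validate the number and types of the arguments.

  2. evaluate
    This method implements the function's logic.
  3. getDisplayString
    This method returns a short description of the function.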

Example:

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
    import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.IntWritable;

    public class UDFMultiplyByTwo extends GenericUDF {
      private PrimitiveObjectInspector inputOI;
      private PrimitiveObjectInspector outputOI;

      @Override
      public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        // Exceptions are used instead of assert, which the JVM skips unless started with -ea.
        // This UDF accepts one argument
        if (args.length != 1) {
          throw new UDFArgumentException("UDFMultiplyByTwo accepts exactly one argument");
        }
        // The first argument is a primitive type
        if (args[0].getCategory() != Category.PRIMITIVE) {
          throw new UDFArgumentException("The argument must be a primitive type");
        }

        inputOI = (PrimitiveObjectInspector) args[0];
        // We only support the INT type
        if (inputOI.getPrimitiveCategory() != PrimitiveCategory.INT) {
          throw new UDFArgumentException("The argument must be of type int");
        }

        // We return an int, so return the corresponding object inspector
        outputOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;
        return outputOI;
      }

      @Override
      public Object evaluate(DeferredObject[] args) throws HiveException {
        if (args.length != 1) return null;

        // Access the deferred value. Hive passes the arguments as "deferred" objects
        // to avoid some computations if we don't actually need some of the values
        Object oin = args[0].get();
        if (oin == null) return null;

        int value = (Integer) inputOI.getPrimitiveJavaObject(oin);
        int output = value * 2;
        return new IntWritable(output);
      }

      @Override
      public String getDisplayString(String[] args) {
        return "Here, write a nice description";
      }
    }
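
The benefit of deferred arguments is easier to see in a function that does not always need every argument. The sketch below is standalone and does not use Hive: LazyArg is a hypothetical stand-in for the role DeferredObject plays, and FirstNonNull is a hypothetical function that stops reading arguments as soon as it finds a non-null value. The timesComputed counter exists only to make the laziness observable.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a deferred argument: the value is produced
// only when get() is called.
class LazyArg {
    private final Supplier<Object> source;
    int timesComputed = 0;  // how many times the value was actually requested

    LazyArg(Supplier<Object> source) {
        this.source = source;
    }

    Object get() {
        timesComputed++;
        return source.get();
    }
}

class FirstNonNull {
    // Returns the first non-null argument and never touches the ones after it.
    static Object evaluate(LazyArg[] args) {
        for (LazyArg arg : args) {
            Object value = arg.get();
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}

class LazyArgDemo {
    public static void main(String[] args) {
        LazyArg first = new LazyArg(() -> "a");
        LazyArg second = new LazyArg(() -> "b");
        Object result = FirstNonNull.evaluate(new LazyArg[] { first, second });
        System.out.println(result);                // prints a
        System.out.println(second.timesComputed);  // prints 0
    }
}
```

The second argument is never computed, which is the kind of saving the comment in evaluate refers to.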