Apache Phoenix (3): New Features, User-Defined Functions (UDFs)

2020-02-07  我知他风雨兼程途径日暮不赏

  Starting with Phoenix 4.4.0, users can create their own custom or domain-specific user-defined functions (UDFs) and deploy them to the cluster.

Overview

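  Users can create temporary or permanent user-defined or domain-specific scalar functions. UDFs can be used just like built-in functions in queries such as SELECT, UPSERT, and DELETE, and when creating functional indexes. A temporary function is specific to one session/connection and cannot be accessed from other sessions or connections. The metadata of a permanent function is stored in the system table SYSTEM.FUNCTION. Tenant-specific functions are also supported: a function created on a tenant-specific connection is not visible to other tenants' connections, and only functions created on a global (no tenant) connection are visible to all connections.
  Phoenix uses the HBase dynamic class loader to load UDF jars from HDFS on both the Phoenix client and the region servers, so no service restart is required.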
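To make the tenant scoping described above more concrete, here is a minimal sketch of how the connection properties for a global and a tenant-specific connection might be prepared. It assumes Phoenix's `TenantId` connection property from its multi-tenancy support; the class name `UdfConnectionProps` and the tenant name `acme` are hypothetical and used only for illustration.

```java
import java.util.Properties;

// Hypothetical helper; only the two property keys are meant to be Phoenix names.
final class UdfConnectionProps {

    // Builds connection properties with UDF support switched on.
    // A null tenantId yields properties for a global (no tenant) connection.
    static Properties forTenant(String tenantId) {
        Properties props = new Properties();
        props.setProperty("phoenix.functions.allowUserDefinedFunctions", "true");
        if (tenantId != null) {
            // Assumption: "TenantId" is the Phoenix multi-tenancy connection property.
            props.setProperty("TenantId", tenantId);
        }
        return props;
    }
}
```

Passing `UdfConnectionProps.forTenant("acme")` to `DriverManager.getConnection(...)` should give a connection whose functions are visible only to tenant `acme`, while `forTenant(null)` corresponds to a global connection whose functions every connection can see.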

Configuration

  Add the following parameters to hbase-site.xml on the Phoenix client side.

<property>
  <name>phoenix.functions.allowUserDefinedFunctions</name>
  <value>true</value>
</property>
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>${hbase.tmp.dir}/hbase</value>
  <description>The directory shared by region servers and into
    which HBase persists.  The URL should be 'fully-qualified'
    to include the filesystem scheme.  For example, to specify the
    HDFS directory '/hbase' where the HDFS instance's namenode is
    running at namenode.example.org on port 9000, set this value to:
    hdfs://namenode.example.org:9000/hbase.  By default, we write
    to whatever ${hbase.tmp.dir} is set to -- usually /tmp --
    so change this configuration or else all data will be lost on
    machine restart.</description>
</property>
<property>
  <name>hbase.dynamic.jars.dir</name>
  <value>${hbase.rootdir}/lib</value>
  <description>
    The directory from which the custom udf jars can be loaded
    dynamically by the phoenix client/region server without the need to restart. However,
    an already loaded udf class would not be un-loaded. See
    HBASE-1936 for more details.
  </description>
</property>

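Note: the last two parameters must have the same values as in the HBase server-side configuration.
  As with other configuration properties, phoenix.functions.allowUserDefinedFunctions can also be supplied as a connection property when opening a JDBC connection.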

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
Properties props = new Properties();
props.setProperty("phoenix.functions.allowUserDefinedFunctions", "true");
Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props);

  The following parameter, which the dynamic class loader uses when copying jars from HDFS to the local filesystem, is optional.

<property>
  <name>hbase.local.dir</name>
  <value>${hbase.tmp.dir}/local/</value>
  <description>Directory on the local filesystem to be used
    as a local storage.</description>
</property>

Creating Custom UDFs
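As a rough illustration of registering a function, the sketch below assembles a CREATE FUNCTION statement in the form I understand the Phoenix grammar to accept (`CREATE [TEMPORARY] FUNCTION name(argType) RETURNS type AS 'class' [USING JAR 'path']`). The helper class `CreateFunctionSql`, the function name `my_reverse`, the class `com.mypackage.MyReverseFunction`, and the jar path are all hypothetical; the helper only builds text and handles a single argument type.

```java
// Hypothetical helper that only assembles SQL text; it does not talk to Phoenix.
final class CreateFunctionSql {

    // Builds a single-argument CREATE FUNCTION statement.
    // temporary: true adds the TEMPORARY keyword (session-scoped function).
    // jarPath:   may be null to omit the USING JAR clause.
    static String build(boolean temporary, String name, String argType,
                        String returnType, String className, String jarPath) {
        StringBuilder sql = new StringBuilder("CREATE ");
        if (temporary) {
            sql.append("TEMPORARY ");
        }
        sql.append("FUNCTION ").append(name)
           .append('(').append(argType).append(')')
           .append(" RETURNS ").append(returnType)
           .append(" AS '").append(className).append('\'');
        if (jarPath != null) {
            sql.append(" USING JAR '").append(jarPath).append('\'');
        }
        return sql.toString();
    }
}
```

The resulting string could then be run with `conn.createStatement().execute(sql)` on a connection that has `phoenix.functions.allowUserDefinedFunctions` set to `true`.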

Dropping UDFs

You can drop a function with the DROP FUNCTION statement. Dropping a function deletes its metadata from Phoenix.
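A minimal sketch of issuing the statement over JDBC follows. The class name `DropFunctionExample` and the function name `my_reverse` are hypothetical, and the optional `IF EXISTS` clause is included on the assumption that the Phoenix grammar accepts it for DROP FUNCTION.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration only.
final class DropFunctionExample {

    // Assembles the DROP FUNCTION text, optionally with IF EXISTS.
    static String dropSql(String name, boolean ifExists) {
        return "DROP FUNCTION " + (ifExists ? "IF EXISTS " : "") + name;
    }

    // Runs the statement on an already opened Phoenix connection.
    static void drop(Connection conn, String name) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(dropSql(name, true));
        }
    }
}
```

For example, `DropFunctionExample.drop(conn, "my_reverse")` would send `DROP FUNCTION IF EXISTS my_reverse` to the server.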

How to Write a Custom UDF

You can write your own UDF by following a few simple steps (for more details, see the blog post):

    /**
     * Determines whether or not a function may be used to form
     * the start/stop key of a scan
     * @return the zero-based position of the argument to traverse
     *  into to look for a primary key column reference, or
     *  {@value #NO_TRAVERSAL} if the function cannot be used to
     *  form the scan key.
     */
    public int getKeyFormationTraversalIndex() {
        return NO_TRAVERSAL;
    }

    /**
     * Manufactures a KeyPart used to construct the KeyRange given
     * a constant and a comparison operator.
     * @param childPart the KeyPart formulated for the child expression
     *  at the {@link #getKeyFormationTraversalIndex()} position.
     * @return the KeyPart for constructing the KeyRange for this
     *  function.
     */
    public KeyPart newKeyPart(KeyPart childPart) {
        return null;
    }

    /**
     * Determines whether or not the result of the function invocation
     * will be ordered in the same way as the input to the function.
     * Returning YES enables an optimization to occur when a
     * GROUP BY contains function invocations using the leading PK
     * column(s).
     * @return YES if the function invocation will always preserve order for
     * the inputs versus the outputs and false otherwise, YES_IF_LAST if the
     * function preserves order, but any further column reference would not
     * continue to preserve order, and NO if the function does not preserve
     * order.
     */
    public OrderPreserving preservesOrder() {
        return OrderPreserving.NO;
    }
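To give some intuition for what preservesOrder() is asking, the standalone sketch below checks on a small sample whether a string function keeps its outputs in the same order as its sorted inputs. It is plain Java and does not use the Phoenix API; the class `OrderPreservationDemo` and its methods are hypothetical, and a sample check like this can only disprove order preservation, never prove it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Standalone illustration; none of these names are part of the Phoenix API.
final class OrderPreservationDemo {

    // Returns true if applying fn to the inputs in sorted order yields
    // outputs that are themselves in non-decreasing order.
    static boolean keepsOrder(List<String> inputs, UnaryOperator<String> fn) {
        List<String> sorted = new ArrayList<>(inputs);
        Collections.sort(sorted);
        String previous = null;
        for (String value : sorted) {
            String mapped = fn.apply(value);
            if (previous != null && previous.compareTo(mapped) > 0) {
                return false; // an output went backwards relative to its predecessor
            }
            previous = mapped;
        }
        return true;
    }

    // Keeps at most the first two characters, like a prefix function.
    static String prefix2(String s) {
        return s.length() <= 2 ? s : s.substring(0, 2);
    }

    // Reverses the string.
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
}
```

With the keys "ab", "abc", and "ba", the prefix function passes the check while the reversing function fails it. That is the kind of distinction a UDF author expresses by returning something other than OrderPreserving.NO.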

Limitations
