Adding a new column in Spark with a custom UDF and withColumn

2019-04-23  chenxk

1. Define the UDF

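import java.text.SimpleDateFormat

// Returns 2 if the user logged in on PC within the last 15 minutes
// or on the app within the last 3 days; otherwise returns 0.
val get_online = (pc_login_time: String, app_login_time: String) => {

    // Minutes elapsed between the given login time and now.
    def get_minutes(login_time: String): Int = {
      val formatTime = login_time.substring(0, 19)
      val dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
      val date = dateFormat.parse(formatTime)
      val diff = System.currentTimeMillis() - date.getTime
      (diff / 1000 / 60).toInt
    }

    var res = 0
    try {
      if (pc_login_time != null && pc_login_time.length > 15) {
        if (get_minutes(pc_login_time) < 15) {
          res = 2
        }
      }
      if (app_login_time != null && app_login_time.length > 15) {
        if (get_minutes(app_login_time) < 3 * 24 * 60) {
          res = 2
        }
      }
    } catch {
      // A bare try without catch handles nothing. A malformed or too-short
      // timestamp ends the checks here and keeps the score computed so far.
      case _: Exception =>
    }
    res
  }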
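The function above reads the system clock directly, so its result changes from run to run and is hard to unit test. A minimal sketch of the same scoring rule with the current time passed in as a parameter is shown below. The names `OnlineScore`, `minutesSince`, and `score` are hypothetical, and the sketch swaps `SimpleDateFormat` for `java.time`. It also differs in one respect: each timestamp is validated independently, so a malformed PC login time does not skip the app check.

```scala
import java.time.{Duration, LocalDateTime}
import java.time.format.DateTimeFormatter
import scala.util.Try

object OnlineScore {
  // Same pattern as in the article; DateTimeFormatter is immutable and thread-safe.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Minutes between loginTime and now, or None if the string is null,
  // shorter than 19 characters, or not parseable.
  def minutesSince(loginTime: String, now: LocalDateTime): Option[Long] =
    Option(loginTime)
      .filter(_.length >= 19)
      .flatMap(s => Try(LocalDateTime.parse(s.substring(0, 19), fmt)).toOption)
      .map(t => Duration.between(t, now).toMinutes)

  // Thresholds taken from the article: 15 minutes for PC, 3 days for the app.
  def score(pcLogin: String, appLogin: String, now: LocalDateTime): Int = {
    val pcRecent  = minutesSince(pcLogin, now).exists(_ < 15)
    val appRecent = minutesSince(appLogin, now).exists(_ < 3 * 24 * 60)
    if (pcRecent || appRecent) 2 else 0
  }
}
```

In a Spark job this could be wrapped as `udf((pc: String, app: String) => OnlineScore.score(pc, app, LocalDateTime.now()))`, assuming the login times are recorded in the same time zone as the executors' clocks.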

2. Call the UDF

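    import org.apache.spark.sql.functions.{col, udf}
    // Wrap the Scala function as a Spark UDF, then add its result as a new column.
    val udf_online = udf(get_online)
    val god_online_1 = user_df.filter(col("if_god") === 1)
      .withColumn("online_score", udf_online(col("pc_login_time"), col("app_login_time")))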
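To try the filter-then-withColumn pattern end to end, the following sketch builds a small DataFrame in a local SparkSession. Everything beyond the column names is an assumption: the sample rows are made up, `WithColumnDemo`, `has_login`, `addScore`, and `sample` are hypothetical names, and `has_login` is a simplified stand-in for `get_online` (it only checks whether a login time exists) so that the result does not depend on the current time.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object WithColumnDemo {
  // Placeholder for get_online: 2 if either login time is present, else 0.
  // String arguments arrive as null when the column value is null.
  val has_login = udf((pc: String, app: String) =>
    if (pc != null || app != null) 2 else 0)

  // Same shape as the article's call: filter rows, then append a UDF-derived column.
  def addScore(user_df: DataFrame): DataFrame =
    user_df.filter(col("if_god") === 1)
      .withColumn("online_score", has_login(col("pc_login_time"), col("app_login_time")))

  // Made-up sample rows; None becomes a null string column value.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Long, Int, Option[String], Option[String])](
      (1L, 1, Some("2019-04-23 10:00:00.0"), None),
      (2L, 1, None, None),
      (3L, 0, Some("2019-04-23 09:00:00.0"), Some("2019-04-22 08:00:00.0"))
    ).toDF("uid", "if_god", "pc_login_time", "app_login_time")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("withColumn-udf-demo")
      .getOrCreate()
    addScore(sample(spark)).show(false)
    spark.stop()
  }
}
```

withColumn returns a new DataFrame rather than modifying `user_df`, so the original columns are untouched and the result has to be assigned or chained.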