数据仓库-Hive基础(九) UDF作业实录

2020-06-23  本文已影响0人  做个合格的大厂程序员

分享一个今天做的题:

题目:

有原始json数据如下:

{"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}
{"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}
{"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}
{"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}
{"movie":"2355","rate":"5","timeStamp":"978824291","uid":"1"}
{"movie":"1197","rate":"3","timeStamp":"978302268","uid":"1"}
{"movie":"1287","rate":"5","timeStamp":"978302039","uid":"1"}

需要将数据导入到hive数据仓库中,我不管你中间用几个表,最终我要得到一个结果表:

1193    5       978300760       1
661     3       978302109       1
914     3       978301968       1
3408    4       978300275       1
2355    5       978824291       1
1197    3       978302268       1
1287    5       978302039       1

step 1

第一步:先创建一个初始的表,表中只含有一个字段,这个字段包含了原始表中的一行数据。

create table if not exists basicJson(jsonname string)row format delimited fields terminated by '\t';

导入原始数据

load data local inpath "/export/softwares/movieJson.csv";

查询之后获得

{"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}
{"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}
{"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}
{"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}
{"movie":"2355","rate":"5","timeStamp":"978824291","uid":"1"}
{"movie":"1197","rate":"3","timeStamp":"978302268","uid":"1"}
{"movie":"1287","rate":"5","timeStamp":"978302039","uid":"1"}

step 2

打开java工程,新建项目,

public Text evaluate(final Text jsonObject) throws Exception {
        String jsonString = jsonObject.toString();
        if (null != jsonString && !jsonString.toString().equals("")){
            return new Text(change2String(jsonString));
        }else{
            return new Text("");
        }
    }

    public static String change2String(String jsonString) throws Exception{
        JSONObject json = new JSONObject(jsonString);
        String movie = json.getString("movie");
        String rate = json.getString("rate");
        String timeStamp = json.getString("timeStamp");
        String uid = json.getString("uid");
        String finalString = movie+"\t"+rate+"\t"+timeStamp+"\t"+uid+"\t";
        return finalString;
    }

这里我们用到了JSONObject类,需要在Maven中导入依赖

<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20180813</version>
</dependency>

然后打包获得jar包导入到hive中

add jar /export/servers/hive-1.1.0-cdh5.14.0/lib/json2String.jar;

Hive中关联相关函数

create temporary function json2String as 'cn.leon.json.json2String';

step 3

在hive中使用相关函数获得生成的查询内容并且导入到本地文件中

insert overwrite local directory '/export/softwares/json' select json2String(jsonname) from basicjson; 

然后hive创建一个新表包含四个字段

create table if not exists finalMovies(movie string,rate string,timesstamp string,uid string)row format delimited fields terminated by '\t';

最后导入本地数据

load data local inpath '/export/softwares/json' into table finalMovies;

查询出来的结果为正确

1193    5       978300760       1
661     3       978302109       1
914     3       978301968       1
3408    4       978300275       1
2355    5       978824291       1
1197    3       978302268       1
1287    5       978302039       1
上一篇 下一篇

猜你喜欢

热点阅读