
Integrating Flink 1.13.5 with Hudi 0.10.0

2022-02-07  wudl

1. Versions

Component  Version
Hudi       0.10.0
Flink      1.13.5

2. Download the Hudi source

https://github.com/apache/hudi/releases

2.1 Change the Flink version to 1.13.5

In the root pom.xml, set:

<flink.version>1.13.5</flink.version>
<hive.version>3.1.0</hive.version>
<hadoop.version>3.1.1</hadoop.version>
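If you prefer not to edit the file by hand, the same change can be made with a sed one-liner. The snippet below is a sketch demonstrated on a throwaway file (with an arbitrary old value) so it runs anywhere; in practice point it at the root pom.xml of the Hudi source tree:

```shell
# Demonstrate the edit on a stand-in file; substitute the real root
# pom.xml when running inside the hudi source tree. 1.12.2 is just a
# placeholder for whatever value the property currently has.
printf '<flink.version>1.12.2</flink.version>\n' > /tmp/pom-snippet.xml
sed -i 's|<flink.version>[^<]*</flink.version>|<flink.version>1.13.5</flink.version>|' /tmp/pom-snippet.xml
cat /tmp/pom-snippet.xml
```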
 

2.2 Build command

mvn clean package -DskipTests
# or pin the Scala version explicitly (a flag supported by the Hudi build)
mvn clean package -DskipTests -Dscala-2.12

# the built bundle ends up at
packaging/hudi-flink-bundle/target/hudi-flink-bundle_2.12-0.10.0.jar

2.3 A dependency error during the build

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Hudi 0.10.0:
[INFO] 
[INFO] Hudi ............................................... SUCCESS [  1.642 s]
[INFO] hudi-common ........................................ SUCCESS [  9.808 s]
[INFO] hudi-aws ........................................... SUCCESS [  1.306 s]
[INFO] hudi-timeline-service .............................. SUCCESS [  1.623 s]
[INFO] hudi-client ........................................ SUCCESS [  0.082 s]
[INFO] hudi-client-common ................................. SUCCESS [  8.027 s]
[INFO] hudi-hadoop-mr ..................................... SUCCESS [  2.825 s]
[INFO] hudi-spark-client .................................. SUCCESS [ 13.891 s]
[INFO] hudi-sync-common ................................... SUCCESS [  0.718 s]
[INFO] hudi-hive-sync ..................................... SUCCESS [  3.027 s]
[INFO] hudi-spark-datasource .............................. SUCCESS [  0.066 s]
[INFO] hudi-spark-common_2.12 ............................. SUCCESS [  7.706 s]
[INFO] hudi-spark2_2.12 ................................... SUCCESS [  9.535 s]
[INFO] hudi-spark_2.12 .................................... SUCCESS [ 25.923 s]
[INFO] hudi-utilities_2.12 ................................ FAILURE [  2.638 s]
[INFO] hudi-utilities-bundle_2.12 ......................... SKIPPED
[INFO] hudi-cli ........................................... SKIPPED
[INFO] hudi-java-client ................................... SKIPPED
[INFO] hudi-flink-client .................................. SKIPPED
[INFO] hudi-spark3_2.12 ................................... SKIPPED
[INFO] hudi-dla-sync ...................................... SKIPPED
[INFO] hudi-sync .......................................... SKIPPED
[INFO] hudi-hadoop-mr-bundle .............................. SKIPPED
[INFO] hudi-hive-sync-bundle .............................. SKIPPED
[INFO] hudi-spark-bundle_2.12 ............................. SKIPPED
[INFO] hudi-presto-bundle ................................. SKIPPED
[INFO] hudi-timeline-server-bundle ........................ SKIPPED
[INFO] hudi-hadoop-docker ................................. SKIPPED
[INFO] hudi-hadoop-base-docker ............................ SKIPPED
[INFO] hudi-hadoop-namenode-docker ........................ SKIPPED
[INFO] hudi-hadoop-datanode-docker ........................ SKIPPED
[INFO] hudi-hadoop-history-docker ......................... SKIPPED
[INFO] hudi-hadoop-hive-docker ............................ SKIPPED
[INFO] hudi-hadoop-sparkbase-docker ....................... SKIPPED
[INFO] hudi-hadoop-sparkmaster-docker ..................... SKIPPED
[INFO] hudi-hadoop-sparkworker-docker ..................... SKIPPED
[INFO] hudi-hadoop-sparkadhoc-docker ...................... SKIPPED
[INFO] hudi-hadoop-presto-docker .......................... SKIPPED
[INFO] hudi-integ-test .................................... SKIPPED
[INFO] hudi-integ-test-bundle ............................. SKIPPED
[INFO] hudi-examples ...................................... SKIPPED
[INFO] hudi-flink_2.12 .................................... SKIPPED
[INFO] hudi-kafka-connect ................................. SKIPPED
[INFO] hudi-flink-bundle_2.12 ............................. SKIPPED
[INFO] hudi-kafka-connect-bundle .......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:29 min
[INFO] Finished at: 2022-02-06T17:59:02+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.10.0: The following artifacts could not be resolved: io.confluent:kafka-avro-serializer:jar:5.3.4, io.confluent:common-config:jar:5.3.4, io.confluent:common-utils:jar:5.3.4, io.confluent:kafka-schema-registry-client:jar:5.3.4: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.4 in aliyunmaven (https://maven.aliyun.com/repository/public) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :hudi-utilities_2.12

The artifacts above are missing from the Aliyun mirror, so download the jars manually and install them into the local Maven repository. (Note the version mismatch: Maven reported 5.3.4 missing, while the commands below install the 5.3.0 jars that were downloaded locally; the version you install must match the one the build actually requests.)

mvn install:install-file -Dfile=/opt/myjar/common-config-5.3.0.jar -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.3.0 -Dpackaging=jar

mvn install:install-file -Dfile=/opt/myjar/common-utils-5.3.0.jar -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.3.0 -Dpackaging=jar


mvn install:install-file -Dfile=/opt/myjar/kafka-avro-serializer-5.3.0.jar -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.3.0 -Dpackaging=jar

mvn install:install-file -Dfile=/opt/myjar/kafka-schema-registry-client-5.3.0.jar -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.3.0 -Dpackaging=jar
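Since the four install-file commands follow the same pattern, they can also be generated in a loop. This is a sketch: the /opt/myjar path is the one used above, and the version must match what Maven reported missing. The loop only echoes the commands so you can inspect them first; pipe the output to sh (or drop the echo) to actually run them.

```shell
# Echo one mvn install:install-file command per missing Confluent artifact.
VERSION=5.3.4   # must match the version in the Maven error message
CMDS=$(for ARTIFACT in common-config common-utils kafka-avro-serializer kafka-schema-registry-client; do
  echo "mvn install:install-file -Dfile=/opt/myjar/${ARTIFACT}-${VERSION}.jar -DgroupId=io.confluent -DartifactId=${ARTIFACT} -Dversion=${VERSION} -Dpackaging=jar"
done)
echo "$CMDS"
```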

[INFO] Reactor Summary for Hudi 0.10.0:
[INFO] 
[INFO] Hudi ............................................... SUCCESS [  1.370 s]
[INFO] hudi-common ........................................ SUCCESS [ 10.813 s]
[INFO] hudi-aws ........................................... SUCCESS [  1.394 s]
[INFO] hudi-timeline-service .............................. SUCCESS [  1.404 s]
[INFO] hudi-client ........................................ SUCCESS [  0.072 s]
[INFO] hudi-client-common ................................. SUCCESS [  7.295 s]
[INFO] hudi-hadoop-mr ..................................... SUCCESS [  2.848 s]
[INFO] hudi-spark-client .................................. SUCCESS [ 15.158 s]
[INFO] hudi-sync-common ................................... SUCCESS [  0.681 s]
[INFO] hudi-hive-sync ..................................... SUCCESS [  2.856 s]
[INFO] hudi-spark-datasource .............................. SUCCESS [  0.054 s]
[INFO] hudi-spark-common_2.12 ............................. SUCCESS [  7.296 s]
[INFO] hudi-spark2_2.12 ................................... SUCCESS [ 10.521 s]
[INFO] hudi-spark_2.12 .................................... SUCCESS [ 26.299 s]
[INFO] hudi-utilities_2.12 ................................ SUCCESS [ 11.262 s]
[INFO] hudi-utilities-bundle_2.12 ......................... SUCCESS [01:39 min]
[INFO] hudi-cli ........................................... SUCCESS [ 15.297 s]
[INFO] hudi-java-client ................................... SUCCESS [  2.267 s]
[INFO] hudi-flink-client .................................. SUCCESS [01:06 min]
[INFO] hudi-spark3_2.12 ................................... SUCCESS [  6.117 s]
[INFO] hudi-dla-sync ...................................... SUCCESS [  6.830 s]
[INFO] hudi-sync .......................................... SUCCESS [  0.061 s]
[INFO] hudi-hadoop-mr-bundle .............................. SUCCESS [  8.565 s]
[INFO] hudi-hive-sync-bundle .............................. SUCCESS [  1.131 s]
[INFO] hudi-spark-bundle_2.12 ............................. SUCCESS [ 11.139 s]
[INFO] hudi-presto-bundle ................................. SUCCESS [ 38.706 s]
[INFO] hudi-timeline-server-bundle ........................ SUCCESS [  8.251 s]
[INFO] hudi-hadoop-docker ................................. SUCCESS [  1.166 s]
[INFO] hudi-hadoop-base-docker ............................ SUCCESS [  0.649 s]
[INFO] hudi-hadoop-namenode-docker ........................ SUCCESS [  0.649 s]
[INFO] hudi-hadoop-datanode-docker ........................ SUCCESS [  0.627 s]
[INFO] hudi-hadoop-history-docker ......................... SUCCESS [  0.659 s]
[INFO] hudi-hadoop-hive-docker ............................ SUCCESS [  7.320 s]
[INFO] hudi-hadoop-sparkbase-docker ....................... SUCCESS [  0.731 s]
[INFO] hudi-hadoop-sparkmaster-docker ..................... SUCCESS [  0.638 s]
[INFO] hudi-hadoop-sparkworker-docker ..................... SUCCESS [  0.667 s]
[INFO] hudi-hadoop-sparkadhoc-docker ...................... SUCCESS [  0.671 s]
[INFO] hudi-hadoop-presto-docker .......................... SUCCESS [  0.704 s]
[INFO] hudi-integ-test .................................... SUCCESS [ 36.320 s]
[INFO] hudi-integ-test-bundle ............................. SUCCESS [01:47 min]
[INFO] hudi-examples ...................................... SUCCESS [  8.120 s]
[INFO] hudi-flink_2.12 .................................... SUCCESS [ 38.207 s]
[INFO] hudi-kafka-connect ................................. SUCCESS [ 19.832 s]
[INFO] hudi-flink-bundle_2.12 ............................. SUCCESS [ 27.658 s]
[INFO] hudi-kafka-connect-bundle .......................... SUCCESS [ 14.287 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  10:30 min
[INFO] Finished at: 2022-02-06T20:29:29+08:00
[INFO] ------------------------------------------------------------------------
[root@node01 hudi-0.10.0]# 

3. The built package directories

[root@node01 packaging]# pwd
/opt/module/hudi/hudi-0.10.0/packaging
[root@node01 packaging]# ll
total 4
drwxr-xr-x 4 501 games   46 Feb  6 20:41 hudi-flink-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-hadoop-mr-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-hive-sync-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:39 hudi-integ-test-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:41 hudi-kafka-connect-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-presto-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-spark-bundle
drwxr-xr-x 4 501 games  101 Feb  6 20:38 hudi-timeline-server-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:37 hudi-utilities-bundle
-rw-r--r-- 1 501 games 2206 Dec  8 10:38 README.md
[root@node01 packaging]# 

4. Jars required for the Flink/Hudi integration

The two key jars are:
hudi-flink-bundle_2.12-0.10.0.jar
hudi-hadoop-mr-bundle-0.10.0.jar
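Getting them into Flink's lib directory is just two cp commands. The sketch below uses throwaway temp directories standing in for the real paths so it runs anywhere; substitute the actual /opt/module paths from this post when following along:

```shell
# Stage the two bundles into Flink's lib. SRC stands in for
# .../hudi-0.10.0/packaging/*/target and LIB for
# /opt/module/flink/flink-1.13.5/lib.
SRC=$(mktemp -d)
LIB=$(mktemp -d)
touch "$SRC/hudi-flink-bundle_2.12-0.10.0.jar" "$SRC/hudi-hadoop-mr-bundle-0.10.0.jar"
cp "$SRC"/*.jar "$LIB"/
ls "$LIB"
```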

[root@node01 lib]# pwd
/opt/module/flink/flink-1.13.5/lib
[root@node01 lib]# ll
total 316964
-rw-r--r-- 1 root root   7802399 Jan  1 08:27 doris-flink-1.0-SNAPSHOT.jar
-rw-r--r-- 1 root root    249571 Dec 27 23:32 flink-connector-jdbc_2.12-1.13.5.jar
-rw-r--r-- 1 root root    359138 Jan  1 10:17 flink-connector-kafka_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007     92315 Dec 15 08:23 flink-csv-1.13.5.jar
-rw-r--r-- 1 hive 1007 106535830 Dec 15 08:29 flink-dist_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007    148127 Dec 15 08:23 flink-json-1.13.5.jar
-rw-r--r-- 1 root root  43317025 Feb  6 20:51 flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
-rw-r--r-- 1 hive 1007   7709740 Dec 15 06:57 flink-shaded-zookeeper-3.4.14.jar
-rw-r--r-- 1 hive 1007  35051557 Dec 15 08:28 flink-table_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007  38613344 Dec 15 08:28 flink-table-blink_2.12-1.13.5.jar
-rw-r--r-- 1 root root  62447468 Feb  6 20:44 hudi-flink-bundle_2.12-0.10.0.jar
-rw-r--r-- 1 root root  17276348 Feb  6 20:51 hudi-hadoop-mr-bundle-0.10.0.jar
-rw-r--r-- 1 root root   1893564 Jan  1 10:17 kafka-clients-2.0.0.jar
-rw-r--r-- 1 hive 1007    207909 Dec 15 06:56 log4j-1.2-api-2.16.0.jar
-rw-r--r-- 1 hive 1007    301892 Dec 15 06:56 log4j-api-2.16.0.jar
-rw-r--r-- 1 hive 1007   1789565 Dec 15 06:56 log4j-core-2.16.0.jar
-rw-r--r-- 1 hive 1007     24258 Dec 15 06:56 log4j-slf4j-impl-2.16.0.jar
-rw-r--r-- 1 root root    724213 Dec 27 23:23 mysql-connector-java-5.1.9.jar
[root@node01 lib]# 

5. Start the Flink SQL client

./sql-client.sh embedded shell
# set the result display mode in the SQL CLI
set execution.result-mode=tableau;

6. Create the table

    CREATE TABLE t1(
      uuid VARCHAR(20), 
      name VARCHAR(10),
      age INT,
      ts TIMESTAMP(3),
      `partition` VARCHAR(20)
    )
    PARTITIONED BY (`partition`)
    WITH (
      'connector' = 'hudi',
      'path' = 'hdfs://192.168.1.161:8020/hudi-warehouse/hudi-t1',
      'write.tasks' = '1',
      'compaction.tasks' = '1', 
      'table.type' = 'MERGE_ON_READ'
    );

6.1 Insert data

INSERT INTO t1 VALUES('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1');
INSERT INTO t1 VALUES
('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
## session output
Flink SQL> INSERT INTO t1 VALUES
> ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
> ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
> ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
> ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
> ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
> ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
> ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: d6c70e43969b0f2b5124104468c5e065


Flink SQL> select * from t1;
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| op |                           uuid |                           name |         age |                      ts |                      partition |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| +I |                            id6 |                           Emma |          20 | 1970-01-01 00:00:06.000 |                           par3 |
| +I |                            id5 |                         Sophia |          18 | 1970-01-01 00:00:05.000 |                           par3 |
| +I |                            id8 |                            Han |          56 | 1970-01-01 00:00:08.000 |                           par4 |
| +I |                            id7 |                            Bob |          44 | 1970-01-01 00:00:07.000 |                           par4 |
| +I |                            id2 |                        Stephen |          33 | 1970-01-01 00:00:02.000 |                           par1 |
| +I |                            id1 |                          Danny |          23 | 1970-01-01 00:00:01.000 |                           par1 |
| +I |                            id4 |                         Fabian |          31 | 1970-01-01 00:00:04.000 |                           par2 |
| +I |                            id3 |                         Julian |          53 | 1970-01-01 00:00:03.000 |                           par2 |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
Received a total of 8 rows



7. Updates

In Hudi, an update is simply another insert with the same record key. Change Danny's age to 18:

INSERT INTO t1 VALUES('id1','Danny',18,TIMESTAMP '1970-01-01 00:00:01','par1');

The query now returns:

Flink SQL> select * from t1;
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| op |                           uuid |                           name |         age |                      ts |                      partition |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| +I |                            id8 |                            Han |          56 | 1970-01-01 00:00:08.000 |                           par4 |
| +I |                            id7 |                            Bob |          44 | 1970-01-01 00:00:07.000 |                           par4 |
| +I |                            id4 |                         Fabian |          31 | 1970-01-01 00:00:04.000 |                           par2 |
| +I |                            id3 |                         Julian |          53 | 1970-01-01 00:00:03.000 |                           par2 |
| +I |                            id2 |                        Stephen |          33 | 1970-01-01 00:00:02.000 |                           par1 |
| +I |                            id1 |                          Danny |          18 | 1970-01-01 00:00:01.000 |                           par1 |
| +I |                            id6 |                           Emma |          20 | 1970-01-01 00:00:06.000 |                           par3 |
| +I |                            id5 |                         Sophia |          18 | 1970-01-01 00:00:05.000 |                           par3 |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
Received a total of 8 rows

Flink SQL> 

8. The job in the Flink UI

(screenshot of the running write job in the Flink web UI, not reproduced here)