Hudi on Flink: hive-exec dependency conflict

2025-10-23  AlienPaul

Environment

Flink needs both Hudi and Hive support, so hive-exec-3.1.0.jar and hudi-flink1.17-bundle-0.15.0.jar were added to the lib directory.

Error log

When inserting data into a Hudi copy-on-write (COW) table from the Flink SQL Client, the following error occurs and the job keeps restarting.

java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:184)
    at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:256)
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:213)
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:159)
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:275)
    at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:153)
    at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:144)
    at org.apache.hudi.io.hadoop.HoodieAvroFileWriterFactory.getHoodieAvroWriteSupport(HoodieAvroFileWriterFactory.java:129)
    at org.apache.hudi.io.hadoop.HoodieAvroFileWriterFactory.newParquetFileWriter(HoodieAvroFileWriterFactory.java:67)
    at org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriterByFormat(HoodieFileWriterFactory.java:67)
    at org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriter(HoodieFileWriterFactory.java:53)
    at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:108)
    at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:84)
    at org.apache.hudi.io.FlinkCreateHandle.<init>(FlinkCreateHandle.java:66)
    at org.apache.hudi.io.FlinkCreateHandle.<init>(FlinkCreateHandle.java:59)
    at org.apache.hudi.io.FlinkWriteHandleFactory$BaseCommitWriteHandleFactory.create(FlinkWriteHandleFactory.java:121)
    at org.apache.hudi.client.HoodieFlinkWriteClient.getOrCreateWriteHandle(HoodieFlinkWriteClient.java:459)
    at org.apache.hudi.client.HoodieFlinkWriteClient.access$000(HoodieFlinkWriteClient.java:77)
    at org.apache.hudi.client.HoodieFlinkWriteClient$AutoCloseableWriteHandle.<init>(HoodieFlinkWriteClient.java:515)
    at org.apache.hudi.client.HoodieFlinkWriteClient$AutoCloseableWriteHandle.<init>(HoodieFlinkWriteClient.java:507)
    at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:148)
    at org.apache.hudi.sink.StreamWriteFunction.lambda$initWriteFunction$1(StreamWriteFunction.java:192)
    at org.apache.hudi.sink.StreamWriteFunction.writeBucket(StreamWriteFunction.java:495)
    at org.apache.hudi.sink.StreamWriteFunction.lambda$flushRemaining$7(StreamWriteFunction.java:467)
    at java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608)
    at org.apache.hudi.sink.StreamWriteFunction.flushRemaining(StreamWriteFunction.java:463)
    at org.apache.hudi.sink.StreamWriteFunction.endInput(StreamWriteFunction.java:157)
    at org.apache.hudi.sink.common.AbstractWriteOperator.endInput(AbstractWriteOperator.java:48)
    at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.endOperatorInput(StreamOperatorWrapper.java:96)
    at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.endInput(RegularOperatorChain.java:97)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:68)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:550)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788)
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
    at java.lang.Thread.run(Thread.java:750)

Analysis and solution

Dependency analysis shows that Hudi and hive-exec bundle different versions of the parquet-hadoop-bundle classes, and the copy inside hive-exec is older. When Flink starts, hive-exec is placed on the classpath before the Hudi bundle, so the older Parquet classes are loaded first and the NoSuchMethodError above is thrown.

The fix is to intervene so that Flink loads the Hudi bundle before hive-exec. A simple way is to rename the Hudi bundle jar so that it sorts alphabetically ahead of hive-exec, for example rename hudi-flink1.17-bundle-0.15.0.jar to ahudi-flink1.17-bundle-0.15.0.jar.
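The rename works because Flink's start scripts assemble the classpath from lib/*.jar in alphabetical order. A minimal sketch of the effect, run against a throwaway directory rather than a real Flink installation (the jar files here are empty placeholders):

```shell
# Simulate the Flink lib directory with placeholder files.
demo_dir=/tmp/flink-lib-demo
mkdir -p "$demo_dir" && cd "$demo_dir"
touch hive-exec-3.1.0.jar hudi-flink1.17-bundle-0.15.0.jar

# Alphabetical order puts hive-exec first ("hive" < "hudi"),
# so its old Parquet classes would win:
ls | sort | head -1    # hive-exec-3.1.0.jar

# Renaming the Hudi bundle moves it ahead of hive-exec:
mv hudi-flink1.17-bundle-0.15.0.jar ahudi-flink1.17-bundle-0.15.0.jar
ls | sort | head -1    # ahudi-flink1.17-bundle-0.15.0.jar
```

In a real deployment you would run the mv inside $FLINK_HOME/lib and restart the cluster so the new ordering takes effect.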

An alternative solution

Modify packaging/hudi-flink-bundle/pom.xml and add the following inside the relocations tag:

<relocation>
  <pattern>org.apache.parquet</pattern>
  <shadedPattern>${flink.bundle.shade.prefix}org.apache.parquet</shadedPattern>
</relocation>

Then rebuild the bundle. The relocation makes the shade plugin rewrite the org.apache.parquet packages under the bundle's shade prefix, so Hudi no longer competes with the Parquet classes inside hive-exec.
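A sketch of the rebuild, assuming a checkout of the Hudi 0.15.0 source tree; the exact version and profile flags depend on your Flink and Hive setup, so check the build instructions in the Hudi README before copying:

```shell
# From the root of the Hudi source checkout, after editing the pom:
mvn clean package -DskipTests \
    -pl packaging/hudi-flink-bundle -am \
    -Dflink1.17
# The relocated bundle is produced under packaging/hudi-flink-bundle/target/;
# replace the old jar in $FLINK_HOME/lib with it.
```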


References

https://segmentfault.com/a/1190000045284194
https://blog.csdn.net/m0_66705151/article/details/125781898
https://github.com/apache/hudi/issues/3042