hive mapjoin MapJoinMemoryExhaus
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"aid":252511110,"property":"{\"aid\":252511110,\"alvl\":0,\"avn\":0,\"avdn\":0,\"avpn\":0,\"avcn\":0,\"avsn\":0,\"avti\":0,\"avtp\":0}","dt":"20190226"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
Caused by: org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2019-02-28 04:08:29 Processing rows: 200000 Hashtable size: 199999 Memory usage: 7528720336 percentage: 0.60
at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:99)
at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:249)
at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.processOp(SparkHashTableSinkOperator.java:79)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
... 17 more
19/02/28 04:08:36 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
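Cause:
MapJoinMemoryExhaustionHandler.java contains the following code, which throws the exception seen in the trace once the ratio of used heap to maximum heap exceeds maxMemoryUsage: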
public void checkMemoryStatus(long tableContainerSize, long numRows)
        throws MapJoinMemoryExhaustionException {
    // Heap currently in use, as a fraction of the maximum heap size.
    long usedMemory = memoryMXBean.getHeapMemoryUsage().getUsed();
    double percentage = (double) usedMemory / (double) maxHeapSize;
    String msg = Utilities.now() + "\tProcessing rows:\t" + numRows + "\tHashtable size:\t"
            + tableContainerSize + "\tMemory usage:\t" + usedMemory + "\tpercentage:\t"
            + percentageNumberFormat.format(percentage);
    console.printInfo(msg);
    // Abort the hash table build once usage exceeds the configured threshold.
    if (percentage > maxMemoryUsage) {
        throw new MapJoinMemoryExhaustionException(msg);
    }
}
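
To make the check concrete, here is a minimal standalone sketch of the same comparison. It is not Hive code: the class and method names are hypothetical, the maximum heap of 12,500,000,000 bytes is an assumed value chosen only so that the ratio rounds to the 0.60 shown in the log, and the 0.55 threshold is an assumption as well (the log does not print the threshold that was in effect).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical sketch of the ratio check; not part of Hive.
final class MapJoinMemoryCheckSketch {

    // Same ratio as in checkMemoryStatus: used heap divided by maximum heap.
    static double usageRatio(long usedBytes, long maxHeapBytes) {
        return (double) usedBytes / (double) maxHeapBytes;
    }

    // Same comparison as in checkMemoryStatus: strictly greater than the threshold.
    static boolean exceeds(long usedBytes, long maxHeapBytes, double maxMemoryUsage) {
        return usageRatio(usedBytes, maxHeapBytes) > maxMemoryUsage;
    }

    public static void main(String[] args) {
        long usedFromLog = 7_528_720_336L;     // "Memory usage" value from the log
        long assumedMaxHeap = 12_500_000_000L; // assumption: not present in the log
        double assumedThreshold = 0.55;        // assumption: not present in the log

        System.out.printf("ratio from log values: %.2f%n",
                usageRatio(usedFromLog, assumedMaxHeap));
        System.out.println("would throw: "
                + exceeds(usedFromLog, assumedMaxHeap, assumedThreshold));

        // The same ratio for the current JVM; getMax() returns -1 when undefined.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        if (heap.getMax() > 0) {
            System.out.printf("this JVM: %.2f%n", usageRatio(heap.getUsed(), heap.getMax()));
        }
    }
}
```

With these assumed values the ratio is above 0.55, so the check would throw; against a threshold of 0.90 the same values would pass.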
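Final solution:
Since Hive 0.11, hive.auto.convert.join defaults to true, which means Hive automatically converts a join into a map join whenever the conditions are met. For how the map join parameters are evaluated, see: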
https://yq.aliyun.com/articles/64306
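
As a rough illustration of the kind of condition involved, and not a copy of Hive's planner logic, the sketch below assumes the conversion is gated by the on-disk size of the smaller table against a byte threshold such as hive.mapjoin.smalltable.filesize (assumed here to be 25000000 bytes). The class name, method name, and sample sizes are hypothetical.

```java
// Hypothetical simplification of the auto-conversion decision; not Hive code.
final class MapJoinConversionSketch {

    // Assumed value of hive.mapjoin.smalltable.filesize, in bytes.
    static final long SMALL_TABLE_FILESIZE = 25_000_000L;

    // Convert only when auto-conversion is enabled and the smaller input's
    // on-disk size fits within the threshold.
    static boolean wouldConvertToMapJoin(boolean autoConvertJoin,
                                         long smallTableBytes,
                                         long thresholdBytes) {
        return autoConvertJoin && smallTableBytes <= thresholdBytes;
    }

    public static void main(String[] args) {
        // A 20 MB table with auto-conversion on: candidate for a map join.
        System.out.println(wouldConvertToMapJoin(true, 20_000_000L, SMALL_TABLE_FILESIZE));
        // A 200 MB table: too large under this assumed threshold.
        System.out.println(wouldConvertToMapJoin(true, 200_000_000L, SMALL_TABLE_FILESIZE));
        // Auto-conversion off: never converted, regardless of size.
        System.out.println(wouldConvertToMapJoin(false, 20_000_000L, SMALL_TABLE_FILESIZE));
    }
}
```

In this sketch, setting the flag to false disables the conversion regardless of table size, which corresponds to turning hive.auto.convert.join off.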
References:
HIVE-15221: https://issues.apache.org/jira/browse/HIVE-15221?spm=a2c4e.11153940.blogcont64306.15.eeb3541edXrhsd
https://yq.aliyun.com/articles/64306
https://stackoverflow.com/questions/22977790/hive-query-execution-error-return-code-3-from-mapredlocaltask?spm=a2c4e.11153940.blogcont64306.13.eeb3541edXrhsd
https://yq.aliyun.com/articles/476771?spm=a2c4e.11153940.blogcont64306.28.eeb3541edXrhsd