Notes on an Existing DataX Issue
2020-09-25
诺之林
Reproduction
docker run --name datax-issue -p 27017:27017 -d mongo:4.0.4
docker exec -it datax-issue /bin/bash
mongo
use datax
db.getCollection("gps").insert({
    longitude: 34.9016151428223,
    latitude: NaN
});
wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
tar xf datax.tar.gz && cd datax
vim job/job.json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mongodbreader",
                    "parameter": {
                        "address": [
                            "127.0.0.1:27017"
                        ],
                        "dbName": "datax",
                        "collectionName": "gps",
                        "column": [
                            {
                                "name": "longitude",
                                "type": "double"
                            },
                            {
                                "name": "latitude",
                                "type": "double"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
python bin/datax.py job/job.json
com.alibaba.datax.common.exception.DataXException: Code:[Framework-13], Description:[DataX插件运行时出错, 具体原因请参看DataX运行结束时的错误诊断信息 .]. - java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:497)
at java.math.BigDecimal.<init>(BigDecimal.java:383)
at java.math.BigDecimal.<init>(BigDecimal.java:809)
at com.alibaba.datax.common.element.DoubleColumn.<init>(DoubleColumn.java:30)
at com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReader$Task.startRead(MongoDBReader.java:128)
at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:57)
at java.lang.Thread.run(Thread.java:748)
Solution
git clone https://github.com/alibaba/DataX.git && cd DataX
vim mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/MongoDBReader.java
// TODO deal with Double.isNaN()
if (Double.isNaN((Double) tempCol)) {
    record.addColumn(new StringColumn(null));
} else {
    record.addColumn(new DoubleColumn((Double) tempCol));
}
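The guard above covers NaN only. `BigDecimal` also rejects positive and negative infinity, so the same check could be pulled into a small helper that handles all three. This is a minimal sketch rather than DataX code: `NanGuard` and `toDecimalOrNull` are hypothetical names introduced for illustration.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of DataX: maps doubles that BigDecimal
// cannot represent (NaN, +Infinity, -Infinity) to null instead of throwing.
class NanGuard {
    static BigDecimal toDecimalOrNull(Double value) {
        if (value == null || value.isNaN() || value.isInfinite()) {
            return null;
        }
        // Convert through the string form of the double.
        return new BigDecimal(String.valueOf(value));
    }

    public static void main(String[] args) {
        System.out.println(toDecimalOrNull(34.9016151428223));
        System.out.println(toDecimalOrNull(Double.NaN));
        System.out.println(toDecimalOrNull(Double.POSITIVE_INFINITY));
    }
}
```

Returning null keeps the decision about how to store a missing value with the caller, which in the patch above is a null column.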
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
- Then replace the corresponding *.jar in the downloaded package with the mongodbreader-0.0.1-SNAPSHOT.jar built here
python bin/datax.py job/job.json
34.9016151428223 null
任务启动时刻 : 2020-09-25 09:52:40
任务结束时刻 : 2020-09-25 09:52:51
任务总计耗时 : 10s
任务平均流量 : 1B/s
记录写入速度 : 0rec/s
读出记录总数 : 1
读写失败总数 : 0
Background
- JShell is the interactive REPL (Read-Eval-Print Loop) tool introduced in Java 9
/Library/Java/JavaVirtualMachines/jdk-11.0.8.jdk/Contents/Home/bin/jshell
| Welcome to JShell -- Version 11.0.8
| For an introduction type: /help intro
jshell> double ZERO = 0;
ZERO ==> 0.0
jshell> ZERO / ZERO;
$2 ==> NaN
jshell> Math.sqrt(-1);
$3 ==> NaN
jshell> /exit
| Goodbye
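The stack trace from the reproduction passes through the `BigDecimal` constructor, which has no representation for NaN. The sketch below shows the same exception outside DataX. It assumes the double is converted through its string form, which is what the three `BigDecimal.<init>` frames suggest. `NanToBigDecimal` and `convert` are hypothetical names.

```java
import java.math.BigDecimal;

class NanToBigDecimal {
    // Returns the plain decimal string on success,
    // or the simple name of the exception class on failure.
    static String convert(double value) {
        try {
            return new BigDecimal(String.valueOf(value)).toPlainString();
        } catch (NumberFormatException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        double zero = 0;
        System.out.println(convert(zero / zero));   // NaN, as in the JShell session
        System.out.println(convert(Math.sqrt(-1))); // NaN again
        System.out.println(convert(0.5));           // an ordinary value converts fine
    }
}
```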
Addendum
- Issue 1: error because a Double cannot be converted to an Integer
// mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/util/CollectionSplitUtil.java
- int docCount = result.getInteger("count");
+ int docCount = result.getDouble("count").intValue();
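- Issue 2: the job hangs during execution without reporting any error
// Workaround: set channel to 1; the root cause has not been identified yet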
"channel": 1
- Issue 3: keeping a job running on the server
sudo apt install -y screen
screen -S ots
cd ~/datax
python bin/datax.py job/local2ots.json  # then simply close the client; the job keeps running
screen -r ots  # reattach later to resume the terminal session
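For Issue 1, the original read fails when the stored `count` value is a `Double` rather than an `Integer`. A more tolerant read could go through `java.lang.Number`, as in the minimal sketch below. A plain `Map` stands in for the MongoDB result document, and `CountReader` and `readCount` are hypothetical names, not DataX or driver APIs.

```java
import java.util.HashMap;
import java.util.Map;

class CountReader {
    // Accepts any numeric type for "count" and narrows it to int.
    static int readCount(Map<String, Object> result) {
        Object raw = result.get("count");
        if (!(raw instanceof Number)) {
            throw new IllegalStateException("count is missing or not numeric: " + raw);
        }
        return ((Number) raw).intValue();
    }

    public static void main(String[] args) {
        Map<String, Object> result = new HashMap<>();
        result.put("count", 42.0); // a Double, as in Issue 1
        System.out.println(readCount(result));
        result.put("count", 7);    // an Integer works as well
        System.out.println(readCount(result));
    }
}
```

Unlike `getDouble("count").intValue()`, this accepts either numeric type, so it would also cover a result where the count comes back as an integer.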