Notes on an Outstanding DataX Issue

2020-09-25  诺之林

Reproduction

docker run --name datax-issue -p 27017:27017 -d mongo:4.0.4

docker exec -it datax-issue /bin/bash

mongo
use datax

db.getCollection("gps").insert( {
    longitude: 34.9016151428223,
    latitude: NaN
} );
wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

tar xf datax.tar.gz && cd datax

vim job/job.json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mongodbreader",
                    "parameter": {
                        "address": [
                            "127.0.0.1:27017"
                        ],
                        "dbName": "datax",
                        "collectionName": "gps",
                        "column": [
                            {
                                "name": "longitude",
                                "type": "double"
                            },
                            {
                                "name": "latitude",
                                "type": "double"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
python bin/datax.py job/job.json
com.alibaba.datax.common.exception.DataXException: Code:[Framework-13], Description:[DataX插件运行时出错, 具体原因请参看DataX运行结束时的错误诊断信息 .].  - java.lang.NumberFormatException
    at java.math.BigDecimal.<init>(BigDecimal.java:497)
    at java.math.BigDecimal.<init>(BigDecimal.java:383)
    at java.math.BigDecimal.<init>(BigDecimal.java:809)
    at com.alibaba.datax.common.element.DoubleColumn.<init>(DoubleColumn.java:30)
    at com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReader$Task.startRead(MongoDBReader.java:128)
    at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:57)
    at java.lang.Thread.run(Thread.java:748)
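The Framework-13 message reads roughly "DataX plugin failed at runtime; see the error diagnostics printed when DataX finishes". The frames underneath show the real cause: the DoubleColumn constructor hands the value to java.math.BigDecimal, which throws NumberFormatException because BigDecimal cannot represent NaN. A minimal JDK-only sketch reproduces this. The class name NaNToBigDecimal is illustrative, not DataX code. The String path matches what the three BigDecimal.<init> frames suggest, and the double constructor is included for comparison:

```java
import java.math.BigDecimal;

public class NaNToBigDecimal {
    // Returns "ok" if new BigDecimal(String) accepts the value, else the exception's simple name
    static String tryStringCtor(double value) {
        try {
            new BigDecimal(String.valueOf(value)); // String.valueOf(Double.NaN) is "NaN"
            return "ok";
        } catch (NumberFormatException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Same check for the BigDecimal(double) constructor, which rejects NaN and infinities
    static String tryDoubleCtor(double value) {
        try {
            new BigDecimal(value);
            return "ok";
        } catch (NumberFormatException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryStringCtor(34.9016151428223)); // ok
        System.out.println(tryStringCtor(Double.NaN));       // NumberFormatException
        System.out.println(tryDoubleCtor(Double.NaN));       // NumberFormatException
    }
}
```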

Fix

git clone https://github.com/alibaba/DataX.git && cd DataX

vim mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/MongoDBReader.java
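//TODO deal with Double.isNaN()
// Patched: NaN cannot be converted to BigDecimal, so write a null column instead of a DoubleColumn
// (StringColumn is com.alibaba.datax.common.element.StringColumn; import it if the file does not already)
if (Double.isNaN((Double) tempCol)) {
    record.addColumn(new StringColumn(null));
} else {
    record.addColumn(new DoubleColumn((Double) tempCol));
}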
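The patched branch boils down to "check before converting". Below is a standalone sketch of the same guard using only the JDK; toDecimalOrNull is a hypothetical name, not DataX code. It also maps infinities to null, which goes beyond the author's patch, because BigDecimal rejects Infinity the same way it rejects NaN:

```java
import java.math.BigDecimal;

public class NaNSafeColumn {
    // Hypothetical helper mirroring the patched branch: values BigDecimal cannot hold become null
    static BigDecimal toDecimalOrNull(Double value) {
        if (value == null || Double.isNaN(value) || Double.isInfinite(value)) {
            return null; // like the patch, emit a null value instead of failing the whole task
        }
        return new BigDecimal(String.valueOf(value));
    }

    public static void main(String[] args) {
        System.out.println(toDecimalOrNull(34.9016151428223));
        System.out.println(toDecimalOrNull(Double.NaN)); // null
    }
}
```

Returning null keeps the record flowing to the writer, which matches the behaviour in the run below, where the NaN field is printed as null.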
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
python bin/datax.py job/job.json
34.9016151428223    null
任务启动时刻                    : 2020-09-25 09:52:40
任务结束时刻                    : 2020-09-25 09:52:51
任务总计耗时                    :                 10s
任务平均流量                    :                1B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   1
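读写失败总数                    :                   0

The latitude field that held NaN is now printed as null, and the run summary reports 1 record read (读出记录总数) with 0 read/write failures (读写失败总数).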

Background

/Library/Java/JavaVirtualMachines/jdk-11.0.8.jdk/Contents/Home/bin/jshell
|  Welcome to JShell -- Version 11.0.8
|  For an introduction type: /help intro

jshell> double ZERO = 0;
ZERO ==> 0.0

jshell> ZERO / ZERO;
$2 ==> NaN

jshell> Math.sqrt(-1);
$3 ==> NaN

jshell> /exit
|  Goodbye
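The jshell session shows two ways NaN arises. What makes the explicit Double.isNaN check in the patch necessary is that NaN compares unequal to everything, itself included, so a test like `value == Double.NaN` never fires. A small sketch (class and method names are illustrative):

```java
public class NaNChecks {
    // Wrong way: == against Double.NaN is always false, even when d is NaN
    static boolean naiveIsNaN(double d) {
        return d == Double.NaN;
    }

    // Right way: Double.isNaN
    static boolean properIsNaN(double d) {
        return Double.isNaN(d);
    }

    public static void main(String[] args) {
        double zero = 0;
        // 0.0 / 0.0 and sqrt(-1) both yield NaN, as in the jshell session above
        double[] samples = {zero / zero, Math.sqrt(-1), 34.9016151428223};
        for (double d : samples) {
            System.out.println(d + " naive=" + naiveIsNaN(d) + " proper=" + properIsNaN(d));
        }
    }
}
```

Both NaN samples print naive=false proper=true, which is why the patch calls Double.isNaN rather than comparing against Double.NaN.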

Addendum

// mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/util/CollectionSplitUtil.java
-       int docCount = result.getInteger("count");
+       int docCount = result.getDouble("count").intValue();
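The change above assumes the count command returns its value as a double. org.bson.Document.getInteger casts the stored value to Integer, so a Double there fails with ClassCastException, and getDouble fails the same way if an Integer ever comes back. A sketch that tolerates either numeric type goes through java.lang.Number. A plain Map stands in for org.bson.Document to keep the example dependency-free (Document implements Map<String, Object>, so the same expression applies to result directly); readCount is a hypothetical name:

```java
import java.util.HashMap;
import java.util.Map;

public class CountReader {
    // Hypothetical helper: accepts Integer, Long or Double instead of casting to a single type
    static int readCount(Map<String, Object> result) {
        Object raw = result.get("count");
        if (!(raw instanceof Number)) {
            throw new IllegalStateException("count is missing or not numeric: " + raw);
        }
        return ((Number) raw).intValue();
    }

    public static void main(String[] args) {
        Map<String, Object> asDouble = new HashMap<>();
        asDouble.put("count", 1.0);  // the shape the patched getDouble call assumes
        Map<String, Object> asInt = new HashMap<>();
        asInt.put("count", 1);       // the shape the original getInteger call assumes
        System.out.println(readCount(asDouble)); // 1
        System.out.println(readCount(asInt));    // 1
    }
}
```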
// set channel to 1; the root cause has not been identified yet
"channel": 1
sudo apt install -y screen

screen -S ots

cd ~/datax

python bin/datax.py job/local2ots.json  # then simply close the client; the job keeps running inside screen

screen -r ots  # later, reattach to the running task
