Problems using the Logstash webhdfs plugin
2018-04-25
不正经运维
Symptom
After setting up the Logstash webhdfs output plugin, nothing was written to HDFS, and the Logstash log reported the following error:
[2018-04-25T00:00:26,915][WARN ][logstash.outputs.webhdfs ] Failed to flush outgoing items {:outgoing_count=>1, :exception=>"WebHDFS::ServerError", :backtrace=>["/opt/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:351:in `request'", "/opt/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:270:in `operate_requests'", "/opt/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:73:in `create'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:228:in `write_data'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:211:in `block in flush'", "org/jruby/RubyHash.java:1343:in `each'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:199:in `flush'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:219:in `block in buffer_flush'", "org/jruby/RubyHash.java:1343:in `each'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:216:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:159:in `buffer_receive'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:182:in `receive'", "/opt/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `block in multi_receive'", "org/jruby/RubyArray.java:1734:in `each'", "/opt/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'", "/opt/logstash/logstash-core/lib/logstash/output_delegator_strategies/legacy.rb:22:in `multi_receive'", "/opt/logstash/logstash-core/lib/logstash/output_delegator.rb:49:in `multi_receive'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:477:in `block in output_batch'", "org/jruby/RubyHash.java:1343:in `each'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:476:in `output_batch'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:428:in `worker_loop'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:386:in `block in start_workers'"]}
Analysis
Checking the configuration
Since this error is reported, the failure clearly happens while accessing WebHDFS, so the first step was to check the configuration.
The configuration was as follows:
input {
  beats {
    port => "5044"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  webhdfs {
    host => "x.x.x.x"
    port => 9870
    path => "/weblog/iis/%{@source_host}/%{+YYYY-MM-dd}/iislog-%{@source_host}-%{YYYYMMddHH}.log"
    user => "root"
    retry_times => 100
  }
}
WebHDFS::ServerError
I had never used Logstash before, but knew it was supposed to be simple and easy to use, so I searched for the error message and found the post logstash-output-webhdfs Failed to flush outgoing items, which says:
It seems you should set the user option of logstash-output-webhdfs to the HDFS superuser, which is the user you use to start HDFS. For example, if you use root to run the start-dfs.sh script, then the user option should be root.
In addition, you should edit /etc/hosts and add the HDFS cluster node list.
This points to two common problems to check:
- the HDFS access user;
- hostname resolution for the HDFS nodes.
Solution
HDFS access user
This one was easy to confirm: the user running HDFS was indeed root.
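A quick way to double-check, assuming the hdfs CLI is available on a cluster node (commands are illustrative):

# Which user do the HDFS daemons run as?
ps -ef | grep -E 'NameNode|DataNode' | grep -v grep

# Who owns the directories at the HDFS root?
hdfs dfs -ls /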
HDFS hostname resolution
Checking /etc/hosts showed that only the namenode had an entry.
Thinking about it briefly: Logstash probably resolves nodes by hostname by default, and what it gets back from the namenode should also be hostnames. That is why the answer says to add the cluster node list.
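This is easy to see with curl. A WebHDFS CREATE is a two-step operation: the namenode answers the first step with a redirect to a datanode, identified by hostname, which the client must then be able to resolve (datanode1 below is a made-up example hostname):

# Step 1: ask the namenode where to write (no data is sent yet)
curl -i -X PUT "http://namenode:9870/webhdfs/v1/tmp/probe?op=CREATE&user.name=root"

# The response is a 307 redirect whose Location header points at a
# datanode by hostname, e.g.:
#   Location: http://datanode1:9864/webhdfs/v1/tmp/probe?op=CREATE&...
# If "datanode1" does not resolve on the Logstash host, the write fails.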
Adding hosts entries
Put hostname/IP mappings for all Hadoop nodes into /etc/hosts on the Logstash host.
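For example (all IPs and hostnames below are placeholders):

# /etc/hosts on the Logstash host -- example entries only
10.0.0.1  namenode
10.0.0.2  datanode1
10.0.0.3  datanode2
10.0.0.4  datanode3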
Updating the configuration
Then update the Logstash configuration:
input {
  beats {
    port => "5044"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  webhdfs {
    host => "namenode"
    port => 9870
    path => "/weblog/iis/%{+YYYY-MM-dd}/%{@source_host}/iislog-%{+HH}.log"
    user => "root"
    retry_times => 100
  }
}
Verifying the result
Check HDFS: if the corresponding directories and files have been created, everything is working.
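For example, either with the hdfs CLI or through WebHDFS itself (paths follow the config above):

# List the output tree recursively
hdfs dfs -ls -R /weblog/iis

# Or ask WebHDFS directly
curl "http://namenode:9870/webhdfs/v1/weblog/iis?op=LISTSTATUS&user.name=root"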
Remaining issues
In fact, some issues remain:
- In the example from the official docs, the date in the path is prefixed with dt=, whose purpose I do not understand.
- %{@source_host} is not resolved.
- %{+HH} is not rendered in UTC+0800.
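For the record, a guess at each item plus a sketch; nothing below is verified against this setup. dt= looks like a Hive-style partition prefix (dt being the partition column, so the output directory can be loaded as a partitioned Hive table). %{@source_host} was a field from old Logstash versions; with a beats input the hostname more likely lives in a field such as %{[beat][hostname]} (the exact name depends on the Beats version). And %{+...} always formats @timestamp in UTC, so UTC+0800 path parts would have to be precomputed, e.g. with a ruby filter:

filter {
  ruby {
    # Assumption: shift @timestamp by +8h and store the local date/hour
    # in @metadata, since %{+YYYY-MM-dd} always renders in UTC.
    code => "
      t = event.get('@timestamp').time + 8 * 3600
      event.set('[@metadata][localdate]', t.strftime('%Y-%m-%d'))
      event.set('[@metadata][localhour]', t.strftime('%H'))
    "
  }
}

The path option could then reference %{[@metadata][localdate]} and %{[@metadata][localhour]} instead of the %{+...} patterns.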