Distributed Logging Framework

2018-07-24 偶像本人

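I. Background

Because we run a microservice architecture, logs are scattered across containers. A single call chain may span several containers, which makes the logs awkward to inspect. The logs are also split between master and standby nodes, so they have to be checked on different machines. As a result, troubleshooting a failure and finding its root cause takes a lot of time.
The common industry solution for log data management is ELK, which stands for Elasticsearch, Logstash and Kibana. We use EFK instead: Elasticsearch, Fluent-bit/Fluentd and Kibana.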

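Kibana: a visualization platform. It searches and displays the indexed data stored in Elasticsearch, and makes it easy to present and analyze data with charts, tables and maps.
Elasticsearch: a distributed search engine that is highly scalable, highly reliable and easy to manage. It supports full-text search, structured search and analytics, and can combine the three. Elasticsearch is built on Lucene and is now one of the most widely used open-source search engines; Wikipedia, StackOverflow, GitHub and others have built their search on it.
Fluent-bit: a pluggable, lightweight, multi-platform open-source log collector written in C. It collects data from different sources and sends it to multiple destinations, and is fully compatible with the Docker and Kubernetes ecosystems. https://fluentbit.io/
Fluentd: the log aggregator. It receives the logs forwarded by Fluent-bit and writes them to Elasticsearch (see section III).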

[Figure: 分布式日志框架.png (distributed logging framework)]

II. Log forwarder: Fluent-bit (https://fluentbit.io/documentation/0.13/filter/record_modifier.html)

We embed fluent-bit in each service container:

[Figure: DockerFile.png (Dockerfile)]

[Figure: startup脚本.png (startup script)]

fluent-bit.conf

[SERVICE]
    Flush        5
    Daemon       On
    Log_Level    error
    Log_File     /fluent-bit/log/fluent-bit.log
    Parsers_File parse_dapeng.conf

[INPUT]
    Name              tail
    Path              /dapeng-container/logs/*.log
    Tag               dapeng
    Multiline         On
    Buffer_Chunk_Size 5m
    Buffer_Max_Size   20m
    Parser_Firstline  dapeng_multiline
    DB                /fluent-bit/db/logs.db

[FILTER]
    Name record_modifier
    Match *
    Record hostname ${soa_container_ip}
    Record tag ${serviceName}

[OUTPUT]    
    Name  Forward
    Match *
    Host  fluentd
    Port  24224
    HostStandby fluentdStandby
    PortStandby 24224

parse_dapeng.conf

# cat parse_dapeng.conf
[PARSER]
    Name        dapeng_multiline
    Format      regex
    Regex       (?<logtime>\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2} \d{1,3}) (?<threadPool>.*) (?<level>.*) \[(?<sessionTid>.*)\] - (?<message>.*)
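
To sanity-check the parser regex before deploying it, it can be run against a sample line. Below is a minimal Python sketch, under a few assumptions: Python's `re` needs `(?P<name>...)` instead of the `(?<name>...)` syntax used in the Fluent-bit parser file, the sample log line and thread name are made up to match the logback pattern used later in this article, and `group_records` only approximates how multiline mode appends lines that do not match the first-line parser to the previous record.

```python
import re

# Same expression as dapeng_multiline, rewritten with Python's (?P<name>...) groups.
PATTERN = re.compile(
    r"(?P<logtime>\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2} \d{1,3}) "
    r"(?P<threadPool>.*) (?P<level>.*) \[(?P<sessionTid>.*)\] - (?P<message>.*)"
)


def parse_line(line):
    """Return the named fields if the line starts a new log record, else None."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


def group_records(lines):
    """Approximate multiline handling: a line that does not match the
    first-line pattern is appended to the message of the previous record."""
    records = []
    for line in lines:
        fields = parse_line(line)
        if fields is not None:
            records.append(fields)
        elif records:
            records[-1]["message"] += "\n" + line
    return records


# Hypothetical sample input: one log line followed by an exception stack.
sample = [
    "07-24 10:15:30 123 dapeng-container-biz-pool-1 INFO [ac1f0b3e00000001] - order created",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Foo.bar(Foo.java:42)",
]

for record in group_records(sample):
    print(record)
```

The stack trace lines end up inside the `message` field of the record that precedes them, which is why the first-line pattern has to begin with the timestamp.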

III. Log aggregator: Fluentd (https://docs.fluentd.org/v1.0/articles/config-file)

fluentd.conf

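<system>
  log_level error                          #log level
  workers 8                                #number of worker processes
</system>
<source>
  @type forward                            #receives the logs forwarded by fluent-bit
  port 24224
</source>
<match dapeng tomcat>                      #matches logs that fluent-bit sends with tag dapeng or tomcat
  @type elasticsearch
  host 121.40.135.7
  port 9200                                #ES address (host and port)
  index_name dapeng_log_index              #ES index name; ignored when logstash_format is true, in which case logstash_prefix is used
  type_name dapeng_log                     #ES document type; matches the mapping type in template.json
  content_type application/x-ndjson        #Content-Type of requests sent to ES
  buffer_type file                         #store buffer chunks in files
  buffer_path /tmp/buffer_file             #path for buffer chunks
  buffer_chunk_limit 15m                   #buffer chunk size
  buffer_queue_limit 512                   #length of the buffer chunk queue
  flush_mode interval                      #buffer flush mode
  flush_interval 60s                       #buffer flush interval
  request_timeout 15s                      #request timeout
  flush_thread_count 8                     #number of threads flushing the buffer
  reload_on_failure true                   #for ES clusters: reload the node list after a failure so offline nodes are removed
  resurrect_after 30s                      #how long to wait before retrying a failed ES connection
  reconnect_on_error true                  #reconnect when an error occurs
  with_transporter_log true                #debug option
  logstash_format true                     #use logstash-style index names
  logstash_prefix dapeng_log_index         #index prefix
  template_name dapeng_log_index           #index template name
  template_file /fluentd/etc/template.json #index template file
  num_threads 8                            #legacy name of flush_thread_count
  utc_index false                          #build index names from local time instead of UTC
</match>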
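
With `logstash_format true`, the index that a record lands in is derived from `logstash_prefix` and the event date rather than from `index_name`. The Python sketch below illustrates that naming rule and the effect of `utc_index`, assuming the plugin's default date format (`%Y.%m.%d`) and `-` separator; the UTC+8 local time zone and the function name are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed local time zone of the Fluentd host (UTC+8), for illustration only.
LOCAL_TZ = timezone(timedelta(hours=8))


def logstash_index_name(prefix, event_time, utc_index=True, local_tz=LOCAL_TZ):
    """Build a logstash-style index name: <prefix>-YYYY.MM.DD.

    event_time must be a timezone-aware datetime. With utc_index=True the
    date is taken in UTC, otherwise in the local time zone.
    """
    tz = timezone.utc if utc_index else local_tz
    return f"{prefix}-{event_time.astimezone(tz).strftime('%Y.%m.%d')}"


# A log event written at 02:00 local time on 2018-07-24.
event = datetime(2018, 7, 24, 2, 0, tzinfo=LOCAL_TZ)

print(logstash_index_name("dapeng_log_index", event, utc_index=False))
print(logstash_index_name("dapeng_log_index", event, utc_index=True))
```

For an event shortly after local midnight the two settings give different dates, which is the reason for setting `utc_index false` when indices should roll over at local midnight.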

template.json

{
  "template":"fluentd-template",
  "mappings": {
    "dapeng_log": {
      "properties": {

        "logtime": {
          "type": "date",
          "format": "MM-dd HH:mm:ss SSS"
        },
        "threadPool": {
          "type": "string",
          "index": "not_analyzed"
        },
        "level": {
          "type": "string",
          "index": "not_analyzed"
        },
        "tag": {
          "type": "string",
          "index": "not_analyzed"
        },
        "message": {
          "type": "string",
          "index": "not_analyzed",
          "ignore_above":256
        },
        "hostname":{
          "type": "string",
          "index": "not_analyzed"
        },
        "sessionTid":{
          "type": "string",
          "index": "not_analyzed"
        },
        "log":{
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  },
  "settings": {
    "index": {
      "max_result_window": "100000000",
      "number_of_shards": "1",
      "number_of_replicas": "1"
    }
  },
  "warmers": {}
}

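IV. Elasticsearch (https://blog.csdn.net/cnweike/article/details/33736429)

Now that we have a node (and cluster) up and running, the next step is to understand how to communicate with it. Fortunately, Elasticsearch provides a very comprehensive and powerful REST API for interacting with the cluster. Among the things that can be done with this API:

Check the health, status and statistics of the cluster, nodes and indices
Administer the cluster, nodes, index data and metadata
Perform CRUD (create, read, update and delete) and search operations against the indices
Execute advanced query operations such as paging, sorting, filtering, scripting, faceting, aggregations and many others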

API usage

Check cluster health: curl 'localhost:9200/_cat/health?v'
List the nodes in the cluster: curl 'localhost:9200/_cat/nodes?v'
List the indices: curl 'localhost:9200/_cat/indices?v'
Create an index: curl -XPUT 'localhost:9200/customer?pretty'
Index a simple customer document into the customer index, type "external":
curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
{
"name": "John Doe"
}'
Retrieve the document just indexed: curl -XGET 'localhost:9200/customer/external/1?pretty'
Update the document:
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
{
"doc": { "name": "Jane Doe", "age": 20 }
}'
Delete the index: curl -XDELETE 'localhost:9200/customer?pretty'
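
The same REST API can be used to pull all log lines of one call chain out of the log indices. The Python sketch below only builds the search request, assuming the `sessionTid` and `logtime` fields from template.json and indices named with the `dapeng_log_index-` prefix; the base URL, result size and function name are assumptions for illustration, and sending the request is left to the caller.

```python
import json
import urllib.request


def build_trace_request(session_tid, base_url="http://localhost:9200",
                        index_pattern="dapeng_log_index-*", size=100):
    """Build (but do not send) a search request for one sessionTid.

    A term query is an exact match, which fits a not_analyzed field
    such as sessionTid. Results are sorted by logtime, oldest first.
    """
    body = {
        "query": {"term": {"sessionTid": session_tid}},
        "sort": [{"logtime": {"order": "asc"}}],
        "size": size,
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_pattern}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical sessionTid value.
request = build_trace_request("ac1f0b3e00000001")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would execute the search against a running cluster.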

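V. SessionTid

A globally unique id created by the initiator of a service call and passed along through InvocationContext. It is used to trace one complete service call. When SessionTid is 0, the service implementation side uses the Tid it has just created as the sessionTid.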

<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <layout class="ch.qos.logback.classic.PatternLayout">
        <Pattern>%d{MM-dd HH:mm:ss SSS} %t %p [%X{sessionTid}] - %m%n</Pattern>
    </layout>
    <!-- @see http://logback.qos.ch/manual/filters.html#levelFilter -->
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
        <level>DEBUG</level>
    </filter>
</appender>

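Layout conversion words:

%C{length}, %class{length}: outputs the fully qualified name of the class that issued the logging event. As with %logger, the optional {length} limits the length of the class name by abbreviating it.
%d{pattern}, %date{pattern}, %d{pattern,timezone}, %date{pattern,timezone}: outputs the time of the logging event. The optional {pattern} declares the time format, for example %d{yyyy-MM-dd HH:mm:ss}; it must be a format compatible with java.text.SimpleDateFormat.
%F, %file: outputs the name of the Java source file that issued the logging request. Producing the file name is not particularly fast and has some performance cost, so avoid this option unless execution speed is not a concern. (Example output: TestMain.java; the exception stack already contains the class name by default.)
%caller{depth}, %caller{depthStart..depthEnd}: outputs location information about the caller that generated the logging event; {depth} is optional. The location information depends on the JVM implementation, but it usually includes the fully qualified name of the calling method, the file name and the line number.
%L, %line: outputs the source line number of the logging request. Producing the line number is not very fast and has some performance cost, so avoid this option unless execution speed is not a concern. (The exception stack already contains line numbers by default.)
%m, %msg, %message: outputs the message supplied by the application. For example, LOGGER.error("message", exception) outputs "message" here; the exception stack is printed by %ex.
%M, %method: outputs the name of the method that issued the logging request. Producing the method name is not particularly fast.
%n: outputs a line separator, that is, a newline (depending on the platform, "\n" or "\r\n").
%p, %le, %level: outputs the level of the logging event.
%t, %thread: outputs the name of the thread that generated the logging event.
%ex{depth}, %exception{depth}: outputs the exception stack associated with the logging event; by default the full stack trace is printed. (If the pattern contains no %ex, Logback appends the stack trace at the end of the pattern automatically.)
%nopex: outputs the log data but ignores the exception.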

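Logback has relatively few built-in log fields. To print more business-related content, including custom data, we need Logback's MDC mechanism. MDC stands for "Mapped Diagnostic Context": it lets runtime context data be printed through Logback. For this we use the org.slf4j.MDC class.

The principle behind MDC is simple. It holds a thread-local map for the context data (an InheritableThreadLocal in older Logback versions) and provides a few core methods such as put/get/clear to operate on it. The keys stored in this map can be referenced in logback.xml, and their values are then printed in the log.

MDC.put("userId", "1000");

With that in place, the value can be printed by declaring "%X{userId}" in the layout in logback.xml.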
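
To make the mechanism concrete, here is a small Python model of it. This is not Logback's code: `MiniMDC` and `render` are hypothetical names, and the sketch only mimics two ideas described above, a per-thread key-value map and a layout that looks up `%X{key}` in that map when a line is formatted.

```python
import re
import threading


class MiniMDC:
    """Toy stand-in for org.slf4j.MDC: one key-value map per thread."""

    _local = threading.local()

    @classmethod
    def _context(cls):
        # Each thread lazily gets its own dictionary.
        if not hasattr(cls._local, "ctx"):
            cls._local.ctx = {}
        return cls._local.ctx

    @classmethod
    def put(cls, key, value):
        cls._context()[key] = str(value)

    @classmethod
    def get(cls, key):
        return cls._context().get(key)

    @classmethod
    def clear(cls):
        cls._context().clear()


def render(pattern, message):
    """Expand %X{key} from the current thread's context, then %m."""
    expanded = re.sub(
        r"%X\{(\w+)\}", lambda m: MiniMDC.get(m.group(1)) or "", pattern
    )
    return expanded.replace("%m", message)


MiniMDC.put("userId", 1000)
print(render("[%X{userId}] - %m", "order created"))
```

Because the map is per thread, a value put by one request thread does not leak into lines logged by another thread, and calling `clear` at the end of a request keeps pooled threads from carrying stale context into the next one.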