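A First Look at Jaeger, a Distributed Tracing Tool
Official documentation
Introduction to Jaeger
Jaeger: open-source, end-to-end distributed tracing for monitoring and troubleshooting transactions in complex distributed systems.
The figure below compares the common open-source end-to-end tracing solutions. SkyWalking and Pinpoint are currently the more widely used ones, but Jaeger's client libraries cover more languages, C++ in particular, so I chose to test it this time.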
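Problems Jaeger addresses
- Distributed transaction monitoring
- Performance and latency optimization
- Root cause analysis
- Service dependency analysis
- Distributed context propagation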
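Jaeger architecture diagram
Jaeger components
- Jaeger Agent: communicates with the clients and reports the trace data it receives to the Jaeger Collector
- Jaeger Collector: writes the data it receives to a database or another storage backend
- Jaeger Query: serves queries against the trace data
- Jaeger Ingester: a service that reads from a Kafka topic and writes to another storage backend (Cassandra, Elasticsearch)
- Jaeger UI: handles user interaction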
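The component roles above imply a fixed path that a span takes from the instrumented application to storage. The sketch below spells that path out for the two deployment strategies used later in this post (production and streaming). The function name `span_path` and the stage labels are illustrative choices for this sketch, not part of any Jaeger API.

```python
from typing import List

# Illustrative only: models the order in which Jaeger components handle a span.
# Stage names are labels chosen for this sketch, not identifiers from Jaeger.
PIPELINES = {
    # The Collector writes straight to the storage backend.
    "production": ["client", "agent", "collector", "storage"],
    # The Collector publishes to Kafka; the Ingester reads from Kafka
    # and writes to the storage backend.
    "streaming": ["client", "agent", "collector", "kafka", "ingester", "storage"],
}


def span_path(strategy: str) -> List[str]:
    """Return the ordered list of stages a span passes through."""
    try:
        return list(PIPELINES[strategy])
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None


for name in PIPELINES:
    print(name, "->", " -> ".join(span_path(name)))
```

Jaeger Query and the UI are left out on purpose: they read from storage and are not on the write path.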
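Jaeger ports
Agent
5775 UDP; accepts zipkin.thrift over the compact Thrift protocol
6831 UDP; accepts jaeger.thrift over the compact Thrift protocol
6832 UDP; accepts jaeger.thrift over the binary Thrift protocol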
5778 HTTP; serves configuration (for example sampling strategies) to clients, not intended for bulk span data
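Collector
14267 TCP; agents send data in jaeger.thrift format
14250 TCP; agents send data in proto format (gRPC underneath)
14268 HTTP; accepts data directly from clients
14269 HTTP; health check
Query
16686 HTTP; the Jaeger front end, the interface exposed to users
16687 HTTP; health check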
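Each component exposes its health check on a separate admin port, which makes a quick liveness probe easy to script. The following is a minimal sketch using only the standard library. The collector and query ports come from the list above, and the agent admin port 14271 and the `/` route are taken from the agent log shown later in this post. The helper names `health_url` and `is_healthy` are hypothetical.

```python
import urllib.error
import urllib.request

# Admin (health check) ports per component.
ADMIN_PORTS = {
    "agent": 14271,      # seen in the agent log: admin HTTP server on :14271
    "collector": 14269,
    "query": 16687,
}


def health_url(component: str, host: str) -> str:
    """Build the health-check URL; the logs show the check mounted at route "/"."""
    return f"http://{host}:{ADMIN_PORTS[component]}/"


def is_healthy(component: str, host: str, timeout: float = 2.0) -> bool:
    """Return True if the admin endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(health_url(component, host), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx responses
        # (urllib raises HTTPError, a URLError subclass, for 4xx/5xx).
        return False
```

Inside a cluster the admin ports are usually not published by a Service, so a probe like `is_healthy("query", "localhost")` would typically be run after forwarding the pod's port 16687 to the local machine.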
Deploying Jaeger
1. Create the namespace
[root@VM-0-123-centos jaeger]# kubectl create namespace jaeger
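2. Deploy the Jaeger Operator
Jaeger Operator: the Jaeger Operator for Kubernetes simplifies deploying and running Jaeger on Kubernetes.
The Jaeger Operator is an implementation of a Kubernetes Operator. An Operator is a piece of software that eases the operational complexity of running another piece of software. More technically, Operators are a method of packaging, deploying, and managing a Kubernetes application.
Each Jaeger Operator version tracks one version of the Jaeger components (Query, Collector, Agent). When a new version of the Jaeger components is released, a new version of the Operator is released that knows how to upgrade running instances of the previous version to the new one.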
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/operator.yaml
Check the status
[root@VM-0-123-centos jaeger]# kubectl get all -n jaeger
NAME READY STATUS RESTARTS AGE
pod/jaeger-operator-6ff67bdd4b-4nffk 1/1 Running 0 14d
pod/simple-prod-collector-59fc47bf5c-h26mq 0/1 Terminating 0 9d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jaeger-operator-metrics ClusterIP 172.20.253.138 <none> 8383/TCP,8686/TCP 14d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jaeger-operator 1/1 1 1 14d
NAME DESIRED CURRENT READY AGE
replicaset.apps/jaeger-operator-6ff67bdd4b 1 1 1 14d
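3. Create a Jaeger instance
Create a jaeger.yaml file that points Jaeger at the ES cluster and limits the CPU and memory of the containers in Deployment/simple-prod-collector. The collector may scale up to at most 10 pods.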
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://10.0.16.3:9200
        index-prefix: zhjt
  collector:
    maxReplicas: 10
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
[root@VM-0-123-centos jaeger]# kubectl apply -f jaeger.yaml -n jaeger
jaeger.jaegertracing.io/simple-prod created
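Because kubectl accepts JSON manifests as well as YAML, the same custom resource can be generated from a script, which also makes the nesting of the fields explicit. The sketch below is one way to do that; the function name `jaeger_manifest` and its parameters are hypothetical, and the values mirror the jaeger.yaml above.

```python
import json


def jaeger_manifest(name, es_url, index_prefix, max_replicas=10,
                    cpu_limit="500m", memory_limit="512Mi"):
    """Build a Jaeger custom resource (production strategy, Elasticsearch storage)."""
    return {
        "apiVersion": "jaegertracing.io/v1",
        "kind": "Jaeger",
        "metadata": {"name": name},
        "spec": {
            "strategy": "production",
            "storage": {
                "type": "elasticsearch",
                "options": {
                    "es": {"server-urls": es_url, "index-prefix": index_prefix},
                },
            },
            "collector": {
                # Upper bound for the collector's autoscaling.
                "maxReplicas": max_replicas,
                "resources": {
                    "limits": {"cpu": cpu_limit, "memory": memory_limit},
                },
            },
        },
    }


manifest = jaeger_manifest("simple-prod", "http://10.0.16.3:9200", "zhjt")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f` in the same way as the YAML version.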
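List the Jaeger objects
Note: with the all-in-one example from the official site the status appears to be a normal Running. Here the status is Failed, but this does not seem to affect usage.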
[root@VM-0-123-centos jaeger]# kubectl get jaegers -n jaeger
NAME STATUS VERSION STRATEGY STORAGE AGE
simple-prod Failed 1.22.0 production elasticsearch 9d
Get the pod names
[root@VM-0-123-centos jaeger]# kubectl get pods -l app.kubernetes.io/instance=simple-prod -n jaeger
NAME READY STATUS RESTARTS AGE
simple-prod-collector-59fc47bf5c-h26mq 1/1 Running 0 9d
simple-prod-query-85689b7bbd-g5jw9 2/2 Running 0 9d
Get the pod logs
[root@VM-0-123-centos jaeger]# kubectl logs simple-prod-query-85689b7bbd-g5jw9 jaeger-agent -n jaeger
2021/04/28 04:55:34 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined
{"level":"info","ts":1619585734.2081811,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1619585734.2082183,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1619585734.2083232,"caller":"flags/admin.go:105","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1619585734.2083883,"caller":"flags/admin.go:111","msg":"Starting admin HTTP server","http-addr":":14271"}
{"level":"info","ts":1619585734.2084124,"caller":"flags/admin.go:97","msg":"Admin server started","http.host-port":"[::]:14271","health-status":"unavailable"}
{"level":"info","ts":1619585734.2089527,"caller":"grpc/builder.go:70","msg":"Agent requested insecure grpc connection to collector(s)"}
{"level":"info","ts":1619585734.2089992,"caller":"grpc@v1.29.1/clientconn.go:243","msg":"parsed scheme: \"dns\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.21038,"caller":"command-line-arguments/main.go:84","msg":"Starting agent"}
{"level":"info","ts":1619585734.2104166,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1619585734.2108943,"caller":"grpc/builder.go:108","msg":"Checking connection to collector"}
{"level":"info","ts":1619585734.210908,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"IDLE"}
{"level":"info","ts":1619585734.211061,"caller":"app/agent.go:69","msg":"Starting jaeger-agent HTTP server","http-port":5778}
{"level":"info","ts":1619585734.3344934,"caller":"grpc@v1.29.1/resolver_conn_wrapper.go:143","msg":"ccResolverWrapper: sending update to cc: {[{172.20.0.88:14250 <nil> 0 <nil>}] <nil> <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3345578,"caller":"grpc@v1.29.1/clientconn.go:667","msg":"ClientConn switching balancer to \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3345697,"caller":"grpc@v1.29.1/clientconn.go:682","msg":"Channel switches to new LB policy \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3346283,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.33467,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.88:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.334736,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3347983,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"info","ts":1619585734.335669,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3357751,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[0xc0002f5ea0:{{172.20.0.88:14250 <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3357947,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.335807,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"READY"}
{"level":"info","ts":1619592172.4516647,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517512,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.88:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517596,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517772,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517884,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"warn","ts":1619592172.4523218,"caller":"grpc@v1.29.1/clientconn.go:1275","msg":"grpc: addrConn.createTransport failed to connect to {172.20.0.88:14250 <nil> 0 <nil>}. Err: connection error: desc = \"transport: Error while dialing dial tcp 172.20.0.88:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4523551,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.452386,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4523947,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"TRANSIENT_FAILURE"}
{"level":"info","ts":1619592172.6118224,"caller":"grpc@v1.29.1/resolver_conn_wrapper.go:143","msg":"ccResolverWrapper: sending update to cc: {[{172.20.0.178:14250 <nil> 0 <nil>}] <nil> <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6118581,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6118758,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to SHUTDOWN","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.611892,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6119003,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"info","ts":1619592172.6119049,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.178:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.612726,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127572,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[0xc0003df970:{{172.20.0.178:14250 <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127682,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127849,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"READY"}
[root@VM-0-123-centos jaeger]# kubectl logs simple-prod-query-85689b7bbd-g5jw9 jaeger-query -n jaeger
2021/04/28 04:55:29 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined
{"level":"info","ts":1619585729.8951077,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1619585729.8951416,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1619585729.8952546,"caller":"flags/admin.go:105","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1619585729.8953054,"caller":"flags/admin.go:111","msg":"Starting admin HTTP server","http-addr":":16687"}
{"level":"info","ts":1619585729.8953238,"caller":"flags/admin.go:97","msg":"Admin server started","http.host-port":"[::]:16687","health-status":"unavailable"}
{"level":"info","ts":1619585729.9169888,"caller":"config/config.go:183","msg":"Elasticsearch detected","version":7}
{"level":"info","ts":1619585729.9174955,"caller":"app/static_handler.go:181","msg":"UI config path not provided, config file will not be watched"}
{"level":"info","ts":1619585729.9175768,"caller":"app/server.go:170","msg":"Query server started"}
{"level":"info","ts":1619585729.9175944,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1619585729.9176183,"caller":"app/server.go:249","msg":"Starting GRPC server","port":16685,"addr":":16685"}
{"level":"info","ts":1619585729.9176335,"caller":"app/server.go:230","msg":"Starting HTTP server","port":16686,"addr":":16686"}
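Apart from the plain-text maxprocs line, the agent log above is one JSON object per line, so the history of the agent-to-collector connection can be extracted mechanically. A minimal sketch follows; the function name `connection_states` is hypothetical, and the field names (`ts`, `msg`, `status`) are the ones visible in the output above. The two JSON sample lines are copied from that output.

```python
import json
from datetime import datetime, timezone

STATE_MSG = "Agent collector connection state change"

# Sample lines copied from the agent log above.
SAMPLE = [
    '2021/04/28 04:55:34 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined',
    '{"level":"info","ts":1619585734.210908,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"IDLE"}',
    '{"level":"info","ts":1619592172.4523947,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"TRANSIENT_FAILURE"}',
]


def connection_states(lines):
    """Yield (UTC timestamp, status) for each agent-to-collector state change."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as the maxprocs banner
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if entry.get("msg") == STATE_MSG:
            ts = datetime.fromtimestamp(entry["ts"], tz=timezone.utc)
            yield ts, entry["status"]


for ts, status in connection_states(SAMPLE):
    print(ts.isoformat(), status)
```

Run against the full log (for example `connection_states(open("agent.log"))` on a saved copy), this makes the sequence in the output above easier to see: the connection to 172.20.0.88:14250 is refused, the state drops to TRANSIENT_FAILURE, and it returns to READY once the resolver reports the new collector address 172.20.0.178:14250.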
4. View the Jaeger resources
[root@VM-0-123-centos jaeger]# kubectl get all -n jaeger
NAME READY STATUS RESTARTS AGE
pod/jaeger-operator-6ff67bdd4b-4nffk 1/1 Running 0 14d
pod/simple-prod-collector-59fc47bf5c-h26mq 1/1 Running 0 8d
pod/simple-prod-query-85689b7bbd-g5jw9 2/2 Running 0 8d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jaeger-operator-metrics ClusterIP 172.20.253.138 <none> 8383/TCP,8686/TCP 14d
service/simple-prod-collector ClusterIP 172.20.255.184 <none> 9411/TCP,14250/TCP,14267/TCP,14268/TCP 8d
service/simple-prod-collector-headless ClusterIP None <none> 9411/TCP,14250/TCP,14267/TCP,14268/TCP 8d
service/simple-prod-query ClusterIP 172.20.254.102 <none> 16686/TCP 8d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jaeger-operator 1/1 1 1 14d
deployment.apps/simple-prod-collector 1/1 1 1 8d
deployment.apps/simple-prod-query 1/1 1 1 8d
NAME DESIRED CURRENT READY AGE
replicaset.apps/jaeger-operator-6ff67bdd4b 1 1 1 14d
replicaset.apps/simple-prod-collector-59fc47bf5c 1 1 1 8d
replicaset.apps/simple-prod-query-85689b7bbd 1 1 1 8d
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/simple-prod-collector Deployment/simple-prod-collector 1457m/90, 137m/90 1 10 1 8d
If traffic is heavy and the load on ES needs to be reduced, a Kafka cluster can be placed in front of it. Modify jaeger.yaml as follows:
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-streaming
spec:
  strategy: streaming
  collector:
    options:
      kafka:
        producer:
          topic: jaeger-spans
          brokers: my-cluster-kafka-brokers.kafka:9092 # change to your Kafka address
  ingester:
    options:
      kafka:
        consumer:
          topic: jaeger-spans
          brokers: my-cluster-kafka-brokers.kafka:9092 # change to your Kafka address
      ingester:
        deadlockInterval: 5s
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200 # change to your ES address
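5. Deploy the agent
The agent is a proxy for the Jaeger client: the client sends the trace data it collects to the agent, and the agent forwards it to the collector. Because the client talks to the agent over UDP, the agent is usually deployed close to the client.
The agent can be installed in several ways.
1). Docker installation
Download: jaegertracing/jaeger-agent Tags (docker.com)
docker run -d -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778/tcp jaegertracing/jaeger-agent:1.12 --reporter.grpc.host-port=xx.xx.xx.xx:14250
2). Kubernetes installation, which comes in two variants
sidecar
DaemonSet
Reference: Operator for Kubernetes - Jaeger documentation (jaegertracing.io)
3). Binary installation
Download: Jaeger - Download Jaeger (jaegertracing.io)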
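nohup ./jaeger-agent --collector.host-port=xxxx:14267 1>1.log 2>2.log &
Note: --collector.host-port with port 14267 targets the collector's jaeger.thrift (TChannel) endpoint. If the agent version you downloaded does not accept this flag, use --reporter.grpc.host-port=xxxx:14250 instead, as in the Docker example above.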