15 Kubernetes Container Log Collection
(1) Kubernetes Log Collection
1. What Logs Does Kubernetes Need to Collect?
2. Common Technology Stacks for Log Collection
An ELK logging pipeline can be assembled in several ways (the components can be combined freely and configured to fit your own workloads). Common options include:
1. Logstash (collect, process) -> Elasticsearch (store) -> Kibana (display)
2. Logstash (collect) -> Logstash (aggregate, process) -> Elasticsearch (store) -> Kibana (display)
3. Filebeat (collect, process) -> Elasticsearch (store) -> Kibana (display)
4. Filebeat (collect) -> Logstash (aggregate, process) -> Elasticsearch (store) -> Kibana (display)
The options above are basic log-collection platforms. To handle a large volume of logs and keep the pipeline from backing up, a buffer can be added:
Filebeat (collect) -> Kafka/Redis (absorb peaks) -> Logstash (aggregate, process) -> Elasticsearch (store) -> Kibana (display)
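A minimal sketch of the Kafka-buffered handoff described above (the address kafka:9092 and the topic name app-logs are placeholders; complete, working versions of both pieces appear later in this chapter):
# filebeat.yml (excerpt): ship collected lines to Kafka instead of directly to Elasticsearch
output.kafka:
  hosts: ["kafka:9092"]     # placeholder Kafka address
  topic: "app-logs"         # placeholder topic name
# logstash pipeline (excerpt): consume the same topic at its own pace, then process and forward
input {
  kafka {
    bootstrap_servers => "kafka:9092"
    topics => ["app-logs"]
  }
}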
3. Elasticsearch + Fluentd + Kibana Architecture Analysis
(2) Collecting Console Logs with EFK
1. Deploying Elasticsearch + Fluentd + Kibana
The lab environment for this exercise is shown below; each server should have at least 2 CPU cores and 4 GB of memory available:
# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready <none> 88d v1.23.0
k8s-master02 Ready <none> 88d v1.23.0
k8s-master03 Ready <none> 88d v1.23.0
k8s-node01 Ready <none> 88d v1.23.0
k8s-node02 Ready <none> 88d v1.23.0
Download the required deployment files:
# git clone https://github.com/dotbalo/k8s.git
# cd k8s/efk-7.10.2/
Create the namespace used by EFK:
[root@k8s-master01 efk-7.10.2]# kubectl create -f create-logging-namespace.yaml
namespace/logging created
Create the Elasticsearch cluster (organizations that already have an ELK platform can skip this):
[root@k8s-master01 efk-7.10.2]# kubectl create -f es-service.yaml
service/elasticsearch-logging created
[root@k8s-master01 efk-7.10.2]# kubectl create -f es-statefulset.yaml
serviceaccount/elasticsearch-logging created
clusterrole.rbac.authorization.k8s.io/elasticsearch-logging created
clusterrolebinding.rbac.authorization.k8s.io/elasticsearch-logging created
statefulset.apps/elasticsearch-logging created
Create Kibana (organizations that already have an ELK platform can skip this):
[root@k8s-master01 efk-7.10.2]# kubectl create -f kibana-deployment.yaml -f kibana-service.yaml
deployment.apps/kibana-logging created
service/kibana-logging created
In a Kubernetes cluster we may not need to collect logs from every node, so the Fluentd deployment file can be changed as follows, adding a nodeSelector so that Fluentd is deployed only to the hosts whose logs should be collected:
# grep "nodeSelector" fluentd-es-ds.yaml -A 3
      nodeSelector:
        fluentd: "true"
...
Then label the nodes whose logs need to be collected, using k8s-node01 as an example:
# kubectl label node k8s-node01 fluentd=true
# kubectl get node -l fluentd=true --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-node01 Ready <none> 88d v1.23.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,fluentd=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node01,kubernetes.io/os=linux,node.kubernetes.io/node=
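For reference, if a node should no longer be collected from, the label can be removed again with standard kubectl syntax; the DaemonSet controller will then remove the Fluentd Pod from that node:
# kubectl label node k8s-node01 fluentd-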
Create Fluentd:
[root@k8s-master01 efk-7.10.2]# kubectl create -f fluentd-es-ds.yaml -f fluentd-es-configmap.yaml
serviceaccount/fluentd-es created
clusterrole.rbac.authorization.k8s.io/fluentd-es created
clusterrolebinding.rbac.authorization.k8s.io/fluentd-es created
daemonset.apps/fluentd-es-v3.1.1 created
configmap/fluentd-es-config-v0.2.1 created
One field in the Fluentd ConfigMap deserves attention: at the end of fluentd-es-configmap.yaml there is an output.conf section:
output.conf: |-
  <match **>
  ...
    host elasticsearch-logging
    port 9200
  ...
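This is where Fluentd is pointed at Elasticsearch. If your organization already runs its own Elasticsearch cluster (and skipped the es-* manifests above), this is the section to change; a minimal sketch, assuming an existing cluster reachable at the placeholder address es.example.com:
output.conf: |-
  <match **>
  ...
    host es.example.com   # address of the existing Elasticsearch cluster (placeholder)
    port 9200
  ...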
2. Using Kibana
Confirm that all the newly created Pods have started successfully:
# kubectl get po -n logging
NAME READY STATUS RESTARTS AGE
elasticsearch-logging-0 1/1 Running 0 15h
elasticsearch-logging-1 1/1 Running 0 15h
fluentd-es-v3.1.1-p4zbk 1/1 Running 17 15h
kibana-logging-75bd6cchf5-psda5 1/1 Running 7 15h
Next, check the port exposed by the Kibana Service and access Kibana:
# kubectl get svc -n logging
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch-logging ClusterIP None <none> 9200/TCP,9300/TCP 15h
kibana-logging NodePort 192.168.148.65 <none> 5601:32664/TCP 15h
Kibana can now be reached via the IP of any node running kube-proxy plus port 32664:

Click Explore on my own, then click Visualize:

Next click Add your data -> Create index pattern:


Enter the index name logstash* in Index pattern name, then click Next step:

Then select @timestamp as the time field to create the index pattern:

Then click Discover in the menu bar to see the collected logs:

(3) Collecting Custom File Logs with Filebeat
1. Creating Kafka and Logstash
First, deploy Kafka and Logstash to the Kubernetes cluster. If your organization already has a mature stack, there is no need to deploy them; simply point Filebeat's output at the external Kafka cluster:
# cd filebeat
# helm install zookeeper zookeeper/ -n logging
# kubectl get po -n logging -l app.kubernetes.io/name=zookeeper
NAME READY STATUS RESTARTS AGE
zookeeper-0 1/1 Running 0 51s
# helm install kafka kafka/ -n logging
# kubectl get po -n logging -l app.kubernetes.io/component=kafka
NAME READY STATUS RESTARTS AGE
kafka-0 1/1 Running 0 43s
Once the Pods are healthy, create the Logstash service:
# kubectl create -f logstash-service.yaml -f logstash-cm.yaml -f logstash.yaml -n logging
service/logstash-service created
configmap/logstash-configmap created
deployment.apps/logstash-deployment created
Note the following settings in the logstash-cm.yaml file (a configuration sketch consistent with these settings follows the list):
- input: the data source; in this example it is Kafka.
- input.kafka.bootstrap_servers: the Kafka address. Because Kafka is installed inside the cluster, the Kafka Service address can be used directly; for an external Kafka, configure the address as needed.
- input.kafka.topics: the Kafka topic, which must match the topic Filebeat writes to.
- input.kafka.type: defines a type, which can be used to route Logstash output to different Elasticsearch clusters.
- output: where the data is sent. In this example it goes to the Elasticsearch cluster, guarded by a conditional: when type is filebeat-sidecar, the data is written to Elasticsearch with an index of filebeat-xxx.
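A minimal logstash.conf sketch consistent with the settings described above (the file shipped in the repository may differ in details; the codec and index pattern shown here are illustrative):
input {
  kafka {
    bootstrap_servers => "kafka:9092"      # in-cluster Kafka Service address
    topics => ["filebeat-sidecar"]         # must match Filebeat's output topic
    type => "filebeat-sidecar"             # tag used by the conditional below
    codec => "json"                        # Filebeat publishes JSON messages
  }
}
output {
  if [type] == "filebeat-sidecar" {
    elasticsearch {
      hosts => ["elasticsearch-logging:9200"]
      index => "filebeat-%{+YYYY.MM.dd}"   # produces the filebeat-xxx indices mentioned above
    }
  }
}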
2. Injecting the Filebeat Sidecar
Next, create a mock application:
# kubectl create -f app.yaml -nlogging
This program continuously writes the current date to the file /opt/date.log; the relevant configuration is as follows:
command:
- sh
- -c
- while true; do date >> /opt/date.log; sleep 2; done
After it starts successfully, you can check the contents of the file:
# kubectl get po -nlogging
NAME READY STATUS RESTARTS AGE
app-6dd64bdc55-8wzjc 1/1 Running 0 19m
# kubectl logs app-6dd64bdc55-8wzjc -nlogging
# kubectl exec app-6dd64bdc55-8wzjc -nlogging -- tail -1 /opt/date.log
Wed Jul 10 09:10:06 UTC 2021
At this point, searching Kibana for this log finds nothing, because Fluentd cannot collect log files written inside the container. Next, modify the application by adding Filebeat to its deployment file (only part of the file is shown):
# cat app-filebeat.yaml
...
      containers:
        - name: filebeat
          image: registry.cn-beijing.aliyuncs.com/dotbalo/filebeat:7.10.2
          imagePullPolicy: IfNotPresent
          env:
            - name: podIp
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: podName
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: podNamespace
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: podDeployName
              value: app
            - name: TZ
              value: "Asia/Shanghai"
          securityContext:
            runAsUser: 0
          volumeMounts:
            - name: logpath
              mountPath: /data/log/app/
            - name: filebeatconf
              mountPath: /usr/share/filebeat/filebeat.yml
              subPath: usr/share/filebeat/filebeat.yml
        - name: app
          image: registry.cn-beijing.aliyuncs.com/dotbalo/alpine:3.6
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: logpath
              mountPath: /opt/
          env:
            - name: TZ
              value: "Asia/Shanghai"
            - name: LANG
              value: C.UTF-8
            - name: LC_ALL
              value: C.UTF-8
          command:
            - sh
            - -c
            - while true; do date >> /opt/date.log; sleep 2; done
      volumes:
        - name: logpath
          emptyDir: {}
        - name: filebeatconf
          configMap:
            name: filebeatconf
            items:
              - key: filebeat.yml
                path: usr/share/filebeat/filebeat.yml
As you can see, the Deployment file adds a volumes section defining a volume named logpath, which is mounted at /opt/ in the application container and at /data/log/app/ in the Filebeat container, so the two containers in the same Pod share the same directory.
Then create a Filebeat configuration file that collects the logs in that directory:
# cat filebeat-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeatconf
data:
  filebeat.yml: |-
    filebeat.inputs:
      - input_type: log
        paths:
          - /data/log/*/*.log
        tail_files: true
        fields:
          pod_name: '${podName}'
          pod_ip: '${podIp}'
          pod_deploy_name: '${podDeployName}'
          pod_namespace: '${podNamespace}'
    output.kafka:
      hosts: ["kafka:9092"]
      topic: "filebeat-sidecar"
      codec.json:
        pretty: false
      keep_alive: 30s
Note that paths points at the shared directory, output.kafka must point at the same Kafka cluster that Logstash reads from, and the topic must be the same topic that Logstash consumes. Then inject Filebeat:
# kubectl apply -f filebeat-cm.yaml -f app-filebeat.yaml -n logging
configmap/filebeatconf created
deployment.apps/app configured
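To verify the sidecar is working, you can check that the app Pod now runs two containers, view the sidecar's own logs, and, assuming a Bitnami-based Kafka image (where kafka-console-consumer.sh is on the PATH), peek at the topic directly; the commands below are illustrative:
# kubectl get po -n logging                          # the app Pod should now show 2/2 containers
# kubectl logs deploy/app -c filebeat -n logging     # logs of the Filebeat sidecar container
# kubectl exec -n logging kafka-0 -- kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic filebeat-sidecar --from-beginning --max-messages 5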
Then add a Filebeat index pattern in Kibana to view the logs. The steps are the same as for EFK; only the index name changes. First click Stack Management in the menu bar:

Then click to add an index pattern:

Enter filebeat* in Index pattern name, then click Next step:

Once it is added, select the filebeat index pattern in Kibana to view the logs; this is not demonstrated again here.
3. Cleaning Up the Environment
If this is a learning environment, you can clean up the data after finishing the exercises:
# helm delete kafka zookeeper -n logging
# kubectl delete -f . -n logging
# cd ..
# ls
create-logging-namespace.yaml es-service.yaml es-statefulset.yaml filebeat fluentd-es-configmap.yaml fluentd-es-ds.yaml kafka kibana-deployment.yaml kibana-service.yaml
# kubectl delete -f . -n logging
(4) First Look at Loki
1. Installing the Loki Stack
Loki provides a Helm-based installation, so the chart can be installed directly. First, add and update Loki's Helm repository:
# helm repo add grafana https://grafana.github.io/helm-charts
"grafana" has been added to your repositories
# helm repo update
Hang tight while we grab the latest from your chart repositories...
...
...Successfully got an update from the "grafana" chart repository
Create the Loki namespace:
# kubectl create ns loki
namespace/loki created
Create the Loki Stack:
# helm upgrade --install loki grafana/loki-stack --set grafana.enabled=true --set grafana.service.type=NodePort -n loki
NAME: loki
NAMESPACE: loki
STATUS: deployed
REVISION: 1
NOTES:
The Loki stack has been deployed to your cluster. Loki can now be added as a datasource in Grafana.
See http://docs.grafana.org/features/datasources/loki/ for more detail.
Check the Pod status:
# kubectl get po -n loki
NAME READY STATUS RESTARTS AGE
loki-0 1/1 Running 0 4m23s
loki-grafana-5b57955f9d-48z6l 1/1 Running 0 4m23s
loki-promtail-2n24m 1/1 Running 0 4m23s
loki-promtail-59slx 1/1 Running 0 4m23s
loki-promtail-6flzq 1/1 Running 0 4m23s
loki-promtail-gq2hk 1/1 Running 0 4m23s
loki-promtail-sqwtv 1/1 Running 0 4m23s
Check the port exposed by the Grafana Service:
# kubectl get svc -n loki
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
loki ClusterIP 192.168.67.18 <none> 3100/TCP 7m54s
loki-grafana NodePort 192.168.103.121 <none> 80:31053/TCP 7m54s
loki-headless ClusterIP None <none> 3100/TCP 7m54s
Grafana can then be accessed via the IP of any node running kube-proxy plus port 31053:

Retrieve the Grafana password (the username is admin):
# kubectl get secret --namespace loki loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
eD47DAbzhyPLgDeSM8C0LvBi3DksU73vZND8t4h0
After logging in, click as shown in the figure to view the Loki syntax guide:

Other installation options are documented at https://grafana.com/docs/loki/latest/installation/helm/.
2. Getting Started with Loki Syntax
Loki's design was modeled on Prometheus, so its query syntax is similar to PromQL. For example, to query the logs of all Pods in the kube-system namespace, enter {namespace="kube-system"} in the Log browser and click Run query:

The bottom of the page shows the log details, as follows:

You can see that many labels have been attached to each log line, and logs can then be filtered by those labels. For example, to view logs from the kube-system namespace whose Pod name contains calico (screenshots are not provided for the queries below; readers can test them on their own):
{namespace="kube-system", pod=~"calico.*"}
Here =~ performs a regular-expression (fuzzy) match; the related operators are listed below (illustrative selectors follow the list):
- =~ : regular-expression (fuzzy) match
- = : exact match
- != : not equal
- !~ : does not match the regular expression
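A few selectors illustrating these operators (the label values here are examples only):
{namespace="kube-system", pod="kube-proxy-abcde"}   # exact match on an example Pod name
{namespace!="kube-system"}                          # not equal
{namespace="kube-system", pod!~"calico.*"}          # regular expression does not match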
Loki syntax also supports pipelines. For example, to filter for log lines containing the string avg:
{namespace="kube-system", pod=~"calico.*"} |~ "avg"
The matching logs look like this:

You can also use logfmt to parse the log lines and then filter on field values, for example finding lines that contain avg and whose longest value is greater than 16ms:
{namespace="kube-system", pod=~"calico.*"} |~ "avg" | logfmt | longest > 16ms
The resulting logs are shown below; compared with the previous result, only lines whose longest value exceeds 16ms remain, and the rest have been filtered out:

Value matching can combine multiple conditions, for example lines where longest is greater than 16ms and avg is greater than 6ms:
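A query matching that description (a sketch; LogQL label-filter expressions can be chained with and):
{namespace="kube-system", pod=~"calico.*"} |~ "avg" | logfmt | longest > 16ms and avg > 6ms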
The syntax demonstrated above is the most basic and most commonly used. For more advanced syntax, see the official LogQL documentation: https://grafana.com/docs/loki/latest/logql/.