K8s -- DaemonSet
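A DaemonSet ensures that a copy of its Pod runs on every Node in the cluster, or on a selected subset of Nodes. When a new Node joins the cluster, the DaemonSet's Pod is started on that Node as well. Deleting a DaemonSet also cascades to delete all the Pods it created. Typical use cases for a DaemonSet include:
- Running a cluster storage daemon on every Node, such as glusterd or ceph.
- Running a log collection daemon on every Node, such as fluentd or logstash.
- Running a node monitoring daemon on every Node, such as Prometheus Node Exporter, collectd, Datadog agent, New Relic agent, or Ganglia gmond.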
1. Creating a DaemonSet object
The following manifest creates a DaemonSet that runs the fluentd-elasticsearch image:
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: fluentd-elasticsearch
      namespace: kube-system
      labels:
        k8s-app: fluentd-logging
    spec:
      selector:
        matchLabels:
          name: fluentd-elasticsearch
      template:
        metadata:
          labels:
            name: fluentd-elasticsearch
        spec:
          tolerations:
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
          containers:
          - name: fluentd-elasticsearch
            image: k8s.gcr.io/fluentd-elasticsearch:1.20
            resources:
              limits:
                memory: 200Mi
              requests:
                cpu: 100m
                memory: 200Mi
            volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
          terminationGracePeriodSeconds: 30
          volumes:
          - name: varlog
            hostPath:
              path: /var/log
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
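As of Kubernetes 1.8, you must specify `.spec.selector` to identify the Pods that this DaemonSet manages. The selector has to match the Pod labels defined in `.spec.template.metadata.labels`; a manifest in which the two do not match is rejected by the API server.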
If you set `.spec.template.spec.nodeSelector` or `.spec.template.spec.affinity`, the DaemonSet Controller creates Pods only on the Nodes that match. For more on NodeSelector and NodeAffinity, see: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
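As a minimal sketch of restricting a DaemonSet to a subset of Nodes, the manifest below adds a `nodeSelector` to the Pod template. The DaemonSet name `fluentd-elasticsearch-ssd` and the node label `disktype: ssd` are assumptions made for illustration; they do not come from the original manifest, and the label would have to exist on the target Nodes for any Pod to be created.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch-ssd        # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch-ssd    # must match the template labels below
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch-ssd
    spec:
      # Only Nodes carrying this label receive a Pod.
      # "disktype: ssd" is an assumed label, not one defined by Kubernetes.
      nodeSelector:
        disktype: ssd
      containers:
      - name: fluentd-elasticsearch
        image: k8s.gcr.io/fluentd-elasticsearch:1.20
```

With this in place, Nodes without the label are simply skipped; labeling an additional Node later should cause the DaemonSet to start a Pod there.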
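Scheduling behavior of Daemon Pods
By default, the Scheduler decides which Node a Pod runs on. It watches the API server for Pods that have not yet been assigned a Node and assigns each of them a Node according to its scheduling policy. Pods created by a DaemonSet, however, have some special properties: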
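- The DaemonSet Controller ignores a Node's `unschedulable` attribute.
- The DaemonSet Controller can create and run Pods even when the Scheduler has not been started.

These two properties describe the mode in which the DaemonSet Controller places Pods itself. From Kubernetes 1.12 onward, Daemon Pods are by default scheduled by the default scheduler instead.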
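For Kubernetes 1.12 and later, where the default scheduler binds Daemon Pods, each Pod is tied to its target Node through a node affinity term rather than through `.spec.nodeName`. The fragment below is a sketch of what such a term looks like in a Daemon Pod's spec; `target-host-name` is a placeholder for the actual Node name.

```yaml
# Fragment of a Daemon Pod's spec (not a complete manifest)
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name       # matches on the Node's name
            operator: In
            values:
            - target-host-name       # placeholder Node name
```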
Daemon Pods respect taints and tolerations, but the following tolerations are added to them by default at creation time (the NoExecute tolerations are added without tolerationSeconds):
| Toleration Key | Effect | Version | Description |
|---|---|---|---|
| node.kubernetes.io/not-ready | NoExecute | 1.13+ | DaemonSet pods will not be evicted when there are node problems such as a network partition. |
| node.kubernetes.io/unreachable | NoExecute | 1.13+ | DaemonSet pods will not be evicted when there are node problems such as a network partition. |
| node.kubernetes.io/disk-pressure | NoSchedule | 1.8+ | ... |
| node.kubernetes.io/memory-pressure | NoSchedule | 1.8+ | ... |
| node.kubernetes.io/unschedulable | NoSchedule | 1.12+ | DaemonSet pods tolerate the unschedulable attribute when scheduled by the default scheduler. |
| node.kubernetes.io/network-unavailable | NoSchedule | 1.12+ | DaemonSet pods that use the host network tolerate the network-unavailable attribute when scheduled by the default scheduler. |
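To make the table concrete, the fragment below sketches how these default tolerations would appear in a Daemon Pod's spec. It is assembled from the table rather than captured from a cluster, and `operator: Exists` is an assumption about how the entries are written. No `tolerationSeconds` is set on the NoExecute entries, so the Pod is never evicted because of those taints.

```yaml
# Fragment of a Daemon Pod's spec (not a complete manifest)
spec:
  tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists                 # assumed operator
    effect: NoExecute                # no tolerationSeconds: tolerated indefinitely
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
  - key: node.kubernetes.io/disk-pressure
    operator: Exists
    effect: NoSchedule
  - key: node.kubernetes.io/memory-pressure
    operator: Exists
    effect: NoSchedule
  - key: node.kubernetes.io/unschedulable
    operator: Exists
    effect: NoSchedule
  # Added only for Pods that set hostNetwork: true
  - key: node.kubernetes.io/network-unavailable
    operator: Exists
    effect: NoSchedule
```

Tolerations written explicitly in the DaemonSet's Pod template, such as the `node-role.kubernetes.io/master` one in the manifest above, are kept alongside these defaults.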