[k8s] Why a Pod on a k8s master node was evicted

2024-09-08 放纵不基

Explain why the pod kubewatch-jlfsb fails to run on the master01 node.

$ kubectl get pods -A |grep  0/1 
kube-system               kubewatch-jlfsb                                     0/1     Evicted             0          3m10s


$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        5.0G        2.8G        820M        7.7G        8.8G
Swap:            0B          0B          0B



$ kubectl describe pod kubewatch-jlfsb -n kube-system
Name:           kubewatch-jlfsb
Namespace:      kube-system
Priority:       0
Node:           xt-master01/
Start Time:     Mon, 09 Sep 2024 10:20:39 +0800
Labels:         app=kubewatch
                controller-revision-hash=648d99c657
                pod-template-generation=1
Annotations:    <none>
Status:         Failed
Reason:         Evicted
Message:        The node was low on resource: memory. Container kubewatch was using 374076Ki, which exceeds its request of 0.
IP:
IPs:            <none>
Controlled By:  DaemonSet/kubewatch
Containers:
  kubewatch:
    Image:      deploy.bocloud.k8s:40443/paas/boc-kubewatch:boc3.3
    Port:       <none>
    Host Port:  <none>
    Liveness:   http-get http://127.0.0.1:9008/boc-kubewatch/apiHealthCheck delay=120s timeout=2s period=30s #success=1 #failure=5
    Environment:
      cluster_type:           https
      cluster_realmasterip:   10.12.32.108
      cluster_masterip:       127.0.0.1
      cluster_port:           6443
      cluster_token:          xxxxxxxxxx
      cluster_platformtype:   kubernetes
      cluster_envid:          35
      cluster_envname:        test-1-pro-default
      cluster_clusterid:      6
      cluster_clusterName:    k8s集群
      cluster_partitionid:    180
      cluster_partitionname:  test-1-pro-default
      cluster_networkname:
      cluster_boc:            http://10.12.32.6:30000/
    Mounts:
      /logs/ from boc-kubewatch-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wm2pc (ro)
Volumes:
  boc-kubewatch-log:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/deploy/docker/logs/boc-kubewatch
    HostPathType:
  default-token-wm2pc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-wm2pc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age        From                  Message
  ----     ------     ----       ----                  -------
  Normal   Scheduled  <unknown>  default-scheduler     Successfully assigned kube-system/kubewatch-jlfsb to xt-master01
  Normal   Pulled     3m24s      kubelet, test-master01  Container image "deploy.bocloud.k8s:40443/paas/boc-kubewatch:boc3.3" already present on machine
  Normal   Created    3m24s      kubelet, test-master01  Created container kubewatch
  Normal   Started    3m24s      kubelet, test-master01  Started container kubewatch
  Warning  Evicted    2m30s      kubelet, test-master01  The node was low on resource: memory. Container kubewatch was using 374076Ki, which exceeds its request of 0.
  Normal   Killing    2m30s      kubelet, test-master01  Stopping container kubewatch

Based on the information you provided, the Pod (kubewatch-jlfsb) cannot run normally in the Kubernetes cluster because the node ran short of memory. A detailed analysis follows:

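1. Pod status

The Pod shows as Evicted (Status: Failed, Reason: Evicted), which means it was evicted from the node. Eviction usually happens when node resources are tight: the kubelet evicts some Pods to reclaim resources.

2. Cause

The kubectl describe pod output gives the specific eviction reason: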

Reason:         Evicted
Message:        The node was low on resource: memory. Container kubewatch was using 374076Ki, which exceeds its request of 0.

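This means the node was short of memory. The Pod's container was using 374076 KiB (about 365 MiB) while its memory request was 0. A request of 0 does not protect the Pod: under memory pressure the kubelet ranks eviction candidates first by whether their usage exceeds their request, then by Pod priority, then by how far usage exceeds the request. Any usage at all exceeds a request of 0, so this Pod is among the first candidates.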
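The ranking that decides which Pod goes first under memory pressure can be sketched roughly as follows. This is a simplified illustration, not the kubelet's actual code: `PodMemory` and `eviction_order` are made-up names, and the two `example-*` pods are hypothetical entries added only for comparison. Only the kubewatch numbers come from the output above.

```python
from dataclasses import dataclass


@dataclass
class PodMemory:
    name: str
    priority: int     # Pod priority (the "Priority" field in kubectl describe)
    usage_ki: int     # observed memory usage, in KiB
    request_ki: int   # memory request, in KiB (0 when none is set)


def eviction_order(pods):
    """Sort pods from first-evicted to last-evicted (simplified sketch)."""
    return sorted(
        pods,
        key=lambda p: (
            p.usage_ki <= p.request_ki,    # pods above their request sort first
            p.priority,                    # then lower priority first
            -(p.usage_ki - p.request_ki),  # then the largest overage first
        ),
    )


pods = [
    PodMemory("kubewatch-jlfsb", priority=0, usage_ki=374076, request_ki=0),
    # Hypothetical pods, for comparison only.
    PodMemory("example-within-request", priority=0,
              usage_ki=200 * 1024, request_ki=512 * 1024),
    PodMemory("example-high-priority", priority=1000,
              usage_ki=600 * 1024, request_ki=512 * 1024),
]

print([p.name for p in eviction_order(pods)])
# ['kubewatch-jlfsb', 'example-high-priority', 'example-within-request']
```

With a request of 0 and priority 0, kubewatch lands at the front of the queue regardless of how modest its usage is.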

3. Node memory

The free -h output you provided shows:

Mem:            15G        5.0G        2.8G        820M        7.7G        8.8G
Swap:            0B          0B          0B

Total memory: 15GB
Used memory: 5.0GB
Free memory: 2.8GB
Buffers/cache: 7.7GB
Available memory: 8.8GB

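With 2.8GB free and 8.8GB available, memory does not look especially low in this snapshot. However, the snapshot appears to have been taken after the Pod had already been evicted, so it does not show the node at the moment the kubelet acted. The kubelet also does not rely on free: it derives its own memory.available signal from cgroup statistics and starts evicting Pods when that signal drops below the configured eviction threshold.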
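As a rough sketch of that check: the kubelet's memory.available signal is node capacity minus the working set, where the working set is cgroup memory usage minus inactive file cache. The code below assumes the documented default hard threshold on Linux (memory.available<100Mi); the threshold actually configured on this cluster is not shown in the output above. The function names and all input numbers are hypothetical and chosen only to illustrate the arithmetic.

```python
MI = 1024 * 1024


def memory_available(capacity_bytes, usage_bytes, inactive_file_bytes):
    """Approximate the memory.available signal from cgroup-style counters."""
    working_set = max(usage_bytes - inactive_file_bytes, 0)
    return capacity_bytes - working_set


def below_threshold(available_bytes, threshold_bytes=100 * MI):
    """True when the signal is under the (assumed default) hard threshold."""
    return available_bytes < threshold_bytes


# Hypothetical node under pressure: almost all memory is in the working set.
tight = memory_available(16384 * MI, 16350 * MI, 50 * MI)
print(tight // MI, below_threshold(tight))      # 84 True

# Hypothetical node where much of the usage is reclaimable file cache.
relaxed = memory_available(16384 * MI, 12000 * MI, 4000 * MI)
print(relaxed // MI, below_threshold(relaxed))  # 8384 False
```

This also shows why a later free -h can look healthy: once the evicted Pod's memory is released, the signal climbs back above the threshold.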

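4. QoS class

The Pod's QoS class is BestEffort, which means none of its containers sets any resource requests or limits. BestEffort Pods are the most likely to be evicted when resources are tight, because nothing is guaranteed to them.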
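For reference, the way a QoS class follows from a Pod's requests and limits can be sketched like this. It is a simplified model, not the real implementation: `qos_class` is a made-up helper, containers are plain dicts, and quantities are compared as strings (so "1Gi" and "1024Mi" would be treated as different, which Kubernetes itself would not do).

```python
def qos_class(containers):
    """Classify a pod from its containers' 'requests' and 'limits' dicts."""
    has_any = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            has_any = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When only a limit is set, the request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not has_any:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# kubewatch as deployed: no requests, no limits.
print(qos_class([{}]))  # BestEffort

# Requests lower than limits.
print(qos_class([{
    "requests": {"memory": "500Mi", "cpu": "250m"},
    "limits": {"memory": "1Gi", "cpu": "500m"},
}]))  # Burstable

# Requests equal to limits for both CPU and memory.
print(qos_class([{
    "requests": {"memory": "1Gi", "cpu": "500m"},
    "limits": {"memory": "1Gi", "cpu": "500m"},
}]))  # Guaranteed
```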

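5. Recommendations

To avoid similar situations, consider the following measures:

Set resource requests and limits:
Set CPU and memory requests and limits in the Pod configuration (here, the pod template of the kubewatch DaemonSet). This helps Kubernetes manage resources and lowers the chance that the Pod is evicted. For example: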
```
resources:
  requests:
    memory: "500Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
    cpu: "500m"
```
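As a quick sanity check of these example values against the one usage figure in the eviction message (374076Ki), the arithmetic looks like this. `to_bytes` is a small hypothetical helper that only understands the Ki, Mi and Gi suffixes, not the full Kubernetes quantity syntax. Keep in mind that the sample was taken less than a minute after the container started, so steady-state usage may be higher.

```python
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def to_bytes(quantity):
    """Convert a quantity such as '374076Ki' or '1Gi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a byte count


usage = to_bytes("374076Ki")   # from the eviction message
request = to_bytes("500Mi")    # example request above
limit = to_bytes("1Gi")        # example limit above

print(round(usage / 1024 ** 2, 1))  # 365.3 (MiB)
print(usage <= request)             # True: usage no longer exceeds the request
print(round(usage / limit, 2))      # 0.36 of the limit
```

If the observed usage stays under the request, the Pod is no longer in the "usage exceeds request" group that the kubelet evicts first.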
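Monitor and optimize node resources:
- Monitor resource usage on the node to find out whether other processes or Pods are consuming a lot of resources.
- Optimize the Pods' resource usage so that resources on the node are allocated sensibly.
Add node memory:
If the node is persistently short of resources, consider adding memory to it.
Check node pressure:
The node may be under other kinds of pressure or have other problems (disk pressure, network unavailable, and so on); make sure its overall health is good.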
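A small sketch of the node-pressure check: a Node object lists its health under status.conditions, each entry with a type and a status of "True", "False" or "Unknown". The helper below (`unhealthy_conditions`, a made-up name) picks out the conditions worth looking at. The `sample_node` dict is hypothetical and is not taken from this cluster.

```python
PRESSURE_TYPES = {
    "MemoryPressure",
    "DiskPressure",
    "PIDPressure",
    "NetworkUnavailable",
}


def unhealthy_conditions(node):
    """Return the names of node conditions that indicate a problem."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, status = cond.get("type"), cond.get("status")
        if ctype in PRESSURE_TYPES and status == "True":
            problems.append(ctype)
        elif ctype == "Ready" and status != "True":
            problems.append("NotReady")
    return problems


# Hypothetical node object, trimmed to the fields used here.
sample_node = {
    "status": {
        "conditions": [
            {"type": "NetworkUnavailable", "status": "False"},
            {"type": "MemoryPressure", "status": "True"},
            {"type": "DiskPressure", "status": "False"},
            {"type": "PIDPressure", "status": "False"},
            {"type": "Ready", "status": "True"},
        ]
    }
}

print(unhealthy_conditions(sample_node))  # ['MemoryPressure']
```

In practice the input could be the parsed output of kubectl get node xt-master01 -o json. The same conditions also appear in the Conditions section of kubectl describe node.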

Taken together, these measures reduce the likelihood of Pod evictions and improve the stability of the system.