
【k8s】Analyzing why a Pod was evicted on a k8s master node

2024-09-08  Bogon

Let's work out why the Pod kubewatch-jlfsb cannot run successfully on the master01 node.

$ kubectl get pods -A |grep  0/1 
kube-system               kubewatch-jlfsb                                     0/1     Evicted             0          3m10s


$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        5.0G        2.8G        820M        7.7G        8.8G
Swap:            0B          0B          0B



$ kubectl describe pod kubewatch-jlfsb -n kube-system
Name:           kubewatch-jlfsb
Namespace:      kube-system
Priority:       0
Node:           xt-master01/
Start Time:     Mon, 09 Sep 2024 10:20:39 +0800
Labels:         app=kubewatch
                controller-revision-hash=648d99c657
                pod-template-generation=1
Annotations:    <none>
Status:         Failed
Reason:         Evicted
Message:        The node was low on resource: memory. Container kubewatch was using 374076Ki, which exceeds its request of 0.
IP:
IPs:            <none>
Controlled By:  DaemonSet/kubewatch
Containers:
  kubewatch:
    Image:      deploy.bocloud.k8s:40443/paas/boc-kubewatch:boc3.3
    Port:       <none>
    Host Port:  <none>
    Liveness:   http-get http://127.0.0.1:9008/boc-kubewatch/apiHealthCheck delay=120s timeout=2s period=30s #success=1 #failure=5
    Environment:
      cluster_type:           https
      cluster_realmasterip:   10.12.32.108
      cluster_masterip:       127.0.0.1
      cluster_port:           6443
      cluster_token:          xxxxxxxxxx
      cluster_platformtype:   kubernetes
      cluster_envid:          35
      cluster_envname:        test-1-pro-default
      cluster_clusterid:      6
      cluster_clusterName:    k8s集群
      cluster_partitionid:    180
      cluster_partitionname:  test-1-pro-default
      cluster_networkname:
      cluster_boc:            http://10.12.32.6:30000/
    Mounts:
      /logs/ from boc-kubewatch-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wm2pc (ro)
Volumes:
  boc-kubewatch-log:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/deploy/docker/logs/boc-kubewatch
    HostPathType:
  default-token-wm2pc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-wm2pc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age        From                  Message
  ----     ------     ----       ----                  -------
  Normal   Scheduled  <unknown>  default-scheduler     Successfully assigned kube-system/kubewatch-jlfsb to xt-master01
  Normal   Pulled     3m24s      kubelet, test-master01  Container image "deploy.bocloud.k8s:40443/paas/boc-kubewatch:boc3.3" already present on machine
  Normal   Created    3m24s      kubelet, test-master01  Created container kubewatch
  Normal   Started    3m24s      kubelet, test-master01  Started container kubewatch
  Warning  Evicted    2m30s      kubelet, test-master01  The node was low on resource: memory. Container kubewatch was using 374076Ki, which exceeds its request of 0.
  Normal   Killing    2m30s      kubelet, test-master01  Stopping container kubewatch

Judging from the output above, this Pod (kubewatch-jlfsb) cannot run normally in the Kubernetes cluster because the node ran short on resources. A detailed breakdown follows:

1. Pod status

The Pod status is Evicted, which means the Pod was evicted. Eviction normally happens when a node comes under resource pressure and Kubernetes removes some Pods to free resources.

2. Reason

The kubectl describe pod output shows the exact eviction reason:

Reason:         Evicted
Message:        The node was low on resource: memory. Container kubewatch was using 374076Ki, which exceeds its request of 0.

This indicates the node ran low on memory. The container was using 374076 KiB (roughly 365 MiB), while its memory request is 0. Even with a request of 0, the kubelet still ranks Pods for eviction by actual usage: under memory pressure, Pods whose usage exceeds their request (which is always true for a BestEffort Pod using any memory at all) are evicted first.
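
An evicted Pod stays behind in the Failed phase until it is cleaned up by hand or garbage-collected; the DaemonSet controller creates a replacement Pod separately. A quick sketch for listing (and, if desired, clearing) such leftovers cluster-wide, using the Failed phase as a field selector:

$ kubectl get pods -A --field-selector=status.phase=Failed
$ kubectl delete pods -A --field-selector=status.phase=Failed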

3. Node memory status

The free -h output shows:

Mem:            15G        5.0G        2.8G        820M        7.7G        8.8G
Swap:            0B          0B          0B

Although 2.8 GB is free and 8.8 GB is available, which does not look critically low, this snapshot was taken after the eviction. The kubelet decides evictions from its own signals at the moment of pressure (the default hard eviction threshold is memory.available < 100Mi), so a brief memory spike can put the node into MemoryPressure and evict BestEffort Pods even though memory looks fine shortly afterwards.
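
To confirm whether the node actually reported memory pressure around that time, check the node conditions and the recent eviction events. A sketch, assuming the node name xt-master01 from the describe output above:

$ kubectl describe node xt-master01 | grep -A 8 "Conditions:"
$ kubectl get events -A --field-selector reason=Evicted --sort-by=.lastTimestamp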

4. QoS class

This Pod's QoS class is BestEffort, meaning it declares no resource requests or limits at all. BestEffort Pods are the first to be evicted when resources are tight, because they carry no guaranteed resource reservation.
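
The QoS class can also be read directly from the Pod's status instead of the full describe output:

$ kubectl get pod kubewatch-jlfsb -n kube-system -o jsonpath='{.status.qosClass}'
BestEffort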

5. Recommendations

To avoid this kind of situation, consider the following measures:

  1. Set resource requests and limits
    Define CPU and memory requests and limits in the Pod spec; this lets Kubernetes manage resources properly and lowers the chance of the Pod being evicted (a patch command sketch follows after this list). For example:

    resources:
      requests:
        memory: "500Mi"
        cpu: "250m"
      limits:
        memory: "1Gi"
        cpu: "500m"
    
  2. Monitor and optimize node resources

    • Monitor node resource usage and identify other processes or Pods that consume large amounts of memory (see the kubectl top sketch after this list).
    • Optimize the Pods' resource usage so that resources on the node are allocated sensibly.
  3. Add node memory
    If the node is constantly short on resources, consider adding memory to it.

  4. Check for other node pressure
    The node may be under other kinds of pressure (disk pressure, network unavailability, and so on); make sure its overall health is good.
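
For item 1, since kubewatch is managed by a DaemonSet, the requests and limits have to be set on the DaemonSet's Pod template; a change made on the Pod itself would be lost when the Pod is recreated. A minimal sketch, assuming the DaemonSet and its container are both named kubewatch as in the describe output (the Pods are recreated by the rollout):

$ kubectl -n kube-system set resources daemonset kubewatch -c kubewatch \
    --requests=memory=500Mi,cpu=250m --limits=memory=1Gi,cpu=500m

For item 2, if metrics-server is installed, kubectl top gives a quick view of which nodes and Pods are consuming the most memory:

$ kubectl top nodes
$ kubectl top pods -A --sort-by=memory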

These measures reduce the chance of the Pod being evicted again and improve the overall stability of the cluster.
