[containerd] RunPodSandbox for XX

2022-10-20  Lis_

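Background

The industrial cloud platform is deployed on a k8s cluster.

Problem description

Colleagues on the industrial cloud team upgraded the image version of a business container through the console. After the upgrade, container creation failed with the following error log: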

Jun 06 20:45:29 TENCENT64.site containerd[5327]: time="2022-06-06T20:45:29.816213573+08:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:edge-56c86c995f-tqzz2,Uid:da70d1c2-0978-40c1-a9a6-c5a25a7f41a3,Namespace:edge-adapter,Attempt:0,} failed, error" error="failed to reserve sandbox name \"edge-56c86c995f-tqzz2_edge-adapter_da70d1c2-0978-40c1-a9a6-c5a25a7f41a3_0\": name \"edge-56c86c995f-tqzz2_edge-adapter_da70d1c2-0978-40c1-a9a6-c5a25a7f41a3_0\" is reserved for \"762b66093089b50109f74fa5a4cc6e7165d916b18dfd1b5c877fc4effff1e558\""
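
Comparing the PodSandboxMetadata fields with the reserved name shows how the sandbox name is put together: pod name, namespace, UID and attempt, joined with underscores. The sketch below pulls the sandbox name and the ID that holds the reservation out of such a log line. The function names are made up for illustration, and the parsing assumes the journal format shown above.

```shell
#!/bin/sh
# Illustrative helpers (the names are hypothetical, not part of containerd).

# Read containerd journal lines on stdin; for each reservation conflict print
# "<sandbox name> <ID currently holding that name>".
parse_reserve_error() {
  sed -n 's/.*failed to reserve sandbox name \\*"\([a-z0-9._-]*\)\\*".* is reserved for \\*"\([0-9a-f]*\)\\*".*/\1 \2/p'
}

# Split a sandbox name into the four fields it is built from.
split_sandbox_name() {
  printf '%s\n' "$1" |
    awk -F_ '{ printf "pod=%s namespace=%s uid=%s attempt=%s\n", $1, $2, $3, $4 }'
}

# Usage on a node (assumes containerd runs as a systemd unit named containerd):
#   journalctl -u containerd | parse_reserve_error | sort | uniq -c
```

Counting the output per name gives a rough idea of how often the same sandbox was retried while the earlier reservation was still held.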

Troubleshooting

ps -ef | grep chao
root        752 101375  0 16:05 pts/1    00:00:00 grep --color=auto chao
root      60780  59079  0 06:30 ?        00:00:03 /usr/local/bin/chaos-daemon --runtime containerd --http-port 31766 --grpc-port 31767 --pprof --ca /etc/chaos-daemon/cert/ca.crt --cert /etc/chaos-daemon/cert/tls.crt --key /etc/chaos-daemon/cert/tls.key --runtime-socket-path /host-run/containerd.sock

Resolution

Follow-up

stress-ng --io 30 -d 5

Create an nginx deployment:

kubectl apply -f nginx.yaml
kubectl get pods
NAME                               READY   STATUS              RESTARTS   AGE
nginx-deployment-857cbc9c6-7plrb   0/1     ContainerCreating   0          7m41s
nginx-deployment-857cbc9c6-f5gjq   0/1     ContainerCreating   0          7m40s
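
The post does not include nginx.yaml itself. As a rough sketch, a manifest like the one written below would produce two pods named nginx-deployment-... as in the output above. The deployment name and replica count are taken from that output; the label, image and port are assumptions, and write_nginx_manifest is a hypothetical helper name.

```shell
#!/bin/sh
# Write a minimal Deployment manifest to the given path (default: nginx.yaml).
# Only the name and the replica count come from the post; the rest is assumed.
write_nginx_manifest() {
  cat > "${1:-nginx.yaml}" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
EOF
}

# Usage:
#   write_nginx_manifest nginx.yaml
#   kubectl apply -f nginx.yaml
```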

kubelet error log:


[image: screenshot of the kubelet error log, not preserved]

containerd error log:

Jun 07 21:03:05 VM-71-117-ubuntu containerd[14269]: time="2022-06-07T21:03:05.036947355+08:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:nginx-deployment-857cbc9c6-f5gjq,Uid:29d9b646-2ed6-411d-b89a-5e1526de3393,Namespace:default,Attempt:0,} failed, error" error="failed to reserve sandbox name \"nginx-deployment-857cbc9c6-f5gjq_default_29d9b646-2ed6-411d-b89a-5e1526de3393_0\": name \"nginx-deployment-857cbc9c6-f5gjq_default_29d9b646-2ed6-411d-b89a-5e1526de3393_0\" is reserved for \"aa39cc65311ca413accf095a659c372b02e987d5b97687d3241c830f3b1091d5\""

Once the stress test process is stopped, the pods are created successfully:

kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
nginx-deployment-857cbc9c6-7plrb   1/1     Running   0          14m
nginx-deployment-857cbc9c6-f5gjq   1/1     Running   0          14m
  Normal   Scheduled               7m41s  default-scheduler  Successfully assigned default/nginx-deployment-857cbc9c6-sp5rp to 10.0.71.117
  Warning  FailedCreatePodSandBox  3m37s  kubelet            Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
root@VM-71-117-ubuntu:~# kubectl get pods
NAME                               READY   STATUS              RESTARTS   AGE
nginx-deployment-857cbc9c6-sp5rp   0/1     ContainerCreating   0          7m23s
nginx-deployment-857cbc9c6-wm88t   0/1     RunContainerError   0          7m24s

kubelet log:

Jun 08 10:00:13 VM-71-117-ubuntu kubelet[11643]: E0608 10:00:13.322478   11643 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"nginx-deployment-857cbc9c6-sp5rp_default(c2c8a3d4-0e02-4688-8b6e-711a0f6525c5)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"nginx-deployment-857cbc9c6-sp5rp_default(c2c8a3d4-0e02-4688-8b6e-711a0f6525c5)\\\": rpc error: code = DeadlineExceeded desc = context deadline exceeded\"" pod="default/nginx-deployment-857cbc9c6-sp5rp" podUID=c2c8a3d4-0e02-4688-8b6e-711a0f6525c5
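
On the kubelet side the failure shows up as a CRI call that hits its deadline. A plausible reading, which the post does not spell out, is that the kubelet then retries with the same sandbox name while containerd is still working on the first request, which would account for the "is reserved for" errors. To see which pods are affected, the kubelet journal can be filtered as sketched below. sandbox_timeouts is a hypothetical helper name and it assumes the kubelet log format shown above.

```shell
#!/bin/sh
# Read kubelet journal lines on stdin and print each distinct pod
# ("namespace/name") whose CreatePodSandbox call ended in DeadlineExceeded.
sandbox_timeouts() {
  grep 'CreatePodSandbox' | grep 'DeadlineExceeded' |
    sed -n 's/.* pod="\([^"]*\)".*/\1/p' |
    sort -u
}

# Usage on a node (assumes the kubelet runs as a systemd unit named kubelet):
#   journalctl -u kubelet | sandbox_timeouts
```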

containerd log:


[image: screenshot of the containerd log, not preserved]
root      3516  3508  0 15:54 pts/1    00:00:00 runc init

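Running ps aux | grep runc shows that the stacks of some runc processes are stuck in do_mount and that these processes are in state D. What is the D state? A process in D state (uninterruptible sleep) is usually waiting for I/O, such as disk I/O, network I/O, or I/O on some other peripheral. Here it is clearly waiting for disk I/O.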
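
The check described above can be scripted roughly as follows. This is a sketch: d_state_pids is a hypothetical helper name, the ps invocation assumes procps on Linux, and reading /proc/<pid>/stack requires root and a kernel that exposes it.

```shell
#!/bin/sh
# Read "PID STAT COMMAND" rows on stdin and print the PID of every process
# whose state starts with D and whose command name starts with the given prefix.
d_state_pids() {
  awk -v prefix="$1" '$2 ~ /^D/ && index($3, prefix) == 1 { print $1 }'
}

# Usage on a live node: list runc processes in D state and dump their kernel stacks.
#   ps -eo pid=,stat=,comm= | d_state_pids runc | while read -r pid; do
#     echo "== $pid"
#     cat "/proc/$pid/stack"
#   done
```

A prefix match is used rather than an exact match because the command name of a runc init process may not be exactly "runc".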
