Troubleshooting high CPU usage by the kubelet component

2020-11-27  陈光辉_akr8s

Problem discovery

Environment

Kubernetes: v1.18.8
OS: CentOS Linux release 7.6.1810 (Core)
Kernel: 4.4.227-1.el7.elrepo.x86_64

Troubleshooting process

  1. Tools used

    • sysstat version 11.5.5
    • perf
    • go tool
    • FlameGraph
    • htop
  2. Analysis

# pidof kubelet
391280
# pidstat -p 391280 1 5
Linux 4.4.227-1.el7.elrepo.x86_64 (rancher-dg-tn9)      11/20/2020      _x86_64_        (72 CPU)

07:54:45 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
07:54:46 PM     0    391280    4.95  100.00    0.00    0.00  100.00    34  kubelet
07:54:47 PM     0    391280    0.00  100.00    0.00    0.00  100.00    34  kubelet
07:54:48 PM     0    391280    0.00  100.00    0.00    0.00  100.00    34  kubelet
07:54:49 PM     0    391280    3.00  100.00    0.00    0.00  100.00    34  kubelet
07:54:50 PM     0    391280    3.00  100.00    0.00    0.00  100.00    34  kubelet
Average:        0    391280    4.30  100.00    0.00    0.00  100.00     -  kubelet
# pidof kubelet
391280
# strace -cp 391280
strace: Process 391280 attached
strace: Process 391280 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.80 2623.789827       41644     63005     11889 futex
  0.10    2.721893      544378         5           epoll_wait
  0.08    1.989361        1572      1265           epoll_pwait
  0.01    0.156758          19      8173         2 newfstatat
  0.00    0.109730          32      3354       339 read
  0.00    0.101180          21      4602         1 readlinkat
  0.00    0.059750          20      2972       526 epoll_ctl
  0.00    0.039303          16      2422           fcntl
  0.00    0.036715          24      1477           openat
  0.00    0.030070          44       674           write
  0.00    0.027179         204       133           nanosleep
  0.00    0.025634          17      1484           close
  0.00    0.016708          36       457           getdents64
  0.00    0.015895          13      1169           fstat
  0.00    0.015259         206        74           sched_yield
  0.00    0.002846         218        13         7 connect
  0.00    0.001864          38        49           setsockopt
  0.00    0.000938          72        13           socket
  0.00    0.000820          48        17           madvise
  0.00    0.000518          34        15           getpeername
  0.00    0.000470          52         9           getsockopt
  0.00    0.000450          30        15           getsockname
  0.00    0.000427          35        12           rt_sigreturn
  0.00    0.000319          22        14           lseek
  0.00    0.000049          24         2         2 unlinkat
  0.00    0.000007           7         1           getrandom
------ ----------- ----------- --------- --------- ----------------
100.00 2629.143970                 91426     12766 total
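The summary above is dominated by a single syscall. As a minimal sketch, the same reading can be done mechanically with a small filter over the `strace -c` table. The function name `top_syscalls` and its threshold argument are illustrative, not part of strace.

```shell
# top_syscalls THRESHOLD: read an `strace -c` summary on stdin and print
# "syscall percent" for every syscall whose "% time" is >= THRESHOLD.
# (Hypothetical helper name; only awk is required.)
top_syscalls() {
    awk -v min="$1" '
        # Data rows start with a numeric percentage. This skips the header
        # row, the dashed separator rows and the final "total" row.
        $1 ~ /^[0-9]+(\.[0-9]+)?$/ && $NF != "total" && $1 + 0 >= min + 0 {
            print $NF, $1
        }'
}
```

One way to use it is to save the summary with strace's `-o` option and feed that file to the function, for example `top_syscalls 1 < summary.txt`. For the table above, a threshold of 1 leaves only the futex row.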
# strace -p 391280 -T -v -e trace=all -ff 2>&1 | egrep '<[1-9]\.[1-9]'

10:39:35.050828 futex(0x7027320, FUTEX_WAIT_PRIVATE, 0, {tv_sec=3, tv_nsec=984755078}) = -1 ETIMEDOUT (Connection timed out) <3.984847>
10:39:39.035764 futex(0xc0021d4148, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000012>
10:39:39.035822 futex(0xc002ab12c8, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000011>
10:39:39.035861 futex(0x7027320, FUTEX_WAIT_PRIVATE, 0, {tv_sec=8, tv_nsec=935276896}) = -1 ETIMEDOUT (Connection timed out) <8.935371>
10:39:47.971297 futex(0x7022130, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000009>
10:39:47.971337 futex(0x7022030, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000008>
10:39:47.971364 futex(0xc00084c148, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000008>
10:39:47.971388 futex(0xc0002024c8, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000008>
10:39:47.971411 futex(0x7027320, FUTEX_WAIT_PRIVATE, 0, {tv_sec=2, tv_nsec=65115694}) = -1 ETIMEDOUT (Connection timed out) <2.065206>
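The `egrep '<[1-9]\.[1-9]'` filter above only keeps durations whose first decimal digit is non-zero, so a call such as `<2.065206>` would be dropped. A numeric comparison avoids that. The sketch below assumes the `<seconds>` suffix that `strace -T` appends to each line; `slow_calls` is a hypothetical helper name.

```shell
# slow_calls SECONDS: read `strace -T` output on stdin and print only the
# lines whose trailing <duration> is at least SECONDS.
slow_calls() {
    awk -v min="$1" '
        {
            # strace -T appends the time spent in the call as "<seconds>",
            # which is always the last whitespace-separated field.
            field = $NF
            if (field ~ /^<[0-9]+\.[0-9]+>$/) {
                gsub(/[<>]/, "", field)
                if (field + 0 >= min + 0) print
            }
        }'
}
```

It could replace the egrep stage, for example `strace -p 391280 -T -ff 2>&1 | slow_calls 1`, which keeps every call that took one second or longer.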
# perf record -F 99 -p 391280 -g -- sleep 30
# perf script -i perf.data &> perf.unfold
# git clone https://github.com/brendangregg/FlameGraph.git
# cp perf.unfold FlameGraph/
# cd FlameGraph/
# ./stackcollapse-perf.pl perf.unfold &> perf.folded
# ./flamegraph.pl perf.folded > perf_kubelet.svg
[Figure: perf_kubelet.png, flame graph generated from the perf data]
# kubectl proxy --address='0.0.0.0'  --accept-hosts='^*$'
# docker run -d --name golang-env --net host golang:latest sleep 3600
# go tool pprof -seconds=60 -raw -output=go_kubelet.pprof http://APIserver:8001/api/v1/nodes/node_name/proxy/debug/pprof/profile
# ./stackcollapse-go.pl go_kubelet.pprof > go_kubelet.out
# ./flamegraph.pl go_kubelet.out > go_kubelet.svg
[Figure: go_kubelet.png, flame graph generated from the Go pprof profile]
# time cat /sys/fs/cgroup/memory/memory.stat > /dev/null
real    0m9.115s
user    0m0.000s
sys     0m9.112s
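The timing above shows a single read of memory.stat spending about 9 seconds in the kernel. A minimal sketch for repeating that check in a script follows. It assumes GNU date (for the `%N` nanosecond format) and the cgroup v1 path used above; the function names and the millisecond threshold are illustrative.

```shell
# read_time_ms FILE: print how many milliseconds one full read of FILE takes.
# Assumes GNU date, which supports %N (nanoseconds).
read_time_ms() {
    start=$(date +%s%N)
    cat "$1" > /dev/null || return 1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# memstat_is_slow LIMIT_MS [FILE]: succeed (exit 0) when reading FILE takes
# longer than LIMIT_MS. FILE defaults to the cgroup v1 path used above.
memstat_is_slow() {
    limit=$1
    file=${2:-/sys/fs/cgroup/memory/memory.stat}
    elapsed=$(read_time_ms "$file") || return 2
    [ "$elapsed" -gt "$limit" ]
}
```

For example, `memstat_is_slow 1000 && echo "memory.stat read is slow"` flags any read that takes longer than one second. Running `read_time_ms /sys/fs/cgroup/memory/memory.stat` before and after a change gives a simple way to compare the two.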

Fix

# Current workaround
echo 2 > /proc/sys/vm/drop_caches

# Keep observing to decide whether a kernel upgrade is needed

安信证券 Container Cloud Team
