Kdump

2021-08-19  偷油考拉

References

Red Hat RHEL 6 - The kdump Crash Recovery Service
Red Hat RHEL 7 - Kernel crash dump guide
Ubuntu - Kernel Crash Dump
SUSE - Kexec and Kdump
NXP - kdump/kexec User Manual
Kernel documentation - kdump
Wikipedia - kdump (Linux)

Dumping memory on CentOS 7

  1. Install kdump
yum install kexec-tools
  2. Set the kernel command-line parameters

Edit /etc/default/grub and add crashkernel=auto to GRUB_CMDLINE_LINUX, for example:

GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=auto rd.lvm.lv=rhel/root rhgb quiet"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet"
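
The manual edit above could also be scripted. The following is a minimal sketch, not the documented procedure: add_crashkernel is a hypothetical helper name, and it assumes GRUB_CMDLINE_LINUX is written on a single line with a double-quoted value, as in the examples above.

```shell
#!/bin/sh
# Sketch: add crashkernel=auto to GRUB_CMDLINE_LINUX unless a crashkernel=
# parameter is already present. add_crashkernel is a hypothetical helper name.
add_crashkernel() {
    grub_file=${1:-/etc/default/grub}
    # Nothing to do if the line already carries a crashkernel= parameter.
    if grep -q '^GRUB_CMDLINE_LINUX=.*crashkernel=' "$grub_file"; then
        return 0
    fi
    # Keep a backup, then insert the parameter right after the opening quote.
    cp "$grub_file" "$grub_file.bak" &&
    sed 's/^GRUB_CMDLINE_LINUX="/&crashkernel=auto /' "$grub_file.bak" > "$grub_file"
}
```

Calling add_crashkernel with no argument edits /etc/default/grub and leaves a copy in /etc/default/grub.bak. grub.cfg still has to be regenerated afterwards.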

Generate a new grub.cfg:

[root@prod-proxy grub2]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-693.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-693.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-f45cdfd78e8d45b4b2ae3e0154762931
Found initrd image: /boot/initramfs-0-rescue-f45cdfd78e8d45b4b2ae3e0154762931.img
done


[root@prod-proxy grub2]# diff grub.cfg grub.cfg.bak   --suppress-common-lines
100c100
<       linux16 /vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet
---
>       linux16 /vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=zh_CN.UTF-8
114c114
<       linux16 /vmlinuz-0-rescue-f45cdfd78e8d45b4b2ae3e0154762931 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet
---
>       linux16 /vmlinuz-0-rescue-f45cdfd78e8d45b4b2ae3e0154762931 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet
117c117
< if [ "x$default" = 'CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)' ]; then default='Advanced options for CentOS Linux>CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)'; fi;
---
>

Reboot so the kernel starts with the new parameters, then check that crashkernel is on the kernel command line:

[sysadmin@prod-proxy ~]$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet
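
For use in scripts, the same check can be written as a small function. This is a sketch under assumptions: crashkernel_param is a hypothetical name, and the function simply splits the command line on whitespace.

```shell
#!/bin/sh
# Sketch: print the value of crashkernel= from a kernel command line.
# Returns non-zero if the parameter is absent. crashkernel_param is a
# hypothetical helper name; with no argument it reads /proc/cmdline.
crashkernel_param() {
    cmdline=${1:-$(cat /proc/cmdline)}
    # Intentionally unquoted: split the command line into words.
    for word in $cmdline; do
        case $word in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}
```

On the host shown above, crashkernel_param would print auto.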

Enable and start the service:

systemctl enable kdump
systemctl start kdump

Check that memory has been reserved for the crash kernel:

[sysadmin@prod-proxy ~]$ cat /proc/iomem |grep Crash
  2b000000-350fffff : Crash kernel
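
The range in /proc/iomem can be turned into a size to see how much memory crashkernel=auto actually reserved. The sketch below assumes the one-line "start-end : Crash kernel" layout shown above; crash_reserved_mib is a hypothetical helper name. On newer kernels an unprivileged read may show zeroed addresses, so it is safest to run it as root.

```shell
#!/bin/sh
# Sketch: print the size in MiB of the "Crash kernel" range found in an
# iomem-style file. crash_reserved_mib is a hypothetical helper name.
crash_reserved_mib() {
    # Read from the file given as $1, defaulting to /proc/iomem.
    range=$(grep 'Crash kernel' "${1:-/proc/iomem}" | head -n 1 | awk '{print $1}')
    [ -n "$range" ] || return 1
    start=${range%-*}   # hex start address, e.g. 2b000000
    end=${range#*-}     # hex end address (inclusive), e.g. 350fffff
    echo $(( (0x$end - 0x$start + 1) / 1048576 ))
}
```

For the range 2b000000-350fffff shown above, this prints 161, i.e. 161 MiB is reserved.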
  3. Dump memory

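Warning
Testing the crash dump mechanism reboots the system. On a heavily loaded system this can cause data loss. If you do test it, make sure the system is idle or lightly loaded.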

echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

In normal operation, this mechanism is triggered automatically when the kernel crashes.
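
Before forcing a test crash, it is worth confirming that a crash kernel is actually loaded; otherwise the machine goes down without producing a dump. A minimal sketch, assuming the kernel exposes /sys/kernel/kexec_crash_loaded (crash_kernel_loaded is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: succeed only if a crash kernel is currently loaded.
# The state file contains 1 when loaded and 0 otherwise.
# crash_kernel_loaded is a hypothetical helper name.
crash_kernel_loaded() {
    state_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

# Example guard (commented out because it reboots the machine):
# crash_kernel_loaded && echo c > /proc/sysrq-trigger
```

The function only reads state. The crash itself is still triggered with the sysrq commands above.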

Analyzing the dump with the crash utility

  1. Install the tools
    Install crash:

    yum install crash
    

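    Besides crash, kernel-debuginfo is also required. As root, install it with debuginfo-install.

    # Install debuginfo-install (provided by yum-utils)
    yum install yum-utils -y
    # Install kernel-debuginfo. This pulled in three packages: kernel-debuginfo, yum-plugin-auto-update-debug-info, and kernel-debuginfo-common-x86_64
    debuginfo-install kernel
    

    The /usr/lib/debug/lib/modules/ directory only appears once this installation completes; it is needed in the steps below.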

  2. Locate the dump files

    [root@localhost 127.0.0.1-2021-08-18-05:30:37]# ls
    vmcore  vmcore-dmesg.txt
    
  3. Confirm that the running kernel matches the installed debuginfo

    [root@localhost 127.0.0.1-2021-08-18-05:30:37]# uname -r
    3.10.0-862.el7.x86_64
    [root@localhost 127.0.0.1-2021-08-18-05:30:37]# ls /usr/lib/debug/lib/modules/
    3.10.0-862.el7.x86_64
    
  4. Run the crash utility
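
The original post leaves the command for this step blank. As a hedged sketch, crash is normally given the debuginfo vmlinux followed by the vmcore. The helper below only prints that command line; crash_command is a hypothetical name, and the vmlinux location assumes the kernel-debuginfo layout under /usr/lib/debug/lib/modules/ seen in step 3.

```shell
#!/bin/sh
# Sketch: print the crash invocation for a vmcore and a kernel release.
# crash_command is a hypothetical helper name. The release defaults to the
# running kernel, which must match the kernel that produced the vmcore.
crash_command() {
    vmcore=$1
    release=${2:-$(uname -r)}
    printf 'crash %s %s\n' "/usr/lib/debug/lib/modules/$release/vmlinux" "$vmcore"
}
```

For example, crash_command /var/crash/127.0.0.1-2021-08-18-05:30:37/vmcore 3.10.0-862.el7.x86_64 prints the command to run for the dump shown in step 2.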

    
    

Appendix 1

The CentOS 7 installer enables kdump by default, as shown in the screenshots:

[Image: 图片.png] [Image: 图片.png]

Appendix 2

kdump enabled vs. kdump disabled

  1. Check whether /proc/cmdline contains the crashkernel parameter
#Disable
[root@localhost ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-862.el7.x86_64 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8

#Enable
[root@localhost ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-862.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8

  2. Check /proc/iomem to see whether memory was reserved for the crash kernel
#Disable
[root@localhost ~]# cat /proc/iomem |grep Crash

#Enable
[root@localhost ~]# cat /proc/iomem |grep Crash
  2b000000-350fffff : Crash kernel
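  3. Install the kdump packages: kexec-tools (kexec and the kdump service) and the crash analysis utility
yum install kexec-tools crash -y

/usr/lib/systemd/system/kdump.service is installed by kexec-tools.

Install the graphical configuration tool:

yum install system-config-kdump
  4. Start the kdump service and check its state. On CentOS 7 this is done with systemctl; older releases use the service command or the init scripts under /etc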
#Disable
[root@localhost ~]# systemctl list-unit-files |grep kdump
kdump.service                                 disabled
[root@localhost ~]# 
[root@localhost ~]# systemctl is-active kdump
unknown

#Enable
[root@localhost ~]# systemctl list-unit-files |grep kdump
kdump.service                                 enabled 
[root@localhost ~]# 
[root@localhost ~]# systemctl is-active kdump
active

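Appendix 3

Testing the kdump configuration
Enable kdump, reboot the system, and confirm that the service is running. Then run the following commands at a shell prompt: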
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

[root@localhost ~]# free 
              total        used        free      shared  buff/cache   available
Mem:        1883092      125272     1372480        9012      385340     1574260
Swap:       1679356           0     1679356
[root@localhost ~]# 
[root@localhost ~]# echo 1 > /proc/sys/kernel/sysrq
[root@localhost ~]# echo c > /proc/sysrq-trigger
Socket error Event: 32 Error: 10053.
Connection closing...Socket close.

Connection closed by foreign host.

Disconnected from remote host(t2) at 17:30:53.

Type `help' to learn how to use Xshell prompt.

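This forces the kernel to crash and creates an address-YYYY-MM-DD-HH:MM:SS/vmcore file, by default under /var/crash/.

NOTE
Besides validating the configuration, this procedure can be used to record how long a crash dump takes under a representative test load.

Afterwards, reconnect to the server and look in /var/crash for the dump files: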

[root@localhost ~]# date
Wed Aug 18 05:32:15 EDT 2021

[root@localhost ~]# ls /var/crash
127.0.0.1-2021-08-18-05:30:37

[root@localhost 127.0.0.1-2021-08-18-05:30:37]# ll -h
total 39M
-rw-------. 1 root root  39M Aug 18 05:30 vmcore
-rw-r--r--. 1 root root 107K Aug 18 05:30 vmcore-dmesg.txt
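
When several dumps accumulate, the newest one can be picked out by modification time. A minimal sketch, assuming the default address-YYYY-MM-DD-HH:MM:SS/vmcore layout under /var/crash described above; latest_vmcore is a hypothetical helper name, and it relies on the -nt file test, which bash, dash, and ksh provide.

```shell
#!/bin/sh
# Sketch: print the most recently modified vmcore under a crash directory.
# Returns non-zero if no vmcore is found. latest_vmcore is a hypothetical
# helper name; the directory defaults to /var/crash.
latest_vmcore() {
    crash_dir=${1:-/var/crash}
    newest=
    for f in "$crash_dir"/*/vmcore; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

With only the dump above present, latest_vmcore prints /var/crash/127.0.0.1-2021-08-18-05:30:37/vmcore.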
