Kdump
References
Red Hat RHEL 6 - The kdump Crash Recovery Service
Red Hat RHEL 7 - Kernel crash dump guide
Ubuntu - Kernel Crash Dump
SUSE - Kexec and Kdump
NXP - kdump/kexec User Manual
Kernel documentation - kdump
Wikipedia - kdump (Linux)
Dumping memory on CentOS 7
- Install kdump
yum install kexec-tools
- Configure the kernel boot parameters
Setting kernel command-line parameters
Edit /etc/default/grub and add crashkernel=auto to GRUB_CMDLINE_LINUX, for example:
GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=auto rd.lvm.lv=rhel/root rhgb quiet"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet"
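The manual edit above can also be scripted. The following is a minimal sketch, assuming that GRUB_CMDLINE_LINUX is written on one line with a double-quoted value; the function name add_crashkernel is made up for illustration. It leaves the file alone if any crashkernel= setting is already present, and keeps a .bak copy otherwise.

```shell
# Add crashkernel=auto to GRUB_CMDLINE_LINUX in a grub defaults file.
# Assumption: the variable is on a single line and uses double quotes.
# Usage (as root): add_crashkernel /etc/default/grub
add_crashkernel() {
    grub_file=$1
    # Nothing to do if a crashkernel= parameter is already configured.
    if grep -q '^GRUB_CMDLINE_LINUX=.*crashkernel=' "$grub_file"; then
        return 0
    fi
    # Keep a backup, then insert the parameter right after the opening quote.
    cp -p "$grub_file" "$grub_file.bak" &&
    sed 's/^GRUB_CMDLINE_LINUX="/&crashkernel=auto /' "$grub_file.bak" > "$grub_file"
}
```

grub.cfg still has to be regenerated afterwards for the change to take effect.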
Generate a new grub.cfg file:
[root@prod-proxy grub2]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-693.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-693.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-f45cdfd78e8d45b4b2ae3e0154762931
Found initrd image: /boot/initramfs-0-rescue-f45cdfd78e8d45b4b2ae3e0154762931.img
done
[root@prod-proxy grub2]# diff grub.cfg grub.cfg.bak --suppress-common-lines
100c100
< linux16 /vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet
---
> linux16 /vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=zh_CN.UTF-8
114c114
< linux16 /vmlinuz-0-rescue-f45cdfd78e8d45b4b2ae3e0154762931 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet
---
> linux16 /vmlinuz-0-rescue-f45cdfd78e8d45b4b2ae3e0154762931 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet
117c117
< if [ "x$default" = 'CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)' ]; then default='Advanced options for CentOS Linux>CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)'; fi;
---
>
Reboot so that the kernel boots with the new command line.
Check that the running kernel was booted with crashkernel:
[sysadmin@prod-proxy ~]$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet
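For scripted checks, the same test can be done by parsing the command line. This is a small sketch; the helper name crashkernel_value is hypothetical. It prints the value of the first crashkernel= parameter and returns non-zero when there is none.

```shell
# Print the value of the crashkernel= parameter found in a kernel command
# line string; return 1 if the parameter is absent.
crashkernel_value() {
    # Intentionally unquoted: split the command line into words.
    for word in $1; do
        case $word in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Typical use on a live system:
#   crashkernel_value "$(cat /proc/cmdline)" || echo "crashkernel is not set"
```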
Enable and start the service:
systemctl enable kdump
systemctl start kdump
Check that memory has been reserved for the crash kernel:
[sysadmin@prod-proxy ~]$ cat /proc/iomem |grep Crash
2b000000-350fffff : Crash kernel
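The range in this line also tells how much memory was reserved. As a worked example, the sketch below (the helper name crash_region_mib is made up) converts the hexadecimal start and end addresses into a size in MiB. For the range shown above, 0x350fffff - 0x2b000000 + 1 = 0x0a100000 bytes, which is 161 MiB.

```shell
# Given a /proc/iomem line such as "2b000000-350fffff : Crash kernel",
# print the size of the address range in MiB.
crash_region_mib() {
    # Drop spaces and tabs (nested /proc/iomem entries are indented).
    line=$(printf '%s' "$1" | tr -d ' \t')
    range=${line%%:*}      # e.g. "2b000000-350fffff"
    start=${range%-*}
    end=${range#*-}
    # The range is inclusive, hence the +1.
    echo $(( (0x$end - 0x$start + 1) / 1048576 ))
}

# Typical use on a live system (as root):
#   crash_region_mib "$(grep 'Crash kernel' /proc/iomem)"
```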
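- Dump memory
Warning
Testing the crash dump mechanism reboots the system. Under heavy load this can cause data loss. If you decide to run the test, make sure the system is idle or lightly loaded.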
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
In normal operation, this mechanism is triggered when the kernel crashes.
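Before forcing a crash it may be worth confirming that a crash kernel is actually loaded, since otherwise the machine goes down without producing a dump. The sketch below relies on the kernel's /sys/kernel/kexec_crash_loaded flag, which is not mentioned elsewhere in this page; the function name crash_kernel_loaded is hypothetical.

```shell
# Return 0 only if the kexec_crash_loaded flag file contains "1".
# The path can be overridden, which is mainly useful for testing.
crash_kernel_loaded() {
    flag_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Intended use as root. WARNING: this crashes and reboots the machine.
#   crash_kernel_loaded &&
#       echo 1 > /proc/sys/kernel/sysrq &&
#       echo c > /proc/sysrq-trigger
```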
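Analyzing memory with the crash utility
- Install the tools
Install crash:
yum install crash
Besides crash, kernel-debuginfo is also required. As root, install it with debuginfo-install.
# Install debuginfo-install (provided by yum-utils)
yum install yum-utils -y
# Install kernel-debuginfo. This installs three packages: kernel-debuginfo, yum-plugin-auto-update-debug-info, and kernel-debuginfo-common-x86_64
debuginfo-install kernel
The /usr/lib/debug/lib/modules/ directory exists only after this installation completes; it is used in a later step.
- Confirm the dump files
[root@localhost 127.0.0.1-2021-08-18-05:30:37]# ls
vmcore vmcore-dmesg.txt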
- Confirm the kernel version
[root@localhost 127.0.0.1-2021-08-18-05:30:37]# uname -r
3.10.0-862.el7.x86_64
[root@localhost 127.0.0.1-2021-08-18-05:30:37]# ls /usr/lib/debug/lib/modules/
3.10.0-862.el7.x86_64
- Run the crash utility
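The page does not show the invocation itself. As a sketch, crash is normally given the debug vmlinux from the debuginfo directory confirmed above, followed by the vmcore file. The helper below (crash_command is a made-up name) only prints that command line, so it can be reviewed before it is run. It assumes the kernel-debuginfo version matches the kernel that produced the dump.

```shell
# Print the crash command line for a dump directory and a kernel release.
# The release defaults to the running kernel (uname -r).
crash_command() {
    dump_dir=$1
    release=${2:-$(uname -r)}
    printf 'crash /usr/lib/debug/lib/modules/%s/vmlinux %s/vmcore\n' \
        "$release" "$dump_dir"
}

# Example:
#   crash_command /var/crash/127.0.0.1-2021-08-18-05:30:37 3.10.0-862.el7.x86_64
```

Inside the crash prompt, commands such as bt, log, and ps are common starting points.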
Appendix 1
Kdump is enabled by default when CentOS 7 is installed, as shown in the screenshots:
[Two screenshots: 图片.png, 图片.png]
Appendix 2
Kdump enabled vs. kdump disabled
- Check /proc/cmdline for the crashkernel parameter
#Disable
[root@localhost ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-862.el7.x86_64 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8
#Enable
[root@localhost ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-862.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8
- Check /proc/iomem to see whether memory was reserved for the crash kernel
#Disable
[root@localhost ~]# cat /proc/iomem |grep Crash
#Enable
[root@localhost ~]# cat /proc/iomem |grep Crash
2b000000-350fffff : Crash kernel
- Install the kdump packages: kexec-tools (kexec and the kdump service) and the crash utility
yum install kexec-tools crash -y
/usr/lib/systemd/system/kdump.service is installed by kexec-tools.
Install the graphical configuration tool:
yum install system-config-kdump
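- Start the kdump service. On CentOS 7 this is done with systemctl; the service command or the init scripts under /etc apply to older releases.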
#Disable
[root@localhost ~]# systemctl list-unit-files |grep kdump
kdump.service disabled
[root@localhost ~]#
[root@localhost ~]# systemctl is-active kdump
unknown
#Enable
[root@localhost ~]# systemctl list-unit-files |grep kdump
kdump.service enabled
[root@localhost ~]#
[root@localhost ~]# systemctl is-active kdump
active
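Appendix 3
Testing the kdump configuration
Enable kdump, reboot the system, and confirm that the service is running. Then enter the following commands at a shell prompt: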
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
[root@localhost ~]# free
total used free shared buff/cache available
Mem: 1883092 125272 1372480 9012 385340 1574260
Swap: 1679356 0 1679356
[root@localhost ~]#
[root@localhost ~]# echo 1 > /proc/sys/kernel/sysrq
[root@localhost ~]# echo c > /proc/sysrq-trigger
Socket error Event: 32 Error: 10053.
Connection closing...Socket close.
Connection closed by foreign host.
Disconnected from remote host(t2) at 17:30:53.
Type `help' to learn how to use Xshell prompt.
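This forces the kernel to crash and creates an address-YYYY-MM-DD-HH:MM:SS/vmcore file, by default under the /var/crash/ directory.
NOTE
Besides validating the configuration, this procedure can be used to record how long a crash dump takes to complete under a representative test load.
After a short while, reconnect to the server and look in the /var/crash directory for the dump files: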
[root@localhost ~]# date
Wed Aug 18 05:32:15 EDT 2021
[root@localhost ~]# ls /var/crash
127.0.0.1-2021-08-18-05:30:37
[root@localhost 127.0.0.1-2021-08-18-05:30:37]# ll -h
total 39M
-rw-------. 1 root root 39M Aug 18 05:30 vmcore
-rw-r--r--. 1 root root 107K Aug 18 05:30 vmcore-dmesg.txt
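When several dumps accumulate, locating the newest one can be scripted. The following is a minimal sketch, assuming directory names contain no whitespace (true for the address-timestamp naming used here); latest_vmcore_dir is a hypothetical helper name.

```shell
# Print the most recently modified directory under the crash root that
# contains a vmcore file; return 1 if none is found.
latest_vmcore_dir() {
    crash_root=${1:-/var/crash}
    # ls -t lists the newest entries first.
    for dir in $(ls -1t "$crash_root"); do
        if [ -f "$crash_root/$dir/vmcore" ]; then
            printf '%s\n' "$crash_root/$dir"
            return 0
        fi
    done
    return 1
}

# Typical use:
#   latest_vmcore_dir /var/crash
```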