Tuesday 30 May 2017

Using kdump to analyze / capture kernel crashes with CentOS / Fedora / RHEL 7

kdump is a utility to help you capture a memory dump of the system when the kernel crashes.

kdump reserves a small portion of memory for a 'crash kernel' that is booted (as the name implies) when a system crash occurs. Its sole purpose is to capture a memory dump of the system at that point in time.

kdump comes as part of the kexec-tools package, so to install it along with the crash utility and the kernel debug symbols we will issue:

sudo dnf install --enablerepo=fedora-debuginfo --enablerepo=updates-debuginfo kexec-tools crash kernel-debuginfo

If you prefer to do the analysis on another server, the machine being monitored only needs kexec-tools, so we can simply issue:

sudo dnf install kexec-tools

Note: The kernel-debuginfo package provides the debug symbols (the uncompressed vmlinux) that the crash utility needs to analyze the kernel vmcore.

We will need to enable the crash kernel in our grub config. As a one-off, we can find the linux line of the relevant boot entry in /boot/grub2/grub.cfg and add:

crashkernel=128M

or we can make it persistent by editing:

/etc/default/grub

and prepending it to the 'GRUB_CMDLINE_LINUX' variable, e.g.:

GRUB_CMDLINE_LINUX="crashkernel=128M rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet"

Warning: 'crashkernel=auto' - the auto value does not work with Fedora, so specify an explicit size!
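
The edit to /etc/default/grub can also be scripted. The following is only a minimal sketch: add_crashkernel is a hypothetical helper (not part of grub2 or kexec-tools), it assumes the GRUB_CMDLINE_LINUX value is wrapped in double quotes as in the example above, and it works on a scratch copy so nothing is changed until you copy the result back yourself.

```shell
# add_crashkernel FILE SIZE   (hypothetical helper, for illustration only)
# Prepend crashkernel=SIZE to GRUB_CMDLINE_LINUX in FILE,
# unless a crashkernel= option is already present on that line.
add_crashkernel() {
    file=$1
    size=$2
    if grep -q '^GRUB_CMDLINE_LINUX=.*crashkernel=' "$file"; then
        return 0
    fi
    # '&' re-inserts the matched 'GRUB_CMDLINE_LINUX="' prefix.
    sed "s/^GRUB_CMDLINE_LINUX=\"/&crashkernel=${size} /" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Try it on a scratch copy and show the resulting line.
if [ -r /etc/default/grub ]; then
    scratch=$(mktemp)
    cp /etc/default/grub "$scratch"
    add_crashkernel "$scratch" 128M
    grep '^GRUB_CMDLINE_LINUX=' "$scratch" || true
    rm -f "$scratch"
fi
```

Once the output looks right, the same function can be run with sudo privileges against /etc/default/grub itself.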

Then regenerate the grub configuration with:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

We can also define where the dumps are stored. By default this is the local filesystem under /var/crash, but ideally the dumps should be housed on an external target such as a file share. To do this we can edit the kdump config file:

sudo vi /etc/kdump.conf

and uncomment the relevant options.
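
After editing, it helps to see which directives are actually active, since the stock file is mostly comments. A small sketch of that check follows; active_kdump_options is a hypothetical helper name. For a remote target the active lines would typically include something like 'nfs my.server.com:/export/tmp' (the server name is a placeholder) alongside 'path /var/crash'.

```shell
# active_kdump_options FILE   (hypothetical helper, for illustration only)
# Print every line of FILE that is neither blank nor a comment.
active_kdump_options() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Show the directives kdump will actually use on this machine.
if [ -r /etc/kdump.conf ]; then
    active_kdump_options /etc/kdump.conf || true
fi
```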

We can also enable the 'core_collector' feature, which compresses and filters our dumps for us. On Fedora 25 this is already enabled; if it is not, uncomment the following line:

core_collector makedumpfile -l --message-level 1 -d 31
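
In that line, -l selects LZO compression and -d 31 is a bitmask of the page types makedumpfile leaves out of the dump. The sketch below decodes such a value; the bit meanings are as described in the makedumpfile man page, and decode_dump_level is a hypothetical helper written only for this illustration.

```shell
# decode_dump_level LEVEL   (hypothetical helper, for illustration only)
# Print the page types that makedumpfile -d LEVEL excludes from the dump.
decode_dump_level() {
    level=$1
    for entry in "1:zero pages" \
                 "2:non-private cache pages" \
                 "4:private cache pages" \
                 "8:user process data" \
                 "16:free pages"; do
        bit=${entry%%:*}    # number before the first colon
        name=${entry#*:}    # description after the first colon
        if [ $(( level & bit )) -ne 0 ]; then
            printf 'excluded: %s\n' "$name"
        fi
    done
}

# 31 = 1 + 2 + 4 + 8 + 16, i.e. all five page types are excluded.
decode_dump_level 31
```

A lower value keeps more of memory in the dump at the cost of a larger vmcore.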

Now restart your system so that the crashkernel reservation takes effect, and then enable / start the kdump service with:

sudo systemctl enable kdump && sudo systemctl start kdump 
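
Before crashing the machine on purpose, it is worth confirming that the reservation and the service are really in place. A minimal sketch of such a check, assuming the standard /proc and /sys kexec files are present; crashkernel_value is a hypothetical helper name:

```shell
# crashkernel_value CMDLINE   (hypothetical helper, for illustration only)
# Print the value of the crashkernel= option in a kernel command line, if any.
crashkernel_value() {
    for arg in $1; do
        case "$arg" in
            crashkernel=*) printf '%s\n' "${arg#crashkernel=}" ;;
        esac
    done
}

# Was crashkernel= passed to the running kernel?
if [ -r /proc/cmdline ]; then
    crashkernel_value "$(cat /proc/cmdline)"
fi

# Bytes reserved for the crash kernel (0 means nothing was reserved).
if [ -r /sys/kernel/kexec_crash_size ]; then
    cat /sys/kernel/kexec_crash_size
fi

# 1 means a crash kernel is loaded and ready to take over.
if [ -r /sys/kernel/kexec_crash_loaded ]; then
    cat /sys/kernel/kexec_crash_loaded
fi

# State of the kdump service itself.
if command -v systemctl >/dev/null 2>&1; then
    systemctl is-active kdump || true
fi
```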

We can then trigger a kernel panic using SysRq via the proc filesystem (run these as root, and note that the second command crashes the machine immediately):

echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

We can then inspect the dump with:

crash /var/crash/127.0.0.1-2017-05-30-14\:02\:00/vmcore /usr/lib/debug/lib/modules/`uname -r`/vmlinux
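
The directory name above is specific to that particular crash. As a rough sketch, the newest dump can be located automatically; latest_vmcore is a hypothetical helper and relies on GNU find. Note that the vmlinux passed to crash must match the kernel that crashed, so using uname -r is only right when you are analyzing on the same kernel version. Once at the crash prompt, commands such as bt, log and ps are a good starting point.

```shell
# latest_vmcore DIR   (hypothetical helper, for illustration only)
# Print the most recently modified file named vmcore below DIR.
latest_vmcore() {
    find "$1" -type f -name vmcore -printf '%T@ %p\n' 2>/dev/null |
        sort -n | tail -n 1 | cut -d' ' -f2-
}

# Print the crash command line for the newest dump, if there is one.
if [ -d /var/crash ]; then
    core=$(latest_vmcore /var/crash)
    if [ -n "$core" ]; then
        echo "crash $core /usr/lib/debug/lib/modules/$(uname -r)/vmlinux"
    fi
fi
```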
