Linux: Initiating a CPU backtrace

Sometimes a process is taking a lot of CPU time and it isn’t clear what the cause is.

For example, at times it is common to see the kworker process consuming a lot of CPU. kworker is a placeholder process for kernel worker threads. These threads perform most of the actual processing for the kernel and you might want to see what device is involved.

You can run a CPU backtrace in Linux that records in dmesg what each one of the CPUs in the system are doing. This can be very useful to determine what specific process is hogging the CPU and in some cases to which module/driver it is related.

This is done using the magic SysRq key. This is a key combination that allows you to communicate directly with the kernel and perform several low level commands regardless of the state of the system. An exception would be a kernel panic, for obvious reasons.

When the magic SysRq key is enabled you can use the key combination Alt+SysRq+command key. There are many options and worth writing a future article just on it. The Wikipedia article explains some of them.

Because the magic SysRq key provides direct access to the kernel and the deep security implications of this it is disabled by default; if not in all, in most distros.

When the magic SysRq key is disabled and a backtrace is requested dmesg will not display any backtrace. So you can temporarily activate the magic SysRq key with:

# sysctl -w kernel.sysrq=1

or

# echo 1 > /proc/sys/kernel/sysrq

Be aware that this won’t persist between reboots. To turn it off you use:

# sysctl -w kernel.sysrq=0

or

# echo 0 > /proc/sys/kernel/sysrq

You can generate the backtrace with the Alt-SysRq-L key combination . If you need to script the backtrace or are accessing the system remotely, there is a CLI alternative to do the same:

# echo l > /proc/sysrq-trigger

And you can then check the results with:

$ dmesg

[ 3966.375451] Call Trace:
[ 3966.375463]  dump_stack+0x63/0x8b
[ 3966.375468]  nmi_cpu_backtrace+0x94/0xa0
[ 3966.375473]  ? lapic_can_unplug_cpu+0xb0/0xb0
[ 3966.375478]  nmi_trigger_cpumask_backtrace+0xe6/0x130
[ 3966.375482]  arch_trigger_cpumask_backtrace+0x19/0x20
[ 3966.375487]  sysrq_handle_showallcpus+0x17/0x20
[ 3966.375491]  __handle_sysrq+0x9f/0x170
[ 3966.375495]  write_sysrq_trigger+0x34/0x40
[ 3966.375500]  proc_reg_write+0x45/0x70
[ 3966.375504]  __vfs_write+0x1b/0x40
[ 3966.375508]  vfs_write+0xb1/0x1a0
[ 3966.375511]  SyS_write+0x55/0xc0
[ 3966.375517]  do_syscall_64+0x73/0x130
[ 3966.375521]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 3966.375525] RIP: 0033:0x7feca7544154
[ 3966.375528] RSP: 002b:00007ffedf5764b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3966.375532] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007feca7544154
[ 3966.375535] RDX: 0000000000000002 RSI: 000055c1c5642320 RDI: 0000000000000001
[ 3966.375537] RBP: 000055c1c5642320 R08: 000000000000000a R09: 0000000000000001
[ 3966.375539] R10: 000000000000000a R11: 0000000000000246 R12: 00007feca7820760
[ 3966.375542] R13: 0000000000000002 R14: 00007feca781c2a0 R15: 00007feca781b760
[ 3966.375547] Sending NMI from CPU 5 to CPUs 0-4,6-15:
[ 3966.375569] NMI backtrace for cpu 13 skipped: idling at acpi_idle_do_entry+0x19/0x40
[ 3966.375574] NMI backtrace for cpu 12 skipped: idling at acpi_idle_do_entry+0x19/0x40
[...]

The above example is from an idle system, but if the system was busier it would display more activity from other CPUs, system calls and what driver/module is involved.

If the same module, driver or hardware device keeps showing up in the backtraces you should check them as possible source of the high CPU utilisation.