July 9, 2013

Cheat sheet for lock debugging in the Linux kernel

From time to time I run into a performance problem that requires identifying the lock in the Linux kernel that causes too much contention. Newer kernels are well equipped to help you find that lock. Whether you can do something about it is another question.
The base documentation for this is in the kernel source under Documentation/locking/lockstat.txt. However, because of the performance impact of all that tracing, it is usually disabled in distribution kernels. For RHEL 6.4 there is a separate debug kernel that you need to install. Ensure that it is the default IPL/boot kernel, or select it in the IPL/boot menu.
SLES 11 is more difficult, as this requires a kernel rebuild with CONFIG_LOCK_STAT enabled. You need to contact SUSE service to get a kernel for your system.
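Before going further, it can be worth checking whether the running kernel actually offers lock statistics. A minimal sketch: kernels built with the option expose /proc/lock_stat, so testing for that file is enough. The function name have_lock_stat and the variable LOCK_STAT_FILE are made up for this example.

```shell
#!/bin/sh
# Sketch: report whether the running kernel exposes lock statistics.
# LOCK_STAT_FILE and have_lock_stat are illustrative names, not kernel tools.
: "${LOCK_STAT_FILE:=/proc/lock_stat}"

have_lock_stat() {
    # The file only exists when lock statistics are built into the kernel.
    [ -e "$LOCK_STAT_FILE" ]
}

if have_lock_stat; then
    echo "lock statistics available on kernel $(uname -r)"
else
    echo "no $LOCK_STAT_FILE on kernel $(uname -r); boot a debug kernel"
fi
```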
Once the system is up with this enabled, do the following:
  • echo 1 >/proc/sys/kernel/lock_stat
  • run your workload 
  • cat /proc/lock_stat > /tmp/lockreport.txt
  • echo 0 >/proc/sys/kernel/lock_stat
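The four steps above can be wrapped in a small helper so the statistics are always switched off again after the workload. This is only a sketch under a few assumptions: it runs as root, the function name collect_lock_stat and the two path variables are invented for this example, and the workload is whatever command you pass in.

```shell
#!/bin/sh
# Sketch: run a workload with lock statistics enabled and save the report.
# collect_lock_stat, LOCK_STAT_CTL and LOCK_STAT_DATA are illustrative names.
: "${LOCK_STAT_CTL:=/proc/sys/kernel/lock_stat}"
: "${LOCK_STAT_DATA:=/proc/lock_stat}"

# Usage: collect_lock_stat REPORT_FILE COMMAND [ARGS...]
collect_lock_stat() {
    report=$1
    shift

    if [ ! -w "$LOCK_STAT_CTL" ]; then
        echo "cannot write $LOCK_STAT_CTL (not root, or no lock statistics?)" >&2
        return 1
    fi

    echo 1 > "$LOCK_STAT_CTL"           # start collecting
    "$@"                                # run the workload
    status=$?
    cat "$LOCK_STAT_DATA" > "$report"   # save the report
    echo 0 > "$LOCK_STAT_CTL"           # stop collecting
    return "$status"                    # pass on the workload's exit status
}
```

For example, `collect_lock_stat /tmp/lockreport.txt ./my_workload.sh`, where my_workload.sh stands for your own test script.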
Usually you only need to look at the top few locks to find out what's going wrong.
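To pull just those top entries out of the saved report, the kernel's lockstat documentation suggests grepping for the lines containing a colon, which are the per-lock-class summary lines. As far as I know the file is already ordered with the most contended classes first, so head is enough. The function name top_locks is made up for this sketch.

```shell
#!/bin/sh
# Sketch: show the first N lock class lines of a saved lock_stat report.
# top_locks is an illustrative name; N defaults to 10.
# Usage: top_locks REPORT_FILE [N]
top_locks() {
    # Lock class summary lines end the class name with ':';
    # header lines and contention point lines do not contain one.
    grep ':' "$1" | head -n "${2:-10}"
}
```

For example, `top_locks /tmp/lockreport.txt 5` prints the five lock classes at the top of the report.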