Friday, January 23, 2015

High CPU usage, server was not accessible over ssh

■ Issue/Symptom : High load on server, not accessible over ssh
OS Environment : RHEL 5.5
■ Background Information  :
  • Infra was running a test
  • The server was intermittently under high load
  • ssh was failing :
    • [usera@user01lxv ~]$ ssh 10.57XXX
    • Password:
    • Connection closed by 10.57.XXX
  • The console showed "lockd: rejected NSM callback from 7f000001:30001" (7f000001 is 127.0.0.1 in hex), and NFS was intermittently failing
Investigation :
  • iowait was very high and fluctuating.
  • The CPUs were spending their time waiting on I/O-bound operations rather than doing user or system work
$ mpstat -P ALL 1

Linux 2.6.18-128.el5 (xxxxxxx) 11/19/2014
10:17:29 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
10:17:30 PM all 0.00 0.00 0.00 75.00 0.00 0.00 0.00 25.00 182.18
10:17:30 PM 0 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00 182.18
10:17:30 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
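
If mpstat is not available, a similar iowait figure can be derived from two samples of the aggregate "cpu" line in /proc/stat. The following is a minimal sketch, not the method used during this incident; the function name iowait_pct is made up for illustration, and it assumes the usual field order (user nice system idle iowait irq softirq steal).

```shell
#!/bin/sh
# Compute the iowait percentage between two "cpu" lines from /proc/stat.
# Assumed field order after the label: user nice system idle iowait irq softirq steal
iowait_pct() {
    # $1 = earlier sample, $2 = later sample (each a full "cpu ..." line)
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal only (awk fields 2..9); later guest fields
            # are already included in user time on newer kernels.
            n = (NF < 9 ? NF : 9)
            total = 0
            for (i = 2; i <= n; i++) total += $i
            t[NR] = total
            w[NR] = $6          # iowait is the 5th counter, i.e. awk field 6
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Usage: sample the aggregate "cpu" line twice, one second apart.
if [ -r /proc/stat ]; then
    a=$(grep '^cpu ' /proc/stat)
    sleep 1
    b=$(grep '^cpu ' /proc/stat)
    echo "iowait: $(iowait_pct "$a" "$b")%"
fi
```

The counters in /proc/stat are cumulative since boot, so only the difference between two samples is meaningful.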
  • top showed a high load average, but no single process was using much CPU
top - 22:18:03 up 50 days, 22:20, 4 users, load average: 25.19, 26.68, 30.74
Tasks: 235 total, 2 running, 231 sleeping, 0 stopped, 2 zombie
Cpu(s): 2.0%us, 0.8%sy, 0.0%ni, 0.0%id, 96.8%wa, 0.0%hi, 0.4%si, 0.0%st
Mem: 3866480k total, 2916884k used, 949596k free, 12440k buffers
Swap: 8385920k total, 498424k used, 7887496k free, 350200k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20841 cw 21 0 5606m 2.1g 5040 S 4.0 56.8 291:05.56 /opt/cw/jre/bin/java -Duser.timezone=America/Mexico_City -Xms2560m -Xmx2560m -XX:MaxPermSize=128m
  • Found a large number of processes in the "D" (uninterruptible sleep) state, which did not appear on nso-102, 101
$ ps aux |awk '{print $1 " " $8 " " $NF }'|grep D

USER STAT COMMAND
root D< [kjournald]
root Ds 0
root Ds /var/run/vmware-guestd.pid
nobody DN /usr/bin/log2mysql-nso-tomcat-writer
nobody DN /usr/bin/log2mysql-nso-tomcat-spooler
root D
  • In the above output, the kernel thread kjournald is also in the D state, which is a bad sign from the kernel's perspective: journaling had probably stalled.
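
The "grep D" filter above matches a "D" anywhere in the line (which is why the header row appears: "COMMAND" contains a D). A stricter variant, sketched below, matches only on the STAT column and also prints the wchan column, which shows the kernel function each task is sleeping in. This is an illustrative sketch rather than the command run during the incident; the helper name filter_dstate is hypothetical.

```shell
#!/bin/sh
# Keep the header line plus rows whose STAT column (field 2) starts with "D".
# Expects input shaped like: PID STAT WCHAN COMMAND
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage: list D-state tasks together with the kernel function they wait in.
ps -eo pid,stat,wchan:32,comm | filter_dstate
```

Seeing the same wchan value on many stuck tasks can hint at which subsystem (for example the journal or NFS) the I/O is blocked in.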
■ Workaround Solution :
Shut down the VM and power it on again. [Processes in the D state cannot be killed; clearing them required a reboot.]

Permanent Solution :
Shut down the VM and power it on again. [Processes in the D state cannot be killed; clearing them required a reboot.]
Root Cause Analysis :
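  • The high iowait and load average went together with a large number of processes stuck in the D state, i.e. blocked on I/O that was not completing. On Linux, tasks in uninterruptible sleep count toward the load average, which is why the load was high even though no process was using much CPU.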
