是五月呀!

cpu过高

一个应用占用CPU很高,除了确实是计算密集型应用之外,通常原因都是出现了死循环。

总体流程:
首先显示java进程的线程列表,查找运行时间长的线程:

1
ps -mp pid -o THREAD,tid,time

或者使用top命令的线程模式:

1
top -p pid

然后按’H’,进入线程模式,第一列PID为tid(线程id)

其次将需要的线程ID转换为16进制格式:

1
printf "%x\n" tid

最后打印线程的堆栈信息:

1
jstack pid |grep tid -A 30

举个例子说明下:

1. 查找进程号

1
2
$ jps -v | grep -v Jps
43109 Bootstrap ...

使用jps命令显示当前的java进程,加上-v参数可以显示进程的详情信息,更好的判断是否是关心的进程。

2. 查找耗时的线程号

一个进程下包含很多线程,我们关心的是那些线程运行了很长时间还没有结束,导致资源一直被占用。
第一种方法是使用ps命令:

1
2
3
4
5
6
7
8
9
10
11
$ ps -mp 43109 -o THREAD,tid,time | sort -r -k9 | head -10
USER %CPU PRI SCNT WCHAN USER SYSTEM TID TIME
appops 14.7 - - - - - - 03:05:55
appops 8.0 19 - - - - 43422 01:41:26 ------------这个线程执行了01:41:26
appops 0.2 19 - - - - 43411 00:03:44
appops 0.2 19 - - - - 43415 00:03:30
appops 0.2 19 - - - - 43414 00:03:22
appops 0.2 19 - - - - 43424 00:03:20
appops 0.2 19 - - - - 43408 00:03:10
appops 0.2 19 - - - - 43418 00:03:07
appops 0.2 19 - - - - 43407 00:03:06

其中ps命令查询进程的执行信息,-o命令指定输出信息列,包含线程tid和线程执行时间。然后使用sort命令按时间倒叙排序,取前10行。

第二种方法是使用top命令:
首先输入

1
2
3
4
5
6
7
8
9
$ top -p 43109
top - 11:44:11 up 519 days, 23:37, 1 user, load average: 0.27, 0.28, 0.31
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.4 us, 0.9 sy, 0.0 ni, 97.6 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 12330736 total, 11692416 used, 638320 free, 79176 buffers
KiB Swap: 0 total, 0 used, 0 free, 6568140 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
43109 appops 20 0 5225m 3.7g 11m S 16.0 31.4 187:28.17 java

进入单个进程信息后,输入H,进入线程模式:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
top - 11:47:16 up 519 days, 23:40, 1 user, load average: 0.08, 0.20, 0.28
Threads: 124 total, 0 running, 124 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.7 us, 1.0 sy, 0.0 ni, 97.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 12330736 total, 11694844 used, 635892 free, 79176 buffers
KiB Swap: 0 total, 0 used, 0 free, 6568872 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
43422 appops 20 0 5225m 3.7g 11m S 12.3 31.4 103:00.04 java
43418 appops 20 0 5225m 3.7g 11m S 0.3 31.4 3:07.64 java
43424 appops 20 0 5225m 3.7g 11m S 0.3 31.4 3:20.64 java
43442 appops 20 0 5225m 3.7g 11m S 0.3 31.4 0:48.77 java
43109 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:00.00 java
43110 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:01.38 java
43111 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:01.73 java
43112 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:01.85 java
43113 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:01.82 java
43114 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:01.83 java
43115 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:01.75 java
43116 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:01.75 java
43117 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:01.77 java
43118 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:01.77 java
43119 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:14.83 java
43120 appops 20 0 5225m 3.7g 11m S 0.0 31.4 0:00.53 java

接着输入T,表示按时间倒叙排序:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
top - 11:47:54 up 519 days, 23:41, 1 user, load average: 0.17, 0.20, 0.28
Threads: 124 total, 0 running, 124 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.6 us, 0.7 sy, 0.0 ni, 97.6 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 12330736 total, 11696052 used, 634684 free, 79176 buffers
KiB Swap: 0 total, 0 used, 0 free, 6569020 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
43422 appops 20 0 5225m 3.7g 11m S 13.0 31.4 103:04.72 java
43411 appops 20 0 5225m 3.7g 11m S 0.0 31.4 3:45.74 java
43415 appops 20 0 5225m 3.7g 11m S 0.0 31.4 3:31.93 java
43414 appops 20 0 5225m 3.7g 11m S 0.0 31.4 3:26.41 java
43424 appops 20 0 5225m 3.7g 11m S 0.0 31.4 3:20.64 java
43408 appops 20 0 5225m 3.7g 11m S 0.0 31.4 3:12.97 java
43418 appops 20 0 5225m 3.7g 11m S 0.0 31.4 3:07.73 java
43407 appops 20 0 5225m 3.7g 11m S 0.0 31.4 3:06.85 java
43412 appops 20 0 5225m 3.7g 11m S 0.0 31.4 3:01.83 java
43419 appops 20 0 5225m 3.7g 11m S 0.0 31.4 3:01.40 java
43438 appops 20 0 5225m 3.7g 11m S 0.0 31.4 3:00.63 java
43437 appops 20 0 5225m 3.7g 11m S 0.0 31.4 2:59.67 java
43439 appops 20 0 5225m 3.7g 11m S 0.0 31.4 2:59.61 java
43413 appops 20 0 5225m 3.7g 11m S 0.0 31.4 2:55.66 java
43421 appops 20 0 5225m 3.7g 11m S 0.0 31.4 2:51.55 java

注:
M %MEM
N PID
P %CPU
T TIME+

同样能找到耗时较多的线程号:43422。

3. 将线程ID转换为16进制格式

由于jstack命令输出的线程id是使用16进制格式,所以我们要将tid转化为16进制进行搜索。

1
2
$ printf "%x\n" 43422
a99e

4. 在进程堆栈日志中查找

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$ jstack 43109 |grep a99e -A 30
"org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-17" prio=10 tid=0x00007fe889c95800 nid=0xa99e runnable [0x00007fe8878f6000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.net.SocketInputStream.read(SocketInputStream.java:108)
at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:195)
at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40)
at redis.clients.jedis.Protocol.process(Protocol.java:128)
at redis.clients.jedis.Protocol.read(Protocol.java:192)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:282)
at redis.clients.jedis.Connection.getBinaryBulkReply(Connection.java:201)
at redis.clients.jedis.BinaryJedis.get(BinaryJedis.java:127)
...

这样能看到具体线程执行情况,以及长时间执行的原因。