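Problem description
We run several Tomcat instances in production, and at some point all of them suddenly stopped responding to requests. Our web server is nginx, which reverse-proxies requests to Tomcat, so my first suspicion was that nginx was not receiving the requests at all. The nginx logs, however, showed a large number of 499 responses (the client closed the connection before nginx could reply), so requests were reaching nginx and the problem was on the Tomcat side.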
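Troubleshooting
My first thought was that the CPU might be maxed out. No CPU alert had fired, but I instinctively ran top to check the system load anyway: the load average was only 0.x, and CPU and memory usage were both normal.
Since the CPU looked normal, GC was unlikely to be the cause, but I checked the GC log anyway and, as expected, GC was fine too.
That left jstack. The thread dump showed a large number of threads in the WAITING state: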
"http-nio-127.0.0.1-801-exec-498" daemon prio=10 tid=0x00002ada7c14f800 nid=0x16a6 waiting on condition [0x00002ada9c905000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000007873e6990> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:133)
    at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:282)
    at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:64)
    at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:177)
    at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:170)
    at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:102)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:240)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:227)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:173)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:85)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
    at com.weimai.utils.HttpClientUtil.doGet(HttpClientUtil.java:105)
    at com.weimai.utils.HttpClientUtil.doGet(HttpClientUtil.java:87)
    at com.weimai.utils.WeiBoUtil.checkUser(WeiBoUtil.java:214)
    at com.weimai.web.UserInfoController.newWeiboLogin(UserInfoController.java:1223)
    at sun.reflect.GeneratedMethodAccessor390.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
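When a dump contains hundreds of threads, a per-state tally makes a pile-up like this one obvious at a glance. The following is only a rough sketch of that idea using the JDK's own thread introspection; the class name ThreadStateSummary is hypothetical and is not part of the application discussed here. It also has to run inside the JVM being inspected, so for an already hung Tomcat an external jstack dump remains the practical tool.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original application.
class ThreadStateSummary {

    // Counts the JVM's live threads per Thread.State, much like tallying the
    // "java.lang.Thread.State:" lines of a jstack dump.
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            counts.merge(thread.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per state that has at least one thread.
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

A WAITING count close to the size of the Tomcat worker pool would point to the kind of blocking seen in the dump above.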
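At this point I realized the problem was probably in the HTTP connections, so I ran netstat to check the connection states on port 801 and found a large number of connections in CLOSE_WAIT. A quick note on CLOSE_WAIT: if a socket in our client program is in CLOSE_WAIT, the socket was closed passively, that is, the peer closed first. The whole sequence looks like this.
When the server actively closes the connection, tearing down the TCP connection takes four packets between the two sides: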
server -> FIN -> client
server <- ACK <- client
At this point the server is in FIN_WAIT_2 and our program is in CLOSE_WAIT.
server <- FIN <- client
The client now sends its FIN to the server and moves to LAST_ACK.
server -> ACK -> client
Only once the server has replied with the ACK does the client socket actually move to CLOSED.
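The passive-close sequence can be reproduced on the loopback interface with plain JDK sockets. This is a minimal sketch, not code from the application: the class and method names are made up for illustration. The server side closes first, the client sees end of stream, and the client socket then sits in CLOSE_WAIT until the application closes it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; names are illustrative only.
class CloseWaitDemo {

    // Returns what the client reads after the server has closed its side.
    static int readAfterServerClose() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort());
             Socket serverSide = listener.accept()) {
            serverSide.close();                       // server -> FIN -> client
            // The client's kernel ACKs the FIN by itself; read() blocks until
            // the FIN arrives and then reports end of stream as -1.
            int result = client.getInputStream().read();
            // From here on the client socket is in CLOSE_WAIT. It only leaves
            // that state when the application closes it, which the
            // try-with-resources block does on exit (server <- FIN <- client).
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read() after server close returned " + readAfterServerClose());
    }
}
```

If the client never closed the socket, the connection would remain in CLOSE_WAIT for as long as the process kept the socket open, which matches the netstat observation.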
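Our connections were in CLOSE_WAIT rather than LAST_ACK, which means our side had not yet sent its FIN to the server. From there it was simple: look at how HttpClientUtil handles its connections. Sure enough, the code never called abort on HTTP connections that ended abnormally. After filling in the try/catch/finally blocks properly, the problem was resolved.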
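Why a missing finally block can freeze every worker thread is easier to see in a small model. The sketch below is deliberately not HttpClient code: BoundedPool is a hypothetical stand-in built on a JDK Semaphore, and the 100 ms wait is an arbitrary value chosen for the example. In HttpClient 4.x the equivalent of the release step would be closing the response or aborting the request in finally, and the bounded wait roughly corresponds to configuring a connection request timeout; the WAITING (rather than TIMED_WAITING) state in the dump suggests, though it does not prove, that no such timeout was in effect.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for a bounded connection pool; NOT HttpClient's API.
class BoundedPool {
    private final Semaphore permits;

    BoundedPool(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    int available() {
        return permits.availablePermits();
    }

    // Leaky variant: the slot is returned only on the success path.
    <T> T callLeaky(Supplier<T> request) {
        lease();
        T result = request.get();   // an exception here skips the release below
        permits.release();
        return result;
    }

    // Safe variant: finally returns the slot on every path.
    <T> T callSafe(Supplier<T> request) {
        lease();
        try {
            return request.get();
        } finally {
            permits.release();
        }
    }

    // Waits a bounded time for a free slot instead of parking forever.
    private void lease() {
        try {
            if (!permits.tryAcquire(100, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("timed out waiting for a pooled connection");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled connection", e);
        }
    }
}
```

With the leaky variant, each failed request permanently removes one slot, so a pool of size N is empty after N failures and every later caller waits. With the safe variant the pool size is unaffected by failures.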