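Notes on a Linux Performance Investigation
Our servers are six Tencent Cloud machines. One day I happened to log in to one of them and ran vmstat to check CPU and memory usage: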
[root@VM_26_210_centos ~]# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 41572 229160 399080 3666708 0 0 0 10 0 0 1 0 99 0 0
[root@VM_26_210_centos ~]# vmstat 2 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 41572 230096 399080 3666820 0 0 0 10 0 0 1 0 99 0 0
[root@VM_26_210_centos ~]# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 41572 229880 399080 3666840 0 0 0 10 0 0 1 0 99 0 0
0 0 41572 229748 399080 3666840 0 0 0 28 791 1221 1 0 99 0 0
0 0 41572 229616 399080 3666840 0 0 0 0 895 1305 1 1 98 0 0
0 0 41572 229368 399080 3666840 0 0 0 4542 801 1294 0 0 98 1 0
0 0 41572 229376 399080 3666848 0 0 0 20 811 1251 1 1 99 0 0
0 0 41572 229384 399080 3666848 0 0 0 0 745 1206 0 1 99 0 0
0 0 41572 229376 399080 3666848 0 0 0 110 831 1298 1 0 99 0 0
0 0 41572 229616 399080 3666852 0 0 0 0 1741 2634 2 1 97 0 0
0 0 41572 229624 399080 3666852 0 0 0 4 769 1255 1 0 99 0 0
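This gave me a scare: the server has 4 cores and 8 GB of RAM, yet vmstat showed only a little over 200 MB free. At first glance that suggested the machine was running out of memory.
Meanwhile, the Tencent Cloud monitoring console showed this:
[Image: 腾讯云.jpg]
Tencent Cloud monitoring reported that only about 50% of memory was in use, which puzzled me. Something had to be off somewhere, so I ran top to check the current state of the machine: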
[root@VM_26_210_centos ~]# top
top - 13:32:02 up 659 days, 3:06, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 136 total, 1 running, 135 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.9%us, 1.1%sy, 0.0%ni, 97.8%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8059448k total, 7826428k used, 233020k free, 399080k buffers
Swap: 2097144k total, 41572k used, 2055572k free, 3668692k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22539 root 20 0 8318m 1.8g 14m S 1.7 23.9 452:18.77 java
20117 root 20 0 7168 6332 660 S 0.7 0.1 179:56.95 sap1002
873 root 20 0 246m 5476 812 S 0.3 0.1 6:46.02 rsyslogd
10618 root 20 0 37868 17m 984 S 0.3 0.2 153:26.54 secu-tcs-agent
14448 root 20 0 5590m 477m 12m S 0.3 6.1 107:43.34 java
16980 root 20 0 39016 22m 5576 S 0.3 0.3 293:10.41 sap1009
17857 root 20 0 4384m 452m 11m S 0.3 5.8 538:17.84 java
22349 root 20 0 5569m 467m 12m S 0.3 5.9 84:30.18 java
27931 root 20 0 427m 13m 2084 S 0.3 0.2 325:52.53 barad_agent
29121 root 20 0 33468 15m 1052 S 0.3 0.2 83:37.27 sap1005
1 root 20 0 19356 932 716 S 0.0 0.0 2:21.78 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 3:29.40 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 4:45.83 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 1:13.64 watchdog/0
7 root RT 0 0 0 0 S 0.0 0.0 3:21.08 migration/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
9 root 20 0 0 0 0 S 0.0 0.0 4:10.62 ksoftirqd/1
10 root RT 0 0 0 0 S 0.0 0.0 0:58.95 watchdog/1
11 root RT 0 0 0 0 S 0.0 0.0 3:07.85 migration/2
12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
13 root 20 0 0 0 0 S 0.0 0.0 4:19.19 ksoftirqd/2
14 root RT 0 0 0 0 S 0.0 0.0 1:00.61 watchdog/2
15 root RT 0 0 0 0 S 0.0 0.0 3:06.14 migration/3
16 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/3
17 root 20 0 0 0 0 S 0.0 0.0 5:30.66 ksoftirqd/3
18 root RT 0 0 0 0 S 0.0 0.0 1:00.14 watchdog/3
19 root 20 0 0 0 0 S 0.0 0.0 26:36.90 events/0
20 root 20 0 0 0 0 S 0.0 0.0 26:37.71 events/1
21 root 20 0 0 0 0 S 0.0 0.0 33:52.49 events/2
22 root 20 0 0 0 0 S 0.0 0.0 37:57.76 events/3
23 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cgroup
24 root 20 0 0 0 0 S 0.0 0.0 0:11.72 khelper
The Mem line still shows only a little over 200 MB of free memory. Next I looked at memory on its own:
[root@VM_26_210_centos ~]# free -m
total used free shared buffers cached
Mem: 7870 7625 245 0 389 3575
-/+ buffers/cache: 3660 4210
Swap: 2047 40 2007
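That settled it for me: the problem had to be in Tencent Cloud's monitoring.
So I called Tencent Cloud. After an afternoon of back and forth, they replied that their memory usage figure does not count buffers and cache.
So what exactly are buffer and cache in the vmstat output?
Here I quote directly from this blog post: http://www.cnblogs.com/chenshoubiao/p/4796664.html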
A buffer is something that has yet to be "written" to disk.
A cache is something that has been "read" from the disk and stored for later use.
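Roughly speaking, a buffer holds data that is about to be written out to disk (a block device), while the cache holds data that has been read in from disk. Both exist to improve I/O performance and are managed by the OS.
So in the vmstat output, the buffer for pending writes is about 390 MB, and the data read in from disk is about 3.5 GB.
That leaves a little under 4 GB of memory actually in use (free -m reports 3660 MB on its "-/+ buffers/cache" line). Summing the RES column in top gives about 3.5 GB.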
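The arithmetic behind those numbers can be sketched in a few lines of Python. This is only an illustration using the values from the free -m output above; the function name memory_breakdown and its result keys are made up for this example and are not part of any tool.

```python
def memory_breakdown(total_mb, used_mb, free_mb, buffers_mb, cached_mb):
    """Split the 'used' figure into application memory and reclaimable cache.

    All inputs are in MB, as printed by `free -m` on this (older) procps
    layout with separate 'buffers' and 'cached' columns.
    """
    reclaimable = buffers_mb + cached_mb      # the kernel can drop this on demand
    app_used = used_mb - reclaimable          # what processes really hold
    available = free_mb + reclaimable         # what processes could still get
    return {
        "app_used_mb": app_used,
        "available_mb": available,
        "app_used_pct": round(100.0 * app_used / total_mb, 1),
    }


# Values copied from the `free -m` output above.
stats = memory_breakdown(total_mb=7870, used_mb=7625, free_mb=245,
                         buffers_mb=389, cached_mb=3575)
print(stats)
```

The results differ by 1 MB from the "-/+ buffers/cache" line because free rounds each column to whole megabytes independently. The percentage comes out a little under 50%, which is consistent with a monitoring figure that leaves buffers and cache out.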
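At this point I was worried about two things:
- 1. Why is the cache so large, and what effect does it have on performance?
- 2. With only a little over 200 MB free, is JVM performance affected?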
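Judging from vmstat, si (the amount of memory swapped in from disk per second; a value that stays above 0 suggests that physical memory is insufficient or that something is leaking memory, and the memory-hungry process should be tracked down) and so (the amount of memory swapped out to disk per second; the same reasoning applies) are both 0, which is normal. In other words, no swapping is taking place. This matters because the JVM has to scan the heap during garbage collection, and if heap pages have been swapped out, garbage collection performance drops sharply.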
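The si/so check can also be automated. The following is a minimal sketch that parses vmstat text of the shape shown earlier; the function names swap_activity and is_swapping are hypothetical, and the parser assumes the column header line contains both si and so.

```python
def swap_activity(vmstat_output):
    """Return a list of (si, so) pairs, one per vmstat sample line."""
    rows = [line.split() for line in vmstat_output.strip().splitlines()]
    # The column header is the row that names both swap columns.
    header = next(cols for cols in rows if "si" in cols and "so" in cols)
    si_idx, so_idx = header.index("si"), header.index("so")
    samples = []
    for cols in rows:
        # Sample rows have one numeric field per header column.
        if len(cols) == len(header) and all(c.isdigit() for c in cols):
            samples.append((int(cols[si_idx]), int(cols[so_idx])))
    return samples


def is_swapping(vmstat_output):
    """True if any sample shows pages being swapped in or out."""
    return any(si > 0 or so > 0 for si, so in swap_activity(vmstat_output))


# Two sample rows copied from the `vmstat 2` output above.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 41572 229880 399080 3666840 0 0 0 10 0 0 1 0 99 0 0
0 0 41572 229748 399080 3666840 0 0 0 28 791 1221 1 0 99 0 0
"""
print(is_swapping(SAMPLE))
```

Note that the first row vmstat prints reports averages since boot, so in practice it is the later rows that reflect current behaviour.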
Printing GC statistics every two seconds with jstat shows that garbage collection is behaving normally as well:
[root@VM_26_210_centos ~]# jstat -gccause 22539 2000
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT LGCC GCC
87.12 0.00 91.04 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 91.39 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 92.45 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 92.57 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 92.58 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 92.65 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 92.83 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 92.84 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 92.84 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 92.86 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 92.92 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 93.07 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 93.08 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 93.09 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 93.58 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 93.65 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 94.36 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
87.12 0.00 94.37 57.69 95.37 91.61 16912 1148.004 5 1.371 1149.375 Allocation Failure No GC
0.00 84.94 40.97 57.69 95.37 91.61 16913 1148.072 5 1.371 1149.443 Allocation Failure No GC
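One way to read these columns is to turn the cumulative counters into average pause times. The sketch below uses the YGC, YGCT, FGC and FGCT values from the output above; the helper names gc_summary and pause_between are invented for this illustration.

```python
def gc_summary(ygc, ygct, fgc, fgct):
    """Average pause per collection, in milliseconds, from jstat counters.

    ygc/fgc are collection counts; ygct/fgct are cumulative seconds.
    """
    return {
        "avg_young_ms": round(1000.0 * ygct / ygc, 1) if ygc else 0.0,
        "avg_full_ms": round(1000.0 * fgct / fgc, 1) if fgc else 0.0,
    }


def pause_between(count_before, time_before, count_after, time_after):
    """Collections and total pause (ms) that happened between two samples."""
    return count_after - count_before, round(1000.0 * (time_after - time_before), 1)


# Counters from the jstat rows above (before and after the last young GC).
summary = gc_summary(ygc=16912, ygct=1148.004, fgc=5, fgct=1.371)
delta = pause_between(16912, 1148.004, 16913, 1148.072)
print(summary, delta)
```

With these inputs the young collection that occurred during the sampling window took about as long as the long-run average, which supports the reading that GC is not being slowed down by swapping.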
http://www.cnblogs.com/kevingrace/p/5991604.html
http://blog.sina.com.cn/s/blog_9c6f23fb0102x1fg.html
From the operating system's point of view, which factors affect JVM performance?
1. Page swapping
2. Context switching
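Swapping is covered by the si and so columns discussed earlier. For context switching, besides the system-wide cs column in vmstat, Linux exposes per-task counters in /proc/<pid>/status. The sketch below parses those two counters; the function names and the sample values are hypothetical, and for a multi-threaded JVM each thread keeps its own counters under /proc/<pid>/task/<tid>/status, so this reads only one task.

```python
def parse_ctxt_switches(status_text):
    """Extract the context switch counters from /proc/<pid>/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counters = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counters[key] = int(value.strip())
    return counters


def read_ctxt_switches(pid):
    """Read the counters for a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_ctxt_switches(f.read())


# Hypothetical excerpt of a status file, for illustration only.
EXAMPLE_STATUS = """\
Name:\tjava
Threads:\t85
voluntary_ctxt_switches:\t1520
nonvoluntary_ctxt_switches:\t37
"""
print(parse_ctxt_switches(EXAMPLE_STATUS))
```

Sampling the counters twice and subtracting gives a rate. A steadily growing nonvoluntary count would suggest threads are being preempted because they compete for CPU.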