Introduction to iostat

2020-05-20 ArthurIsUsed

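Linux performance analysis covers four areas:
◇：CPU
◇：memory
◇：disk
◇：network
Tools for CPU troubleshooting: top, vmstat, mpstat, pidstat, perf, sar
◇：CPU load average: if the 1-, 5-, and 15-minute values are all high, the CPU has been under sustained pressure for the past 15 minutes or so
◇：If the 1-minute value is high but the 5- and 15-minute values are low, the pressure has only just appeared; keep watching the load
Tools for disk troubleshooting: iostat
Tools for memory troubleshooting: free, top, sar, vmstat, cachestat, cachetop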
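
The two load-average rules above can be sketched in Python. This is only an illustration: the function name and the `high` threshold (load per CPU above 1.0) are assumptions of this sketch, not something the tools define.

```python
def describe_load(load1, load5, load15, cpus, high=1.0):
    """Classify the three load averages relative to the CPU count.

    `high` is a hypothetical threshold: load per CPU above it counts as high.
    """
    h1, h5, h15 = (value / cpus > high for value in (load1, load5, load15))
    if h1 and h5 and h15:
        return "sustained"  # high in all three windows
    if h1 and not h5 and not h15:
        return "recent"     # pressure has only just appeared
    return "other"
```

On a live Linux system the inputs could come from `os.getloadavg()` and `os.cpu_count()`.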

Common iostat syntax

iostat [-x] [-d] [-m] [ interval [ count ]]
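◇：-m: report throughput in megabytes (MB)
◇：-d: display the device utilization report; to limit the output to one disk, pass the device name as an argument
◇：tps: Indicate the number of transfers per second that were issued to the device. A transfer is an I/O request to the device. Multiple logical requests can be combined into a single I/O request to the device. A transfer is of indeterminate size.
◇：MB_read/s: amount of data read from the device per second, in MB
◇：MB_wrtn/s: amount of data written to the device per second, in MB
◇：MB_read: total amount of data read, in MB (since boot in the first report, during the interval in later reports)
◇：MB_wrtn: total amount of data written, in MB (since boot in the first report, during the interval in later reports)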

[root@KMVS-CENTOS ~]# iostat -d /dev/mapper/vg_kmvscentos-lv_home -m 2 2
Linux 2.6.32-358.el6.x86_64 (KMVS-CENTOS) 2018年12月14日 _x86_64_ (2 CPU)
Device:  tps   MB_read/s   MB_wrtn/s   MB_read   MB_wrtn
dm-2    0.00        0.00        0.00         9         0

Device:  tps   MB_read/s   MB_wrtn/s   MB_read   MB_wrtn
dm-2    0.00        0.00        0.00         0         0
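
As a rough illustration of where these columns come from, the sketch below derives tps and MB/s from two snapshots of cumulative per-device counters, in the spirit of /proc/diskstats. The dictionary keys and the function name are assumptions made for this example; iostat's own implementation is not shown here.

```python
SECTOR_BYTES = 512       # disk statistics are counted in 512-byte sectors
MB = 1024 * 1024

def throughput(prev, curr, interval_s):
    """Turn two snapshots of cumulative counters into per-second rates.

    prev/curr are dicts with the (illustrative) keys
    rd_ios, wr_ios, rd_sectors and wr_sectors.
    """
    delta = {key: curr[key] - prev[key] for key in prev}
    return {
        "tps": (delta["rd_ios"] + delta["wr_ios"]) / interval_s,
        "MB_read/s": delta["rd_sectors"] * SECTOR_BYTES / MB / interval_s,
        "MB_wrtn/s": delta["wr_sectors"] * SECTOR_BYTES / MB / interval_s,
    }
```

The first iostat report uses the counters since boot, which is why its totals differ from the later interval reports.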

iostat -x
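◇：avgqu-sz: average length of the request queue; the shorter the better
◇：await: average time per I/O request, in ms, covering both time in the queue and service time (the I/O response time). As a rule of thumb it should stay below 5 ms; above 10 ms calls for tuning. await is never smaller than svctm: a small difference means little time spent queuing, a large difference means long queues and a problem on the system
◇：svctm: average I/O service time. If await is much larger than svctm, the queue is too long and programs running on the system slow down while they wait. Note that the sysstat man page warns that this field is not trustworthy, and later sysstat releases removed it
◇：%util: for a single disk, 80% means the device spent 80% of the time handling I/O and 20% idle, which indicates a busy device
◇：If %util is well above 80% but await is below 5 ms, the disk is healthy but busy; the device may also be backed by several disks that serve requests in parallel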
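
The extended columns can be approximated from counter deltas as well. The sketch below is an assumption-laden illustration: the key names mirror the /proc/diskstats columns but are chosen for this example, and `verdict` simply encodes the 5 ms / 10 ms / 80% rules of thumb from the list above.

```python
def extended_stats(prev, curr, interval_ms):
    """Approximate await, avgqu-sz and %util from two counter snapshots.

    Illustrative keys: rd_ios, wr_ios (completed requests), rd_ticks,
    wr_ticks (ms spent on them), io_ticks (ms the device was busy),
    time_in_queue (weighted ms of queued I/O).
    """
    delta = {key: curr[key] - prev[key] for key in prev}
    ios = delta["rd_ios"] + delta["wr_ios"]
    return {
        "await": (delta["rd_ticks"] + delta["wr_ticks"]) / ios if ios else 0.0,
        "avgqu-sz": delta["time_in_queue"] / interval_ms,
        "%util": 100.0 * delta["io_ticks"] / interval_ms,
    }

def verdict(await_ms, util_pct):
    """Apply the rules of thumb from the text."""
    if await_ms > 10:
        return "slow: needs tuning"
    if util_pct > 80 and await_ms < 5:
        return "busy but healthy"
    return "ok"
```
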
iostat -c 2 2: show CPU utilization
◇：%user: Show the percentage of CPU utilization that occurred while executing at the user level (application).
◇：%nice: Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.
◇：%system: Show the percentage of CPU utilization that occurred while executing at the system level (kernel).
◇：%iowait: Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
◇：%steal: Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
◇：%idle: Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

arthur@learnning:~$ iostat -c 2 2
Linux 4.15.0-42-generic (learnning) 12/14/2018 _x86_64_ (1 CPU)
avg-cpu:   %user   %nice   %system   %iowait   %steal   %idle
            0.25    0.01       0.12     0.16     0.00   99.45


avg-cpu:   %user   %nice   %system   %iowait   %steal   %idle
            1.00    0.00      0.00      0.00     0.00   99.00
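
For intuition, the percentages above can be reproduced from two samples of the aggregate `cpu` line in /proc/stat. This is a minimal sketch under assumptions: the helper names are invented here, and folding irq and softirq time into %system reflects my understanding of how iostat reports it.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Parse the aggregate 'cpu ...' line of /proc/stat into named counters."""
    values = line.split()[1:1 + len(FIELDS)]
    return dict(zip(FIELDS, map(int, values)))

def cpu_percent(prev, curr):
    """Percentage of elapsed CPU time per category between two samples."""
    delta = {key: curr[key] - prev[key] for key in FIELDS}
    total = sum(delta.values()) or 1  # avoid dividing by zero on identical samples
    pct = lambda ticks: 100.0 * ticks / total
    return {
        "%user": pct(delta["user"]),
        "%nice": pct(delta["nice"]),
        "%system": pct(delta["system"] + delta["irq"] + delta["softirq"]),
        "%iowait": pct(delta["iowait"]),
        "%steal": pct(delta["steal"]),
        "%idle": pct(delta["idle"]),
    }
```
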

Case study

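vmstat showed that CPU and memory were both healthy: si/so were zero, 6 GB was still free, and the CPU was 99% idle.
iostat -x -m 5 10 was then run, at a moment when the service was still reachable:
◇：avgrq-sz: the average request size in sectors (one sector is 512 bytes); a value of 260 is on the large side.
◇：avgqu-sz: the average queue length was short, which is normal.
◇：svctm: the average service time was very low.
◇：await: this column was fairly high. For reference, a 10,000 RPM mechanical disk needs roughly 8.38 ms per request, covering seek time, rotational latency, and transfer time.
◇：w_await: write wait peaked at 20 ms. Values below 5 ms in the *_await columns are normal; anything above 10 ms indicates a problem.
◇：Taken together, the figures show write requests piling up: data the CPU had finished processing was waiting to be written to disk, but there was too much to write, so w_await reached 20 ms.
◇：Looking at the services on the machine: 8 CPU cores and 16 GB of RAM are enough, but five Tomcat instances plus dubbo, maven, nexus, MySQL, and redis all run on it, and there is only one disk, so I/O is the bottleneck.
◇：Several programs return a result to the client only after their write reaches the disk. With w_await high, users see pages stall or load slowly, or get no data at all and the message: "Network request timed out, please try again later."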
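
To see why a single mechanical disk becomes the bottleneck in this case, here is a back-of-the-envelope sketch. It derives the average rotational latency from the spindle speed and turns a per-request time into an approximate ceiling on random requests per second. The function names are made up for illustration, and treating the 8.38 ms figure from the text as the full per-request cost is an assumption.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm

def max_random_iops(per_request_ms):
    """Rough upper bound on random requests per second for one spindle."""
    return 1000.0 / per_request_ms

print(rotational_latency_ms(10_000))  # half a turn at 10,000 RPM
print(round(max_random_iops(8.38)))   # using the 8.38 ms figure from the text
```

At 10,000 RPM half a revolution takes 3 ms, and 8.38 ms per request leaves room for roughly 119 random requests per second, which a dozen write-heavy services can easily exceed.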

iostat tips

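The larger an I/O request, the longer it takes to complete. For a block device, the time splits into two parts:
◇：seeking
◇：the read or write operation itself
avgrq-sz cannot grow arbitrarily; it is capped by a kernel parameter (max_sectors_kb in the device's sysfs queue directory). The value reflects the workload's I/O pattern: when we ask whether the incoming I/O is large or small, avgrq-sz is the figure that answers it. It is the average size of all requests issued during the interval, in sectors.
The columns are related by avgrq-sz × (r/s + w/s) = rsec/s + wsec/s; for a read-only (or write-only) workload this reduces to avgrq-sz × r/s = rsec/s (or avgrq-sz × w/s = wsec/s). In other words, throughput is the request size multiplied by the request rate.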

[root@izwz9d6vcg0qeb2kppr2udz ~]# cat /sys/block/vda/queue/max_sectors_kb
512
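
The unit conversions behind these figures are easy to get wrong, so here is a small worked sketch. The helper names are assumptions of this example; the numbers (260 sectors from the case study, 512 KB from max_sectors_kb above) come from the text.

```python
SECTOR_BYTES = 512

def sectors_to_kb(sectors):
    """Convert a size in 512-byte sectors to KB."""
    return sectors * SECTOR_BYTES / 1024

def kb_to_sectors(kb):
    """Convert a size in KB to 512-byte sectors."""
    return kb * 1024 / SECTOR_BYTES

def sectors_per_second(avgrq_sz, r_per_s, w_per_s):
    """avgrq-sz × (r/s + w/s) = rsec/s + wsec/s."""
    return avgrq_sz * (r_per_s + w_per_s)

print(sectors_to_kb(260))   # average request size from the case study, in KB
print(kb_to_sectors(512))   # the max_sectors_kb cap, in sectors
```

An avgrq-sz of 260 sectors is therefore 130 KB per request, well under the 512 KB (1024-sector) cap shown above.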

Using vmstat

vmstat
◇：procs
    ■：r: The number of runnable processes (running or waiting for run time)
    ■：b: The number of processes in uninterruptible sleep.
◇：memory
    ■：swpd: the amount of virtual memory used.
    ■：free: the amount of idle memory.
    ■：buff: the amount of memory used as buffers.
◇：swap
    ■：si: Amount of memory swapped in from disk (/s).
    ■：so: Amount of memory swapped to disk (/s).
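    ■：si is the amount of memory read back in from swap on disk per second; a value above 0 means physical memory is running short, or there is a memory leak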
◇：io
    ■：bi: Blocks received from a block device (blocks/s).
    ■：bo: Blocks sent to a block device (blocks/s).
◇：system
    ■：in: The number of interrupts per second, including the clock.
    ■：cs: The number of context switches per second.
◇：cpu
    ■：us: Time spent running non-kernel code. (user time, including nice time)
    ■：sy: Time spent running kernel code. (system time)
    ■：id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
    ■：wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
    ■：st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.

arthur@learnning:~$ vmstat -w -S m 2 2
procs ----------memory-------------  --swap--  ---io---  --system-- ---------cpu-----------
r   b   swpd   free   buff   cache   si   so    bi   bo   in   cs    us   sy   id   wa   st
1   0     0    1025   107     808    0    0     18   16   31   117   0    0    99   0    0
0   0     0    1025   107     808    0    0     0    0    35   114   0    0    99   0    0

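When memory is sufficient, si = so = 0. Run vmstat -S m 5 100 and check whether si and so stay at 0.
If si/so stay above 0 for long periods, system performance suffers, because swapping consumes both I/O and CPU.
In production, free was fairly small and swpd was in the tens, but si and so were consistently 0, so memory was not the problem. When memory demand exceeds the amount of RAM, the server falls back on virtual memory, which moves pages from RAM to a dedicated swap area on disk.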
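
The si/so check can be automated with a few lines of Python. This is a sketch only: the function names are invented here, and the parser assumes the plain column layout shown in the vmstat sample above (it is not a general vmstat parser).

```python
SAMPLE = """\
procs ----------memory-------------  --swap--  ---io---  --system-- ---------cpu-----------
r   b   swpd   free   buff   cache   si   so    bi   bo   in   cs    us   sy   id   wa   st
1   0     0    1025   107     808    0    0     18   16   31   117   0    0    99   0    0
0   0     0    1025   107     808    0    0     0    0    35   114   0    0    99   0    0
"""

def swap_samples(vmstat_output):
    """Return one (si, so) tuple per data row of vmstat output."""
    rows = [line.split() for line in vmstat_output.strip().splitlines()]
    header = next(row for row in rows if "si" in row and "so" in row)
    si_col, so_col = header.index("si"), header.index("so")
    data = [row for row in rows if row and all(tok.isdigit() for tok in row)]
    return [(int(row[si_col]), int(row[so_col])) for row in data]

def is_swapping(samples):
    """True if any sample shows swap-in or swap-out activity."""
    return any(si > 0 or so > 0 for si, so in samples)

print(is_swapping(swap_samples(SAMPLE)))
```

A single non-zero sample is not alarming by itself; as the text says, it is si/so staying above 0 over a long run that matters.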

mpstat

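mpstat: multiprocessor statistics; reports CPU statistics per processor
◇：mpstat -P ALL 5 10: shows the usage of each core (four rows on a 4-core machine)
◇：mpstat -u 5 10: shows overall CPU utilization
◇：When CPU usage rises, mpstat helps determine whether user time, system time, or I/O wait is responsible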
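
The last point can be expressed as a tiny helper. It assumes the per-core percentages have already been collected into dictionaries keyed by mpstat's column names (%usr, %sys, %iowait); both function names are hypothetical.

```python
def busiest_category(sample):
    """Return which of %usr, %sys, %iowait is largest for one CPU."""
    watched = {key: sample[key] for key in ("%usr", "%sys", "%iowait")}
    return max(watched, key=watched.get)

def per_core_report(samples):
    """Map each CPU id to its dominant category."""
    return {cpu: busiest_category(sample) for cpu, sample in samples.items()}
```

A core dominated by %iowait points toward the disk, while one dominated by %usr points toward application code.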

pidstat

pidstat
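◇：-d: I/O statistics per process
◇：-w: context switches
◇：-u: CPU statistics (the default)
◇：-r: memory statistics, for example pidstat -r 5 2 --human
threads: the basic unit of CPU scheduling
processes: the basic unit of resource ownership
Up to about 10K context switches per second is considered normal
◇：Voluntary: the process gave up the CPU to wait for a resource; a high count points to an I/O bottleneck
◇：Involuntary: the process was preempted by the scheduler; a high count means processes are competing for the CPU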
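
The two kinds of context switch are also exposed per process in /proc/<pid>/status. The sketch below parses that text and turns two readings into per-second rates comparable to the cswch/s and nvcswch/s columns of pidstat -w. The function names are assumptions of this example.

```python
KEYS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")

def parse_ctxt_switches(status_text):
    """Extract the two context-switch counters from /proc/<pid>/status text."""
    counters = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in KEYS:
            counters[key] = int(value)
    return counters

def switch_rates(prev, curr, interval_s):
    """Per-second voluntary and involuntary switch rates between two readings."""
    return {
        "cswch/s": (curr[KEYS[0]] - prev[KEYS[0]]) / interval_s,
        "nvcswch/s": (curr[KEYS[1]] - prev[KEYS[1]]) / interval_s,
    }
```

A high cswch/s suggests the process keeps waiting on resources, while a high nvcswch/s suggests it is being preempted in a fight for CPU time.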
