Zabbix: Auto-Discovering SSDs/HDDs and Monitoring Their Lifespan and Status
2019-08-14
圣地亚哥_SVIP
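Background:
Some of the Ceph clusters we deployed use SSDs as cache disks. SSDs have a finite write endurance, so the remaining life of these SSDs needs to be monitored.
Requirements:
On the existing Zabbix platform, SSDs should be discovered automatically and the corresponding items and alerts registered for them.
Monitored items:
- SSD lifespan
- SSD status
- HDD status
Constraint:
Discovery and registration must both be automatic.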
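Procedure
The commands below are the ones used to inspect the disks.
Note: the disks must support SMART and have it enabled.
Required package: smartmontools
List the disk device nodes: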
#lsscsi | grep "disk" | awk '{ print $NF }'
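The same filtering can be done in Python rather than with grep and awk. The following is a minimal sketch, assuming lsscsi prints one device per line with the device type in the second column and the device node in the last one. The sample lines and the helper name disk_nodes are illustrative and were not captured from a real host.

```python
def disk_nodes(lsscsi_output):
    """Return the device node (last column) of every line whose type column is 'disk'."""
    nodes = []
    for line in lsscsi_output.splitlines():
        fields = line.split()
        # Assumed layout: [H:C:T:L] type vendor model revision node
        if len(fields) >= 3 and fields[1] == "disk":
            nodes.append(fields[-1])
    return nodes


# Illustrative sample, not taken from a real host.
LSSCSI_SAMPLE = """\
[0:0:0:0]    disk    ATA      SAMSUNG MZ7LM480 204Q  /dev/sda
[0:0:1:0]    disk    ATA      ST4000NM0035     TN03  /dev/sdb
[5:0:0:0]    cd/dvd  HL-DT-ST DVD-ROM          1.00  /dev/sr0
"""

if __name__ == "__main__":
    print(disk_nodes(LSSCSI_SAMPLE))
```

Unlike grep "disk", which matches the word anywhere in the line, this sketch compares only the type column, so a model name containing "disk" would not produce a false match.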
Determine the disk type:
# smartctl -i /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG MZ7LM480HMHQ-00005
Serial Number: S2UJNX0K630474
LU WWN Device Id: 5 002538 c40af1275
Firmware Version: GXT5204Q
User Capacity: 480,103,981,056 bytes [480 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Aug 9 16:04:25 2019 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
The line to look for is "Rotation Rate: Solid State Device", which identifies the disk as an SSD.
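As a sketch of that check, the function below classifies a disk from the text of smartctl -i. It assumes that any Rotation Rate value other than "Solid State Device" means a spinning disk, and that a missing Rotation Rate line is also treated as an HDD. The helper name disk_type and the "7200 rpm" HDD example are illustrative; the SSD excerpt is copied from the output above.

```python
def disk_type(smartctl_info):
    """Classify a disk as 'ssd' or 'hdd' from the text printed by `smartctl -i`."""
    for line in smartctl_info.splitlines():
        if line.startswith("Rotation Rate:"):
            value = line.split(":", 1)[1].strip().lower()
            return "ssd" if value == "solid state device" else "hdd"
    # Assumption: no Rotation Rate line means "not an SSD".
    return "hdd"


# Excerpt of the smartctl -i output shown above.
INFO_SAMPLE = """\
Device Model: SAMSUNG MZ7LM480HMHQ-00005
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
"""

if __name__ == "__main__":
    print(disk_type(INFO_SAMPLE))
    print(disk_type("Rotation Rate: 7200 rpm"))
```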
SSD lifespan:
#smartctl -l devstat /dev/sda
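The value of interest in the devstat listing is the "Percentage Used Endurance Indicator" row. Below is a rough sketch of extracting it, assuming the table layout smartctl usually prints (Page, Offset, Size, Value, Flags, Description). The sample rows are illustrative rather than captured from the disk above, and used_endurance is a hypothetical helper name.

```python
def used_endurance(devstat_output):
    """Return the 'Percentage Used Endurance Indicator' value, or None if it is absent."""
    for line in devstat_output.splitlines():
        if "Percentage Used Endurance Indicator" in line:
            fields = line.split()
            # Assumed columns: Page Offset Size Value Flags Description...
            if len(fields) >= 4 and fields[3].isdigit():
                return int(fields[3])
    return None


# Illustrative sample in the assumed layout.
DEVSTAT_SAMPLE = """\
Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1              34  N--  Percentage Used Endurance Indicator
"""

if __name__ == "__main__":
    print(used_endurance(DEVSTAT_SAMPLE))
```

Returning None when the row is missing lets the caller decide what to report for disks that do not expose this statistic.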
SSD status:
#smartctl -H /dev/sda
SMART overall-health self-assessment test result: PASSED/PROBLEM
HDD status:
#smartctl -H /dev/sda
SMART Health Status: OK
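Both health lines can be reduced to the same 1/0 value. The sketch below handles the two forms shown above and treats any word other than PASSED or OK as down. The names health_state, UP and DOWN are illustrative, and "FAILED!" in the demo is only an example of a non-passing value.

```python
import re

UP, DOWN = 1, 0

# Matches both forms shown above and captures the last word on the line:
#   "SMART overall-health self-assessment test result: PASSED"
#   "SMART Health Status: OK"
_HEALTH_RE = re.compile(
    r"^SMART.*health.*:[ \t]*(\S+)[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def health_state(smartctl_health):
    """Map `smartctl -H` output to 1 (up) or 0 (down); None if no health line is found."""
    match = _HEALTH_RE.search(smartctl_health)
    if match is None:
        return None
    return UP if match.group(1).upper() in ("OK", "PASSED") else DOWN


if __name__ == "__main__":
    print(health_state("SMART overall-health self-assessment test result: PASSED"))
    print(health_state("SMART Health Status: OK"))
    print(health_state("SMART overall-health self-assessment test result: FAILED!"))
```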
Configuring auto-discovery and registration in Zabbix
Discovery script for SSDs/HDDs, blk_discovery.py:
#!/usr/bin/env python
# Discover block devices for Zabbix low-level discovery (LLD).
# Usage: ./blk_discovery.py {type}
#   type: ssd/hdd/all
# Example:
#   ./blk_discovery.py ssd
# Returns JSON:
# {
#   "data": [
#     {
#       "{#DEV}": "/dev/sda",
#       "{#DEVTYPE}": "ssd"
#     },
#     {
#       "{#DEV}": "/dev/sdb",
#       "{#DEVTYPE}": "ssd"
#     }
#   ]
# }
import sys
import json

try:
    from subprocess import getstatusoutput  # Python 3
except ImportError:
    from commands import getstatusoutput  # Python 2

result = {}
blk_type = sys.argv[1]


def discovery_blk():
    result["data"] = []
    (status, output) = getstatusoutput("lsscsi | grep 'disk' | awk '{ print $NF }'")
    if status != 0:
        return {}
    # Skip blank lines so an empty listing does not yield a device named "".
    devs = [dev for dev in output.split('\n') if dev]
    for dev in devs:
        disk = {}
        cmmd = "smartctl -i %s | grep 'Rotation Rate:' | awk -F':' '{ print $NF }'" % dev
        (status, output) = getstatusoutput(cmmd)
        if status != 0:
            continue
        # SSDs report "Solid State Device"; anything else is treated as an HDD.
        dev_type = output.strip().lower()
        if dev_type == "solid state device" and (blk_type == "ssd" or blk_type == "all"):
            disk["{#DEV}"] = dev
            if blk_type == "all":
                disk["{#DEVTYPE}"] = "ssd"
            else:
                disk["{#DEVTYPE}"] = blk_type
        if dev_type != "solid state device" and (blk_type == "hdd" or blk_type == "all"):
            disk["{#DEV}"] = dev
            if blk_type == "all":
                disk["{#DEVTYPE}"] = "hdd"
            else:
                disk["{#DEVTYPE}"] = blk_type
        if len(disk) != 0:
            result["data"].append(disk)
    print(json.dumps(result, sort_keys=True, indent=2))


discovery_blk()
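The filtering rule in the script can also be tried out without any disks by separating it from the shell calls. The following is an illustrative refactoring, not the script itself: build_lld is a hypothetical helper that takes a mapping of device node to the "Rotation Rate" string and returns the same payload shape.

```python
import json


def build_lld(rotation_by_dev, blk_type):
    """Build the LLD payload from a {device: rotation-rate string} mapping.

    blk_type is 'ssd', 'hdd' or 'all', as in blk_discovery.py.
    """
    data = []
    for dev in sorted(rotation_by_dev):
        is_ssd = rotation_by_dev[dev].strip().lower() == "solid state device"
        dev_type = "ssd" if is_ssd else "hdd"
        # Keep the disk when its type was requested, or when everything was.
        if blk_type in (dev_type, "all"):
            data.append({"{#DEV}": dev, "{#DEVTYPE}": dev_type})
    return {"data": data}


if __name__ == "__main__":
    # Hypothetical inventory: one SSD and one spinning disk.
    rates = {"/dev/sda": "Solid State Device", "/dev/sdb": "7200 rpm"}
    print(json.dumps(build_lld(rates, "ssd"), sort_keys=True, indent=2))
```

Keeping the decision in a pure function makes it possible to check the ssd, hdd and all cases on a machine that has neither lsscsi nor smartctl installed.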
Script that reports SSD lifespan and SSD/HDD status, blk_parse.py:
#!/usr/bin/env python
# Parse block device status.
# Usage: ./blk_parse.py {dev} {feature}
# Example:
#   SSD endurance:
#     ./blk_parse.py /dev/sda endurance
#     Returns:
#       - 34    # the SSD has consumed 34% of its life
#   SSD/HDD status:
#     ./blk_parse.py /dev/sda status
#     Returns:
#       - UP (1)
#       - Down (0)
import sys

try:
    from subprocess import getstatusoutput  # Python 3
except ImportError:
    from commands import getstatusoutput  # Python 2

key = sys.argv[1]
feature = sys.argv[2]


class BlkStatus():
    UP = 1
    Down = 0


def get_status(dev):
    cmmd = "smartctl -H %s | grep -i 'health' | awk '{ print $NF }'" % dev
    (status, output) = getstatusoutput(cmmd)
    if status != 0:
        return ""
    # ATA disks report PASSED, SCSI disks report OK.
    health = output.strip().upper()
    if health == "OK" or health == "PASSED":
        return BlkStatus.UP
    return BlkStatus.Down


def get_endurance(dev):
    cmmd = "smartctl -l devstat %s | grep 'Used Endurance' | awk '{ print $4 }'" % dev
    (status, output) = getstatusoutput(cmmd)
    # An empty result means the disk exposes no endurance row; avoid int("").
    if status != 0 or not output.strip():
        return ""
    return int(output)


def blk_parse():
    result = ""
    if feature == "endurance":
        result = get_endurance(key)
    elif feature == "status":
        result = get_status(key)
    print(result)


blk_parse()
On every Ceph (agent) node, copy the two scripts above to /etc/zabbix/script.
#chmod +x /etc/zabbix/script/blk_discovery.py
#chmod +x /etc/zabbix/script/blk_parse.py
In /etc/zabbix/zabbix_agentd.d/, add a configuration file, blk-status.conf:
UserParameter=blk_discovery[*],sudo /etc/zabbix/script/blk_discovery.py $1
UserParameter=blk.status[*],sudo /etc/zabbix/script/blk_parse.py $1 $2
UserParameter=blk.hdd.status[*],sudo /etc/zabbix/script/blk_parse.py $1 "status"
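Note: the last two entries call the same script on purpose so that their keys differ; otherwise item prototypes with the same key could not be added under two different discovery rules. Because the commands go through sudo, the zabbix user is assumed to be allowed to run these scripts with sudo without a password.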
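To see how an item key maps onto these entries: for a flexible user parameter (key[*]), the agent replaces $1, $2, ... in the command with the comma-separated parameters of the requested key. The sketch below is a simplified emulation for illustration only. It ignores Zabbix's quoting and escaping rules, expand is a hypothetical helper and not part of Zabbix, and the example keys are assumptions chosen to match the script's arguments.

```python
import re


def expand(command, item_key):
    """Substitute $1..$9 in a UserParameter command with the parameters of item_key."""
    match = re.match(r"^[^\[]+\[(.*)\]$", item_key)
    params = [p.strip() for p in match.group(1).split(",")] if match else []

    def replace(m):
        index = int(m.group(1)) - 1
        # In this sketch, positions with no matching parameter become empty.
        return params[index] if index < len(params) else ""

    return re.sub(r"\$([1-9])", replace, command)


if __name__ == "__main__":
    print(expand("sudo /etc/zabbix/script/blk_parse.py $1 $2",
                 "blk.status[/dev/sda,endurance]"))
    print(expand('sudo /etc/zabbix/script/blk_parse.py $1 "status"',
                 "blk.hdd.status[/dev/sdb]"))
```

In an item prototype the first parameter would normally be the {#DEV} macro, which the discovery rule fills in with each device node it returns.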
Restart zabbix-agent:
#systemctl restart zabbix-agent
Configure LLD in the Zabbix web interface. The platform already has a Ceph host monitoring template, and the LLD rules are added to that template:
- Configure the SSD discovery rule
- Configure the items
- Configure the triggers
- Configure the graphs
- Configure HDD discovery and registration
- HDD status monitoring
- HDD triggers
Once discovery has run, the automatically discovered and registered items appear with their latest data.
This completes the automatic discovery and monitoring of SSDs and HDDs in Zabbix.