Troubleshooting: creating a volume from a volume snapshot fails on NetApp (NFS), Volum
2020-03-08
余亚飞
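I. Background
Cinder is integrated with a NetApp (NFS) backend. Creating volumes and creating snapshots both work, but creating a volume from a volume snapshot fails with the error: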
Volume xxx could not be created on shares.
II. Troubleshooting process
- The log for creating a volume from a volume snapshot is as follows:
2020-03-06 12:28:57.875 2290124 INFO cinder.volume.flows.manager.create_volume [req-fd7cb0cc-5000-4629-a040-8353c9d780a7 6d612f674ec84cb28ed4a6a25b1e5d8a 92c2b6ef30d4446bbc0b6f4c1ef91775 - - -] Volume eadb07ab-869f-4abb-b617-06162cd71ae6: being created as snap with specification: {'status': u'creating', 'volume_size': 40, 'volume_name': 'volume-eadb07ab-869f-4abb-b617-06162cd71ae6', 'snapshot_id': '4a0f0f68-9998-4271-9fff-cfcb4646de33'}
2020-03-06 12:29:44.365 2290124 WARNING cinder.volume.drivers.netapp.dataontap.nfs_base [req-fd7cb0cc-5000-4629-a040-8353c9d780a7 6d612f674ec84cb28ed4a6a25b1e5d8a 92c2b6ef30d4446bbc0b6f4c1ef91775 - - -] Discover file retries exhausted.
2020-03-06 12:29:44.367 2290124 ERROR cinder.volume.drivers.netapp.dataontap.nfs_base [req-fd7cb0cc-5000-4629-a040-8353c9d780a7 6d612f674ec84cb28ed4a6a25b1e5d8a 92c2b6ef30d4446bbc0b6f4c1ef91775 - - -] Exception creating volume eadb07ab-869f-4abb-b617-06162cd71ae6 from source snapshot-4a0f0f68-9998-4271-9fff-cfcb4646de33 on share 172.190.68.60:/DEV_R1_8200_C01_SVM_SAS_vol1.
- The third log line is an error raised by the NetApp driver (drivers.netapp.dataontap.nfs_base). The Traceback of the failure contains this message:
VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Volume eadb07ab-869f-4abb-b617-06162cd71ae6 could not be created on shares.
- The Traceback points to File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/netapp/dataontap/nfs_base.py", line 175. In the NetApp driver code at that location, the error is raised because self._discover_file_till_timeout(path) returned False:
    def _clone_with_extension_check(self, source, destination_volume):
        source_size = source['size']
        source_id = source['id']
        source_name = source['name']
        destination_volume_size = destination_volume['size']
        self._clone_backing_file_for_volume(source_name,
                                            destination_volume['name'],
                                            source_id)
        path = self.local_path(destination_volume)
        if self._discover_file_till_timeout(path):
            self._set_rw_permissions(path)
            if destination_volume_size != source_size:
                try:
                    self.extend_volume(destination_volume,
                                       destination_volume_size)
                except Exception:
                    LOG.error(_LE("Resizing %s failed. Cleaning "
                                  "volume."), destination_volume['name'])
                    self._cleanup_volume_on_failure(destination_volume)
                    raise exception.CinderException(
                        _("Resizing clone %s failed.")
                        % destination_volume['name'])
        else:
            raise exception.CinderException(_("NFS file %s not discovered.")
                                            % destination_volume['name'])
- Looking further into _discover_file_till_timeout, the function returns False because it cannot find the path of the newly created volume. The "Discover file retries exhausted." warning in the log confirms this:
    def _discover_file_till_timeout(self, path, timeout=45):
        """Checks if file size at path is equal to size."""
        # Sometimes nfs takes time to discover file
        # Retrying in case any unexpected situation occurs
        retry_seconds = timeout
        sleep_interval = 2
        while True:
            if os.path.exists(path):
                return True
            else:
                if retry_seconds <= 0:
                    LOG.warning(_LW('Discover file retries exhausted.'))
                    return False
                else:
                    time.sleep(sleep_interval)
                    retry_seconds -= sleep_interval
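To see how long this loop waits before giving up, here is a small Python 3 sketch that replays the same counting logic without actually sleeping. The function name simulate_discover and the exists callback are hypothetical and exist only for this illustration; they are not part of the Cinder driver.

```python
def simulate_discover(exists, timeout=45, sleep_interval=2):
    """Replay the retry loop's bookkeeping without sleeping.

    `exists` is a hypothetical zero-argument callable standing in for
    os.path.exists(path). Returns (found, checks, seconds_slept).
    """
    retry_seconds = timeout
    checks = 0
    slept = 0
    while True:
        checks += 1
        if exists():
            return True, checks, slept
        if retry_seconds <= 0:
            return False, checks, slept
        # The real loop calls time.sleep(sleep_interval) here.
        slept += sleep_interval
        retry_seconds -= sleep_interval


print(simulate_discover(lambda: False))  # (False, 24, 46)
print(simulate_discover(lambda: True))   # (True, 1, 0)
```

With the default arguments the loop checks the path 24 times and sleeps 46 seconds in total, which is in line with the gap of about 46.5 seconds between the INFO line (12:28:57.875) and the WARNING line (12:29:44.365) in the log above.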
- However, after logging in to the environment, the volume file turns out to exist under the corresponding path.
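- After adding debug logging, the behavior is inconsistent: with the volume file present at the path, os.path.exists returns False when the driver code calls it, but returns True when run by hand. The official documentation for os.path.exists(path) explains how this can happen:
os.path.exists(path)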
Return True if path refers to an existing path. Returns False for broken symbolic links. On some platforms, this function may return False if permission is not granted to execute os.stat() on the requested file, even if the path physically exists.
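Because os.path.exists folds "permission denied" and "file missing" into the same False, calling os.stat directly can tell the two apart. The following is a minimal Python 3 sketch; probe_path is a hypothetical helper written for this post, not something the driver provides.

```python
import errno
import os


def probe_path(path):
    """Classify a path as 'exists', 'missing' or 'permission denied'."""
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.EACCES:
            # The file may be there, but this user may not stat it.
            return "permission denied"
        if exc.errno in (errno.ENOENT, errno.ENOTDIR):
            return "missing"
        raise  # anything else is unexpected, so let it surface
    return "exists"


print(probe_path(os.getcwd()))
```

Running such a probe as the same user as the service would show whether the False comes from a missing file or from a permission problem.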
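- The mount directory has permission 666, which grants read and write but no execute (search) bit. Without that bit a non-root user cannot stat the entries inside the directory, so the suspicion is that the cinder user, which the code runs as, lacks sufficient permission.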
- After adding the execute permission (x) to the NetApp (NFS) mount directory, creating a volume from a snapshot works again.
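When a path has several parent directories, it helps to know which one blocks access. The sketch below (Python 3) walks the parents from the root down and reports the first one the current user cannot traverse. The helper name first_blocking_dir is hypothetical, and it relies on os.access, which checks against the real uid/gid of the calling process, so it should be run as the service user (cinder in this case).

```python
import os


def first_blocking_dir(path):
    """Return the first parent directory of `path` that the current user
    cannot traverse (no x bit for them), or None if all are traversable.

    A parent that does not exist is also reported, because os.access
    returns False for it.
    """
    path = os.path.abspath(path)
    parents = []
    current = os.path.dirname(path)
    while True:
        parents.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    for directory in reversed(parents):  # check from the root downwards
        if not os.access(directory, os.X_OK):
            return directory
    return None


print(first_blocking_dir(os.path.join(os.getcwd(), "some-file")))
```

Note that root can traverse any directory, so running the check as root would not reveal the problem described here.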
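III. Linux permissions
r (Read, value 4): on a file, permission to read its contents; on a directory, permission to list its entries.
w (Write, value 2): on a file, permission to add to or modify its contents; on a directory, permission to delete and move the files inside it.
x (eXecute, value 1): on a file, permission to execute it; on a directory, permission to enter it.
A simple permission test
- Switch to the root user, create a test directory, and set its permission to 666, which leaves out the execute permission.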
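The numeric values above add up per user class (owner, group, others), so 666 is rw- for everyone and 751 is rwx, r-x, --x. As a quick illustration, Python 3's standard stat.filemode can render an octal mode the way ls does; describe_dir_mode is a hypothetical wrapper added only for this example.

```python
import stat


def describe_dir_mode(mode):
    """Render the permission bits of a directory the way `ls -l` shows them."""
    return stat.filemode(stat.S_IFDIR | mode)


print(describe_dir_mode(0o666))  # drw-rw-rw-
print(describe_dir_mode(0o751))  # drwxr-x--x
```

The 666 string has no x in any position, which is why a directory with that mode cannot be entered by ordinary users.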
root@HP-Laptop:/home/root# mkdir test
root@HP-Laptop:/home/root# ll
total 12
drwxr-xr-x 3 root root 4096 Mar 8 11:28 ./
drwxr-xr-x 4 root root 4096 Mar 8 11:27 ../
drwxr-xr-x 2 root root 4096 Mar 8 11:28 test/
root@HP-Laptop:/home/root# chmod 666 test/
root@HP-Laptop:/home/root# ll
total 12
drwxr-xr-x 3 root root 4096 Mar 8 11:28 ./
drwxr-xr-x 4 root root 4096 Mar 8 11:27 ../
drw-rw-rw- 2 root root 4096 Mar 8 11:28 test/
root@HP-Laptop:/home/root# cd test
root@HP-Laptop:/home/root/test# touch test.py
root@HP-Laptop:/home/root/test# chmod 666 test.py
root@HP-Laptop:/home/root/test# ll
total 8
drwxr-xr-x 2 root root 4096 Mar 8 11:28 ./
drwxr-xr-x 3 root root 4096 Mar 8 11:28 ../
-rw-rw-rw- 1 root root 0 Mar 8 11:28 test.py
- Switch to a regular user: os.path.exists returns False because of insufficient permission.
yu@HP-Laptop:~$ python
Python 2.7.17 (default, Nov 7 2019, 10:07:09)
[GCC 7.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print(os.path.exists("/home/root/test/test.py"))
False
>>>
- Switch back to the root user and give the test directory permission 751, which adds the execute permission.
root@HP-Laptop:/home/root# chmod 751 test
root@HP-Laptop:/home/root# ll
total 12
drwxr-xr-x 3 root root 4096 Mar 8 11:32 ./
drwxr-xr-x 4 root root 4096 Mar 8 11:27 ../
drwxr-x--x 2 root root 4096 Mar 8 11:28 test/
-rw-r--r-- 1 root root 0 Mar 8 11:32 test.py
- Switch to the regular user again: os.path.exists now returns True.
yu@HP-Laptop:~$ python
Python 2.7.17 (default, Nov 7 2019, 10:07:09)
[GCC 7.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print(os.path.exists("/home/root/test/test.py"))
True
>>>
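The same experiment can be approximated in a single script without switching users. This is only a sketch under some assumptions: it is written for Python 3 on Linux, it uses a temporary directory instead of /home/root/test, and exists_with_dir_mode is a hypothetical helper. Run as a non-root user, it should report False for mode 666 and True for mode 751; run as root, both should come back True, because root is not restricted by the missing execute bit.

```python
import os
import shutil
import stat
import tempfile


def exists_with_dir_mode(mode):
    """Create test/test.py, chmod the directory to `mode`, and return
    what os.path.exists reports for the file."""
    workdir = tempfile.mkdtemp()
    target_dir = os.path.join(workdir, "test")
    target = os.path.join(target_dir, "test.py")
    os.mkdir(target_dir)
    open(target, "w").close()
    try:
        os.chmod(target_dir, mode)
        return os.path.exists(target)
    finally:
        # Restore owner rwx so the cleanup below can enter the directory.
        os.chmod(target_dir, stat.S_IRWXU)
        shutil.rmtree(workdir)


if __name__ == "__main__":
    for mode in (0o666, 0o751):
        print(oct(mode), exists_with_dir_mode(mode))
```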