[oVirt Notes] engine-log-collector

2018-05-18 58bc06151329

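Preface

As a programmer I need to keep learning. In my spare time I write up my analyses and study notes as blog posts to share with others, and to keep a record of my own learning journey.

This article is for study and discussion only; it will be removed on request if it infringes any rights.
Not for commercial use. Please credit the source when reposting.

The version analyzed here is oVirt 3.4.5.

Command usage:
engine-log-collector [options] list: show the host list
engine-log-collector [options] collect: collect the diagnostic reports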

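Option	Description
--version	Show the program's version number
-h, --help	Show the help message
--quiet	Terse console output (default: false)
--local-tmp=	Temporary storage directory; the directory name is randomly generated, e.g. /tmp/LogCultReXCFDV5
--ticket-number=	Ticket ID
--upload=	Upload the report to Red Hat (choose from the list of destinations Red Hat supports)
--log-file=PATH	Log file path (default: /var/log/ovirt-engine/ovirt-log-collector/ovirt-log-collector-yyyyMMddHHmmss.log)
--conf-file=PATH	Configuration file path (default: /etc/ovirt-engine/logcollector.conf)
--cert-file=PATH	CA certificate used to validate the engine (default: /etc/pki/ovirt-engine/ca.pem)
--insecure	Do not validate the engine (default: off)
--output=	Destination directory where the report will be stored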

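Engine configuration: authorization for the engine REST API, plus filters that select one or more hosts to collect logs from. If --no-hypervisors is set, no data is collected from any host.

Engine configuration group	Description
--no-hypervisors	Skip collection from hosts (nodes); default: false
-u, --user=	REST API user, e.g. user@engine.example.com (default: admin@internal)
-r, --engine=	REST API address of the engine, e.g. localhost:443
-c, --cluster=	Cluster filter list (comma-separated cluster names or regular expressions); default: None
-d, --data-center=	Data center filter list (comma-separated data center names or regular expressions); default: None
-H, --hosts=	Host filter list (comma-separated hostnames, FQDNs, IP addresses, or regular expressions); default: None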

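Connection configuration group	Description
--ssh-port=	SSH port to connect on
-k, --key-file=	SSH identity file (private key) used to access the file servers
--max-connections=	Maximum number of concurrent connections for fetching logs (default: 10)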

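PostgreSQL configuration: database connection settings can be specified so that the collector connects to the database and gathers the related logs. If --no-postgresql is set, the database connection is skipped.

PostgreSQL database configuration group	Description
--no-postgresql	Skip collection from the PostgreSQL database
--pg-user=engine	PostgreSQL user name (default: engine)
--pg-dbname=engine	PostgreSQL database name (default: engine)
--pg-dbhost=localhost	PostgreSQL database host (default: localhost)
--pg-dbport=5432	PostgreSQL database port (default: 5432)
--pg-ssh-user=root	SSH user for connecting to a remote PostgreSQL database host (default: root)
--pg-host-key=none	Identity file (private key) used to access the PostgreSQL database host (not needed by default when the database is local)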

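This command collects system logs together with system configuration and diagnostic information.

The command is implemented in Python.

optparse module

engine-log-collector uses the optparse module, a module dedicated to defining command-line options.
Code example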

from optparse import OptionParser
parser = OptionParser(...)
parser.add_option(.....)

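OptionParser constructor arguments
- None of these arguments is required

Argument	Description
usage	The usage string to print.
version	The version string printed by %prog --version.
description	A description of the program.

add_option: adding command-line options

Argument	Description
action	Tells optparse how to handle the option when it is parsed. The default is 'store', which saves the option's value on the options object. Possible values include store, store_true, store_false, store_const, append, count, and callback.
type	Defaults to string; int, float, and others are also supported.
dest	Attribute name on the options object. If dest is not given, it is derived from the option name.
store_true / store_false	Variants of the store action for options that take no value, such as -v or -q.
default	The default value.
help	The help text.
metavar	The placeholder shown to the user for the expected value.

parse_args: parsing the command line
- (options, args) = parser.parse_args(): an argument list can be passed to parse_args(); otherwise sys.argv[1:] is used by default.
- parse_args() returns two values
  - options: an object (optparse.Values) holding the option values. Given an option's name, e.g. file, its value is available as options.file.
  - args: a list of the positional arguments.
When there are many options, they can be organized into groups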

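from optparse import OptionGroup
# OptionGroup requires a title as its second argument
group = OptionGroup(parser, "Engine Configuration")
group.add_option("--no-hypervisors", action="store_true", default=False)
parser.add_option_group(group)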
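
To tie the optparse pieces above together, here is a minimal self-contained sketch. It is a demo parser, not the real engine-log-collector parser: the option names are borrowed from the tables earlier in this post, while the version string, help texts, and the build_parser helper are assumptions made for illustration.

```python
from optparse import OptionGroup, OptionParser


def build_parser():
    # usage, version and description are all optional.
    parser = OptionParser(
        usage="%prog [options] list|collect",
        version="%prog 0.0 (demo)",
        description="Demo parser; not the real engine-log-collector.",
    )
    # store_true: a flag that takes no value.
    parser.add_option("--quiet", action="store_true", dest="quiet",
                      default=False, help="terse console output")
    # type="int" converts the value; metavar is what --help displays.
    parser.add_option("--max-connections", type="int",
                      dest="max_connections", default=10, metavar="N")

    # Related options go into a titled group.
    group = OptionGroup(parser, "Engine Configuration")
    group.add_option("-u", "--user", dest="user", default="admin@internal")
    group.add_option("--no-hypervisors", action="store_true",
                     dest="no_hypervisors", default=False)
    parser.add_option_group(group)
    return parser


# Pass an explicit argument list instead of relying on sys.argv[1:].
options, args = build_parser().parse_args(
    ["--quiet", "--max-connections", "4",
     "-u", "user@engine.example.com", "collect"]
)
print(options.quiet, options.max_connections, options.user, args)
```

Passing the list explicitly makes the parser easy to exercise in a test; leaving it out falls back to the real command line.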

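shutil module

engine-log-collector uses the shutil module, a high-level module for working with files, directories, and archives.

Function	Description
shutil.copyfileobj(fsrc, fdst[, length])	Copy the contents of one file object to another
shutil.copyfile(src, dst)	Copy file contents
shutil.copy(src, dst)	Copy a file and its permission bits
shutil.copy2(src, dst)	Copy a file and its status information
shutil.copymode(src, dst)	Copy permission bits only; contents, group, and owner are unchanged
shutil.copystat(src, dst)	Copy only the status information, i.e. the file attributes: mode bits, atime, mtime, flags
shutil.ignore_patterns(*patterns)	Build an ignore function for copytree that skips matching files, for selective copying
shutil.copytree(src, dst, symlinks=False, ignore=None)	Recursively copy a directory tree
shutil.rmtree(path[, ignore_errors[, onerror]])	Recursively delete a directory tree
shutil.move(src, dst)	Recursively move a file or directory, like the mv command; on the same filesystem this is just a rename
shutil.make_archive(base_name, format,...)	Create an archive (e.g. zip or tar) and return its path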
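
A few of the functions in the table can be tried out in a throwaway directory. The sketch below is only an illustration; the file names and the "report" archive name are made up for the demo.

```python
import os
import shutil
import tempfile

work = tempfile.mkdtemp()
src = os.path.join(work, "logs")
os.makedirs(src)
for name in ("engine.log", "server.log", "scratch.tmp"):
    with open(os.path.join(src, name), "w") as handle:
        handle.write("demo\n")

# copytree + ignore_patterns: copy the tree but leave *.tmp files behind.
dst = os.path.join(work, "copy")
shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.tmp"))
copied = sorted(os.listdir(dst))

# make_archive returns the path of the archive it created.
archive_path = shutil.make_archive(
    os.path.join(work, "report"), "gztar", root_dir=dst
)
archive_exists = os.path.isfile(archive_path)

# rmtree removes a directory tree recursively.
shutil.rmtree(dst)
dst_exists = os.path.exists(dst)
shutil.rmtree(work)

print(copied, os.path.basename(archive_path), dst_exists)
```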

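sosreport diagnostic reporting tool

sosreport is a tool for generating diagnostic reports, similar to supportconfig. It is written in Python and runs on most Linux distributions, including CentOS (where, as on Red Hat, the package is named sos) and Ubuntu (where the package is named sosreport).
sosreport is hosted on GitHub at https://github.com/sosreport/sos and is already available in the default repositories of many systems.
Red Hat generally also uses sosreport to analyze the collected information. In releases before Red Hat 4.5 the tool was called sysreport. It can be installed with the following command.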

yum -y install sos

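sosreport usage: Usage: sosreport [options]

Option	Description
-h, --help	Show the help message
-l, --list-plugins	List plugins and available plugin options
-n, --skip-plugins=	Skip the given plugins
-e, --enable-plugins=	Enable the given plugins
-o, --only-plugins=	Enable only the given plugins
-k	Set plugin options (format: plugname.option=value); the available options are also listed by -l
-a, --alloptions	Enable all options for loaded plugins
-u, --upload=	Upload the report to an FTP server
--batch	Do not ask any questions (batch mode)
--build	Keep the sos tree available and do not package the results
--no-colors	Do not use terminal text colors
--debug	Enable debugging through the Python debugger
--ticket-number=	Set the ticket ID
--name=	Custom customer name
--config-file=	Alternate configuration file
--tmp-dir=	Alternate temporary directory
--diagnose	Enable diagnostics
--analyze	Enable analysis
--report	Enable HTML/XML report generation
--profile	Turn on profiling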

[root@localhost ~]# /usr/sbin/sosreport --list-plugins

sosreport (version 2.2)

The following plugins are currently enabled:

 acpid           acpid related information
 activemq        ActiveMQ related information
 anaconda        Anaconda / Installation information
 apache          Apache related information
 auditd          Auditd related information
 bootloader      Bootloader information
 cgroups         cgroup subsystem information
 crontab         Crontab information
 ctdb            Samba CTDB related information
 devicemapper    device-mapper related information (dm, lvm, multipath)
 distupgrade     Distribution upgrade information
 dovecot         dovecot server related information
 filesys         information on filesystems
 foreman         Foreman related information
 gdm             gdm related information
 general         basic system information
 gluster         gluster related information
 haproxy         haproxy information
 hardware        hardware related information
 hpasm           HP ASM (hp Server Management Drivers and Agent) information
 hts             Red Hat Hardware Test Suite related information
 i18n            i18n related information
 ipvs            Ipvs information
 iscsi           iscsi-initiator related information
 keepalived      Keepalived information
 kernel          kernel related information
 krb5            Samba related information
 ldap            LDAP related information
 libraries       information on shared libraries
 libvirt         libvirt-related information
 logrotate       logrotate configuration files and debug info
 lsbrelease      Linux Standard Base information
 memory          memory usage information
 mongodb         MongoDB related information
 mrggrid         MRG GRID related information
 mrgmessg        MRG Messaging related information
 mysql           MySQL related information
 networking      network related information
 nfs             NFS related information
 nfsserver       NFS server-related information
 ntp             NTP related information
 openhpi         OpenHPI related information
 openshift       Openshift related information
 openssl         openssl related information
 pam             PAM related information
 pgsql           PostgreSQL related information
 postfix         mail server related information
 postgresql      PostgreSQL related information
 powerpc         IBM Power System related information
 printing        printing related information (cups)
 process         process information
 psacct          Process accounting related information
 rpm             RPM information
 samba           Samba related information
 selinux         selinux related information
 ssh             ssh-related information
 startup         startup information
 sunrpc          Sun RPC related information
 system          core system related information
 tomcat          Tomcat related information
 udev            udev related information
 x11             X related information
 xen             Xen related information
 yum             yum information

The following plugins are currently disabled:

 amd               Amd automounter information
 autofs            autofs server-related information
 cloudforms        CloudForms related information
 cluster           cluster suite and GFS related information
 cobbler           cobbler related information
 corosync          corosync information
 cs                Certificate System 7.x Diagnostic Information
 dhcp              DHCP related information
 ds                Directory Server information
 emc               EMC related information (PowerPath, Solutions Enabler CLI and Navisphere CLI)
 ftp               FTP server related information
 infiniband        Infiniband related information
 initrd            initrd related information
 ipa               IPA diagnostic information
 ipsec             ipsec related information
 iscsitarget       iscsi-target related information
 kdump             Kdump related information
 kernel_realtime   Information specific to the realtime kernel
 kvm               KVM related information
 named             named related information
 netdump           Netdump Configuration Information
 nscd              NSCD related information
 oddjob            oddjob related information
 openswan          ipsec related information
 ovirt             oVirt related information
 ppp               ppp, wvdial and rp-pppoe related information
 pxe               PXE related information
 qpidd             Messaging related information
 quagga            quagga related information
 radius            radius related information
 rhn               RHN Satellite related information
 rhui              Red Hat Update Infrastructure for Cloud Providers
 s390              s390 related information
 sanitize          sanitize specified log files, etc
 sanlock           sanlock-related information
 sar               Generate the sar file from /var/log/sa/saXX files
 sendmail          sendmail information
 smartcard         Smart Card related information
 snmp              snmp related information
 soundcard         Sound card information
 squid             squid related information
 sssd              sssd-related Diagnostic Information
 systemtap         SystemTap information
 tftpserver        tftpserver related information
 veritas           veritas related information
 vmware            VMWare related information
 xinetd            xinetd information

The following plugin options are available:

 apache.log            off gathers all apache logs
 auditd.logsize        15 max size (MiB) to collect per syslog file
 auditd.all_logs       off collect all logs regardless of size
 devicemapper.lvmdump  off collect an lvmdump
 devicemapper.lvmdump-am off attempt to collect an lvmdump with advanced options and raw metadata collection
 filesys.dumpe2fs      off dump full filesystem information
 general.syslogsize    15 max size (MiB) to collect per syslog file
 general.all_logs      off collect all log files defined in syslog.conf
 gluster.logsize       5 max log size (MiB) to collect
 gluster.all_logs      off collect all log files present
 kernel.modinfo        on gathers information on all kernel modules
 libraries.ldconfigv   off the name of each directory as it is scanned, and any links that are created.
 mysql.dbuser          mysql username for database dumps
 mysql.dbpass                password for database dumps
 mysql.dbdump          off collect a database dump
 mysql.all_logs        off collect all MySQL logs
 networking.traceroute off collects a traceroute to rhn.redhat.com
 openshift.broker      off Gathers broker specific files
 openshift.node        off Gathers node specific files
 openshift.gear        off Collect information about a specific gear
 pgsql.pghome          /var/lib/pgsql PostgreSQL server home directory (default=/var/lib/pgsql)
 pgsql.username        off username for pg_dump (default=postgres)
 pgsql.password        off password for pg_dump (password visible in process listings)
 pgsql.dbname          off database name to dump for pg_dump (default=None)
 pgsql.dbhost          off hostname/IP of the server upon which the DB is running (default=localhost)
 pgsql.dbport          off database server port number (default=5432)
 postgresql.pghome     /var/lib/pgsql PostgreSQL server home directory.
 postgresql.username   postgres username for pg_dump
 postgresql.password   off password for pg_dump (password visible in process listings)
 postgresql.dbname           database name to dump for pg_dump
 postgresql.dbhost           database hostname/IP (do not use unix socket)
 postgresql.dbport     5432  database server port number
 printing.logsize      5 max size (MiB) to collect per log file
 printing.all_logs     off collect all cups log files
 psacct.all            off collect all process accounting files
 rpm.rpmq              on queries for package information via rpm -q
 rpm.rpmva             off runs a verify on all packages
 selinux.fixfiles      off Print incorrect file context labels
 selinux.list          off List objects and their context
 startup.servicestatus off get a status of all running services
 yum.yumlist           off list repositories and packages
 yum.yumdebug          off gather yum debugging data

engine-log-collector execution flow

Parsing arguments

conf = Configuration(parser)
if not conf.get('pg_pass') and pg_pass:
    conf['pg_pass'] = pg_pass
collector = LogCollector(conf)

if os.path.exists(conf["local_tmp_dir"]):
    if not os.path.isdir(conf["local_tmp_dir"]):
        raise Exception(
            '%s is not a directory.' % (conf["local_tmp_dir"])
        )
else:
    logging.info(
        "%s does not exist.  It will be created." % (
            conf["local_tmp_dir"]
        )
    )
    os.makedirs(conf["local_tmp_dir"])

conf["local_scratch_dir"] = os.path.join(
    conf["local_tmp_dir"],
    'log-collector-data'
)

if not os.path.exists(conf["local_scratch_dir"]):
    os.makedirs(conf["local_scratch_dir"])
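
The directory preparation above can be packed into one small function and tried in isolation. This is a sketch only: prepare_scratch_dir and the "LogCollectorDemo" path are hypothetical names, and ValueError stands in for the plain Exception used in the excerpt.

```python
import logging
import os
import tempfile


def prepare_scratch_dir(local_tmp_dir):
    """Ensure local_tmp_dir is a directory; return its scratch subdirectory."""
    if os.path.exists(local_tmp_dir):
        if not os.path.isdir(local_tmp_dir):
            raise ValueError("%s is not a directory." % local_tmp_dir)
    else:
        logging.info("%s does not exist. It will be created.", local_tmp_dir)
        os.makedirs(local_tmp_dir)
    scratch = os.path.join(local_tmp_dir, "log-collector-data")
    if not os.path.exists(scratch):
        os.makedirs(scratch)
    return scratch


# Demo: the base directory does not exist yet, so it gets created.
base = os.path.join(tempfile.mkdtemp(), "LogCollectorDemo")
scratch_dir = prepare_scratch_dir(base)
print(scratch_dir)
```

Calling the function a second time with the same path is harmless, because both existence checks are repeated.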

A different method is executed depending on the command type
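
One common way to express this kind of per-command branching is a dispatch table keyed by the first positional argument. The sketch below only illustrates the idea; list_hosts, collect_reports, COMMANDS, and dispatch are hypothetical names and do not come from the engine-log-collector source.

```python
def list_hosts(conf):
    # Hypothetical stand-in for the "list" command.
    return "listing hosts for %s" % conf["engine"]


def collect_reports(conf):
    # Hypothetical stand-in for the "collect" command.
    return "collecting reports into %s" % conf["output"]


COMMANDS = {"list": list_hosts, "collect": collect_reports}


def dispatch(args, conf):
    """Run the handler named by the first positional argument."""
    if not args or args[0] not in COMMANDS:
        raise SystemExit("expected one of: %s" % ", ".join(sorted(COMMANDS)))
    return COMMANDS[args[0]](conf)


demo_conf = {"engine": "localhost:443", "output": "/tmp"}
print(dispatch(["list"], demo_conf))
```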

list: fetch the host list
- The list is fetched through the REST API; API access is initialized in /usr/lib/python2.6/site-packages/ovirt_log_collector/helper/hypervisors.py.

def _initialize_api(hostname, username, password, ca, insecure):
    """
    Initialize the oVirt RESTful API
    """
    url = 'https://{hostname}/ovirt-engine/api'.format(
        hostname=hostname,
    )
    api = API(url=url,
              username=username,
              password=password,
              ca_file=ca,
              validate_cert_chain=not insecure)
    pi = api.get_product_info()
    if pi is not None:
        vrm = '%s.%s.%s' % (
            pi.get_version().get_major(),
            pi.get_version().get_minor(),
            pi.get_version().get_revision()
        )
        logging.debug("API Vendor(%s)\tAPI Version(%s)" % (
            pi.get_vendor(), vrm)
        )
    else:
        api.test(throw_exception=True)
    return api

[root@localhost helper]# engine-log-collector list
This command will collect system configuration and diagnostic
information from this system.
The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before
being passed to any third party.
No changes will be made to system configuration.
Please provide the REST API password for the admin@internal oVirt Engine user (CTRL+D to skip): 
Host list (datacenter=None, cluster=None, host=None):
Data Center          | Cluster              | Hostname/IP Address
Default              | Default              | 192.168.103.117
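
The header line of the listing shows the three filters (datacenter, cluster, host) that the -d, -c and -H options feed. How the collector matches them internally is not shown in this post, so the sketch below is an assumption: each comma-separated entry is treated as a regular expression that must match the whole name. The second row (node2.example.com) is made-up demo data.

```python
import re


def parse_filter(value):
    """Split a comma-separated filter option; None means "match everything"."""
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def matches(name, patterns):
    # Assumption: every entry is a regular expression that has to match
    # the whole name; a name without metacharacters matches only itself.
    if patterns is None:
        return True
    return any(re.fullmatch(pattern, name) for pattern in patterns)


def filter_hosts(rows, datacenter=None, cluster=None, host=None):
    """rows are (datacenter, cluster, host) tuples, as in the listing."""
    dc_pat, cl_pat, host_pat = map(parse_filter, (datacenter, cluster, host))
    return [row for row in rows
            if matches(row[0], dc_pat)
            and matches(row[1], cl_pat)
            and matches(row[2], host_pat)]


rows = [
    ("Default", "Default", "192.168.103.117"),
    ("Default", "Gluster", "node2.example.com"),  # hypothetical host
]
print(filter_hosts(rows, host=r"192\.168\..*"))
```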

collect: gather the diagnostic report information
- Gather the engine's diagnostic report information

def get_engine_data(self):
        logging.info("Gathering oVirt Engine information...")
        collector = ENGINEData(
            "localhost",
            configuration=self.conf
        )
        collector.sosreport()

Check whether the sosreport tool includes an ovirt plugin, then run the dwh preparation step.

def __init__(self, hostname, configuration=None, **kwargs):
        super(ENGINEData, self).__init__(hostname, configuration)
        self._plugins = self.caller.call('sosreport --list-plugins')
        if 'ovirt.sensitive_keys' in self._plugins:
            self._engine_plugin = 'ovirt'
        elif 'ovirt-engine.sensitive_keys' in self._plugins:
            self._engine_plugin = 'ovirt-engine'
        elif 'engine.sensitive_keys' in self._plugins:
            self._engine_plugin = 'engine'
        else:
            logging.error('ovirt plugin not found, falling back on default')
            self._engine_plugin = 'ovirt'
        self.dwh_prep()

[root@localhost ~]# /usr/sbin/sosreport --list-plugins | grep ovirt
 ovirt             oVirt related information
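
The plugin detection in __init__ boils down to a substring search over the text printed by sosreport --list-plugins. A standalone sketch of the same lookup order follows; pick_engine_plugin is a hypothetical helper name, and the sample strings in the demo are made up rather than real sosreport output.

```python
def pick_engine_plugin(list_plugins_output, default="ovirt"):
    """Return the first known plugin whose sensitive_keys option is listed."""
    # Same order as the excerpt: ovirt, then ovirt-engine, then engine.
    for name in ("ovirt", "ovirt-engine", "engine"):
        if "%s.sensitive_keys" % name in list_plugins_output:
            return name
    # Nothing matched: fall back to the default, as the excerpt does.
    return default


print(pick_engine_plugin(" ovirt-engine.sensitive_keys  (demo line)"))
print(pick_engine_plugin(" ovirt             oVirt related information"))
```

The order matters because "engine.sensitive_keys" is itself a substring of "ovirt-engine.sensitive_keys", so the longer name has to be tested first.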

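Gathering PostgreSQL diagnostic report information
- Collection runs only if the no_postgresql option is not set.
- The database user name, password, IP address, port, and so on are read from the configuration file.
- Check whether the diagnostic tool includes the postgresql plugin.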

def get_postgres_data(self):
        if self.conf.get("no_postgresql") is False:
            try:
                try:
                    if not self.conf.get("pg_pass"):
                        self.conf.getpass(
                            "pg_pass",
                            msg="password for the PostgreSQL user, %s, \
to dump the %s PostgreSQL database instance" %
                                (
                                    self.conf.get('pg_user'),
                                    self.conf.get('pg_dbname')
                                )
                        )
                    logging.info(
                        "Gathering PostgreSQL the oVirt Engine database and \
log files from %s..." % (self.conf.get("pg_dbhost"))
                    )
                except Configuration.SkipException:
                    logging.info(
                        "PostgreSQL oVirt Engine database \
will not be collected."
                    )
                    logging.info(
                        "Gathering PostgreSQL log files from %s..." % (
                            self.conf.get("pg_dbhost")
                        )
                    )

                collector = PostgresData(self.conf.get("pg_dbhost"),
                                         configuration=self.conf)
                collector.sosreport()
            except Exception, e:
                ExitCodes.exit_code = ExitCodes.WARN
                logging.error(
                    "Could not collect PostgreSQL information: %s" % e
                )
        else:
            ExitCodes.exit_code = ExitCodes.NOERR
            logging.info("Skipping postgresql collection...")

def __init__(self, hostname, configuration=None, **kwargs):
        super(PostgresData, self).__init__(hostname, configuration)
        self._postgres_plugin = 'postgresql'

[root@localhost ~]# sosreport -l | grep postgresql
 postgresql      PostgreSQL related information
 postgresql.pghome     /var/lib/pgsql PostgreSQL server home directory.
 postgresql.username   postgres username for pg_dump
 postgresql.password   off password for pg_dump (password visible in process listings)
 postgresql.dbname           database name to dump for pg_dump
 postgresql.dbhost           database hostname/IP (do not use unix socket)
 postgresql.dbport     5432  database server port number
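
The plugin options listed above are set with -k in the plugname.option=value format. As a rough sketch, a command line that enables only the postgresql plugin could be assembled as below. This is not necessarily the command the collector itself runs; build_sosreport_command is a hypothetical helper, and it uses only flags from the sosreport option table earlier in this post.

```python
def build_sosreport_command(plugin, plugin_options, tmp_dir=None):
    """Return an argv list that enables one plugin and sets its options."""
    argv = ["sosreport", "--batch", "-o", plugin]
    # One -k plugname.option=value pair per plugin option.
    for key in sorted(plugin_options):
        argv += ["-k", "%s.%s=%s" % (plugin, key, plugin_options[key])]
    if tmp_dir:
        argv.append("--tmp-dir=%s" % tmp_dir)
    return argv


command = build_sosreport_command(
    "postgresql",
    {"dbname": "engine", "dbhost": "localhost",
     "dbport": 5432, "username": "engine"},
)
print(" ".join(command))
```

Building an argv list rather than one shell string avoids quoting problems when a value contains spaces.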

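Gathering host diagnostic report information
- Collection runs only if the no_hypervisors option is not set.
- The set of hosts to collect from is determined by the engine configuration group options of engine-log-collector.
- Collection runs in parallel, with at most 10 concurrent connections by default.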

    def get_hypervisor_data(self):
        hosts = self.conf.get("hosts")

        if hosts:
            if not self.conf.get("quiet"):
                # Check if there are more than MAX_WARN_HOSTS_COUNT hosts
                # to collect from
                if len(hosts) >= MAX_WARN_HOSTS_COUNT:
                    logging.warning(
                        _("{number} hypervisors detected. It might take some "
                          "time to collect logs from {number} hypervisors. "
                          "You can use the following filters -c, -d, -H. "
                          "For more information use -h".format(
                              number=len(hosts),
                          ))
                    )
                    _continue = \
                        get_from_prompt(msg="Do you want to proceed(Y/n)",
                                        default='y')
                    if _continue not in ('Y', 'y'):
                        logging.info(
                            _("Aborting hypervisor collection...")
                        )
                        return
                else:
                    continue_ = get_from_prompt(
                        msg="About to collect information from "
                            "{len} hypervisors. Continue? (Y/n): ".format(
                                len=len(hosts),
                            ),
                        default='y'
                    )

                    if continue_ not in ('y', 'Y'):
                        logging.info("Aborting hypervisor collection...")
                        return

            logging.info("Gathering information from selected hypervisors...")

            max_connections = self.conf.get("max_connections", 10)

            import threading
            from collections import deque

            # max_connections may be defined as a string via a .rc file
            sem = threading.Semaphore(int(max_connections))
            time_diff_queue = deque()

            threads = []

            for datacenter, cluster, host in hosts:
                sem.acquire(True)
                collector = HyperVisorData(
                    host.strip(),
                    configuration=self.conf,
                    semaphore=sem,
                    queue=time_diff_queue,
                    gluster_enabled=cluster.gluster_enabled
                )
                thread = threading.Thread(target=collector.run)
                thread.start()
                threads.append(thread)

            for thread in threads:
                thread.join()

            self.write_time_diff(time_diff_queue)
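
The concurrency limit in the loop above comes from the semaphore: the main thread acquires a slot before starting each worker, so no more than max_connections workers run at once. The sketch below reproduces the pattern in isolation. It assumes that each worker releases the semaphore when it finishes (in the excerpt the semaphore is handed to HyperVisorData, whose code is not shown), and collect_all and the sleep are stand-ins for the real per-host collection.

```python
import threading
import time
from collections import deque


def collect_all(hosts, max_connections=10):
    """Run one worker per host, with at most max_connections active at once."""
    sem = threading.Semaphore(int(max_connections))
    results = deque()          # deque.append is thread-safe
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def worker(host):
        try:
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.01)   # stand-in for the real per-host collection
            results.append(host)
        finally:
            with lock:
                state["active"] -= 1
            sem.release()      # free the slot taken by the main loop

    threads = []
    for host in hosts:
        sem.acquire()          # blocks while max_connections workers run
        thread = threading.Thread(target=worker, args=(host,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return sorted(results), state["peak"]


done, peak = collect_all(["host%d" % i for i in range(8)], max_connections=3)
print(len(done), peak)
```

The peak counter exists only for the demo; it records how many workers were active at the same time and never exceeds max_connections.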

The collected diagnostic report information is then bundled and compressed.


    def archive(self):
        """
        Create a single tarball with collected data from engine, postgresql
        and all hypervisors.
        """
        print _('Creating compressed archive...')
        report_file_ext = 'bz2'
        compressor = 'bzip2'
        caller = Caller({})
        try:
            caller.call('xz --version')
            report_file_ext = 'xz'
            compressor = 'xz'
        except Exception:
            logging.debug('xz compression not available')

        if not os.path.exists(self.conf["output"]):
            os.makedirs(self.conf["output"])

        self.conf["path"] = os.path.join(
            self.conf["output"],
            "sosreport-%s-%s.tar.%s" % (
                'LogCollector',
                time.strftime("%Y%m%d%H%M%S"),
                report_file_ext
            )
        )

        if self.conf["ticket_number"]:
            self.conf["path"] = os.path.join(
                self.conf["output"],
                "sosreport-%s-%s-%s.tar.%s" % (
                    'LogCollector',
                    self.conf["ticket_number"],
                    time.strftime("%Y%m%d%H%M%S"),
                    report_file_ext
                )
            )

        config = {
            'report': os.path.splitext(self.conf['path'])[0],
            'compressed_report': self.conf['path'],
            'compressor': compressor,
            'directory': self.conf["local_tmp_dir"],
        }
        caller.configuration = config
        caller.call("tar -cf '%(report)s' -C '%(directory)s' .")
        shutil.rmtree(self.conf["local_tmp_dir"])
        caller.call("%(compressor)s -1 '%(report)s'")
        os.chmod(self.conf["path"], stat.S_IRUSR | stat.S_IWUSR)
        md5_out = caller.call("md5sum '%(compressed_report)s'")
        checksum = md5_out.split()[0]
        with open("%s.md5" % self.conf["path"], 'w') as checksum_file:
            checksum_file.write(md5_out)

        msg = ''
        if os.path.exists(self.conf["path"]):
            archiveSize = float(os.path.getsize(self.conf["path"])) / (1 << 20)

            size = '%.1fM' % archiveSize

            msg = _(
                'Log files have been collected and placed in {path}.\n'
                'The MD5 for this file is {checksum} and its size is {size}'
            ).format(
                path=self.conf["path"],
                size=size,
                checksum=checksum,
            )

            if archiveSize >= 1000:
                msg += _(
                    '\nYou can use the following filters in the next '
                    'execution -c, -d, -H to reduce the archive size.'
                )
        return msg
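
The same tar, compress, and checksum sequence can be sketched with the standard library alone, without shelling out to tar, xz, or md5sum. This is an alternative illustration rather than the collector's implementation: archive_directory and the "sosreport-demo" name are hypothetical, and it assumes the Python build includes xz (lzma) support.

```python
import hashlib
import os
import tarfile
import tempfile


def archive_directory(directory, output_dir, name="sosreport-demo"):
    """Pack directory into output_dir/name.tar.xz; return (path, md5, MiB)."""
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    path = os.path.join(output_dir, name + ".tar.xz")
    # "w:xz" writes an xz-compressed tarball in a single step.
    with tarfile.open(path, "w:xz") as tar:
        tar.add(directory, arcname=".")
    os.chmod(path, 0o600)      # owner read/write only
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    checksum = digest.hexdigest()
    # Same "checksum  filename" layout that md5sum prints.
    with open(path + ".md5", "w") as checksum_file:
        checksum_file.write("%s  %s\n" % (checksum, path))
    size_mib = os.path.getsize(path) / float(1 << 20)
    return path, checksum, size_mib


# Demo with a throwaway directory holding one small file.
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "engine.log"), "w") as handle:
    handle.write("demo\n")
path, checksum, size_mib = archive_directory(data_dir, tempfile.mkdtemp())
print(path, checksum, "%.1fM" % size_mib)
```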