KEDA: A Powerful Tool for Event-Driven HPA

2022-07-17 Xiao_Yang

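Today's hands-on topic is the open-source tool KEDA. KEDA scales K8s resource objects in response to events; it is lightweight, simple, and powerful. It supports the CPU/MEM-based and Cron-scheduled HPA approaches covered in earlier posts, and it also supports a wide range of event-driven HPA sources (Scalers): queue-length events from message queues such as MQ and Kafka, as well as Redis, URL metric, and Prometheus value-threshold events, among others. The current KEDA release (v2.7) ships with 53 built-in Scalers. This post focuses on the Scaler that operations engineers reach for most often, Prometheus, and uses it to drive the scaling of Job instances. As before, the goal is to satisfy the following requirements: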

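Each application instance handles a single request (a long-running task)
Multiple application instances can be started on demand
Each application instance exits automatically once processing is done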

1. KEDA Installation

1.1 Installation (YAML manifest)

wget https://github.com/kedacore/keda/releases/download/v2.7.1/keda-2.7.1.yaml

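# Note: pulling images from GitHub's registry can be unreliable from networks in mainland China; if so, switch to a domestic mirror
# In keda-2.7.1.yaml, replace ghcr.io with ghcr.nju.edu.cn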
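The registry swap described above can also be scripted. The following is a minimal Python sketch, assuming the manifest was downloaded as keda-2.7.1.yaml into the current directory; the helper name rewrite_registry is hypothetical and is not part of KEDA.

```python
from pathlib import Path


def rewrite_registry(manifest_text: str,
                     old: str = "ghcr.io",
                     new: str = "ghcr.nju.edu.cn") -> str:
    """Return the manifest text with every occurrence of the old registry host replaced."""
    return manifest_text.replace(old, new)


if __name__ == "__main__":
    # File name taken from the wget step above; adjust if it was saved elsewhere.
    manifest = Path("keda-2.7.1.yaml")
    if manifest.exists():
        manifest.write_text(rewrite_registry(manifest.read_text()))
```

Running the script twice is harmless, because the mirror host name does not contain the string ghcr.io.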

# Installation output
[keda /]# kubectl apply  -f keda-2.7.1.yaml
namespace/keda created
customresourcedefinition.apiextensions.k8s.io/clustertriggerauthentications.keda.sh created
customresourcedefinition.apiextensions.k8s.io/scaledjobs.keda.sh created
customresourcedefinition.apiextensions.k8s.io/scaledobjects.keda.sh created
customresourcedefinition.apiextensions.k8s.io/triggerauthentications.keda.sh created
serviceaccount/keda-operator created
clusterrole.rbac.authorization.k8s.io/keda-external-metrics-reader created
clusterrole.rbac.authorization.k8s.io/keda-operator created
rolebinding.rbac.authorization.k8s.io/keda-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/keda-hpa-controller-external-metrics created
clusterrolebinding.rbac.authorization.k8s.io/keda-operator created
clusterrolebinding.rbac.authorization.k8s.io/keda-system-auth-delegator created
service/keda-metrics-apiserver created
deployment.apps/keda-metrics-apiserver created
deployment.apps/keda-operator created
apiservice.apiregistration.k8s.io/v1beta1.external.metrics.k8s.io configured

1.2 Verifying the installation

List the installed resource objects:

[keda /]# kubectl get all -n keda
NAME                                        READY   STATUS              RESTARTS   AGE
pod/keda-metrics-apiserver-5ff7b56d-lgwrs   0/1     ContainerCreating   0          26s
pod/keda-operator-65df59d669-r5qct          0/1     ContainerCreating   0          26s

NAME                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/keda-metrics-apiserver   ClusterIP   10.109.61.104   <none>        443/TCP,80/TCP   26s

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/keda-metrics-apiserver   0/1     1            0           26s
deployment.apps/keda-operator            0/1     1            0           26s

NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/keda-metrics-apiserver-5ff7b56d   1         1         0       26s
replicaset.apps/keda-operator-65df59d669          1         1         0       26s

[keda /]# kubectl api-resources | grep scale
scaledjobs           sj       keda.sh/v1alpha1         true         ScaledJob
scaledobjects        so       keda.sh/v1alpha1         true         ScaledObject

2. ScaledJob

2.1 Prometheus query result as the trigger

Create and apply a ScaledJob object in the K8s cluster:

# scaledjob-demo.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: scaled-job-prometheus
spec:
  jobTargetRef:
    parallelism: 2                                   # number of Pods run in parallel for a single Job
#    completions: 4            
#    activeDeadlineSeconds: 600        # maximum run time (deadline) of the Job, in seconds
    backoffLimit: 6
    template:
      spec:
        restartPolicy: Never
        containers:
        - image: alpine
          name: demo-job-scale
          command: ["/bin/sh"]
          args:
          - -c
          - echo "job doing..."; sleep 120
  pollingInterval: 30
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  maxReplicaCount: 5
  rolloutStrategy: gradual
  scalingStrategy:
    strategy: "custom"
    customScalingQueueLengthDeduction: 1
    customScalingRunningJobPercentage: "0.5"
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://x.x.x.x:xxxx             # Prometheus server address
      metricName: prometheus_http_requests_total
      query: round(sum(rate(prometheus_http_requests_total{container="prometheus",handler="/api/v1/query",code="200"}[2m]))*100)    
      threshold: '60'

# kubectl apply -f scaledjob-demo.yaml

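2.2 ScaledJob configuration parameters

jobTargetRef

The native K8s Job spec. Jobs can run in three forms: non-parallel Jobs, parallel Jobs with a fixed completion count, and parallel Jobs with a work queue. For details, see Jobs in the Kubernetes documentation.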

pollingInterval: 30 # Optional. Default: 30 seconds

The interval at which each trigger is checked. By default, triggers are checked every 30 seconds.

successfulJobsHistoryLimit: 5 # Optional. Default: 100

failedJobsHistoryLimit: 5 # Optional. Default: 100.

How many successfully completed and failed Jobs are kept in the history.

envSourceContainerName: {container-name}

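The name of the container from which environment variable values are read. If unset, it defaults to the first container defined in JobTargetRef: .spec.JobTargetRef.template.spec.containers[0]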

maxReplicaCount: 5 # Optional. Default: 100

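The maximum number of Jobs that can be scaled out. The amount to scale out is determined from the Target Average Value (how much a single Job consumes) and the Running Job Count (how many Jobs are currently running).

Note: in testing, maxReplicaCount is counted in Jobs, and it is the maximum number of Jobs that can be created within a single polling interval.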

rolloutStrategy: gradual # Optional. Default: default | gradual

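The rollout strategy applied when the configuration of an existing ScaledJob is updated. "default" terminates the existing Jobs and recreates them; "gradual" only creates new Jobs and leaves the existing ones untouched.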

scalingStrategy: strategy: "default" # default | custom | accurate

How each of the three scaling strategies is calculated:

default strategy: maxScale - runningJobCount

custom strategy: min(maxScale-int64(*s.CustomScalingQueueLengthDeduction)-int64(float64(runningJobCount)*(*s.CustomScalingRunningJobPercentage)), maxReplicaCount)

customScalingQueueLengthDeduction: 1

customScalingRunningJobPercentage: "0.5"

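accurate strategy: if (maxScale + runningJobCount) > maxReplicaCount { return maxReplicaCount - runningJobCount } return maxScale - pendingJobCount

Note: in testing, the minimum number of running Jobs was 1 with the default strategy, while with the custom strategy it could be 0.

Terms used in the strategies:

"maxScale": the smaller of the maxReplicaCount setting and the ratio of queue length to target consumption value (rounded up). Formula: maxValue = min(scaledJob.MaxReplicaCount(), divideWithCeil(queueLength, targetAverageValue))

"RunningJobCount": the number of Jobs that are running and not yet finished

"PendingJobCount": the number of Jobs in the Pending state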
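To make the three formulas easier to compare, here is a minimal Python sketch that transcribes them as quoted above. It is not KEDA's implementation: the function names are hypothetical, the sample numbers are made up, and the sketch does no clamping, so a result can be zero or negative exactly as the raw formulas allow.

```python
import math


def max_scale(queue_length: float, target_average_value: float,
              max_replica_count: int) -> int:
    """min(maxReplicaCount, ceil(queueLength / targetAverageValue))."""
    return min(max_replica_count, math.ceil(queue_length / target_average_value))


def default_strategy(max_scale_value: int, running_job_count: int) -> int:
    """default: maxScale - runningJobCount."""
    return max_scale_value - running_job_count


def custom_strategy(max_scale_value: int, running_job_count: int,
                    max_replica_count: int, queue_length_deduction: int,
                    running_job_percentage: float) -> int:
    """custom: subtract a fixed deduction and a fraction of the running Jobs."""
    # int() truncates toward zero, like the int64(float64(...)) cast in the formula.
    scaled = (max_scale_value - queue_length_deduction
              - int(running_job_count * running_job_percentage))
    return min(scaled, max_replica_count)


def accurate_strategy(max_scale_value: int, running_job_count: int,
                      pending_job_count: int, max_replica_count: int) -> int:
    """accurate: cap by the remaining headroom, otherwise subtract pending Jobs."""
    if max_scale_value + running_job_count > max_replica_count:
        return max_replica_count - running_job_count
    return max_scale_value - pending_job_count


if __name__ == "__main__":
    # Hypothetical inputs: query result 250, threshold 60, maxReplicaCount 5,
    # 4 running Jobs, 1 pending Job.
    scale = max_scale(250, 60, 5)
    print("maxScale:", scale)
    print("default :", default_strategy(scale, 4))
    print("custom  :", custom_strategy(scale, 4, 5, 1, 0.5))
    print("accurate:", accurate_strategy(scale, 4, 1, 5))
```

The custom call uses the same customScalingQueueLengthDeduction (1) and customScalingRunningJobPercentage (0.5) as the demo manifest, which makes it easy to see how those two knobs reduce the number of new Jobs relative to the default strategy.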

scalingStrategy: multipleScalersCalculation : "max" | min | avg | sum

How the results are combined when more than one trigger is defined: Max (default) / Min / Avg / Sum
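The four options can be pictured with a small Python sketch. It only illustrates the arithmetic of each mode; the function name combine_triggers is hypothetical, and treating the input as one numeric value per trigger (with a plain floating-point mean for avg) is an assumption, not a statement about KEDA's internals.

```python
from typing import Sequence


def combine_triggers(values: Sequence[float], calculation: str = "max") -> float:
    """Combine one value per trigger using the chosen multipleScalersCalculation mode."""
    if not values:
        raise ValueError("at least one trigger value is required")
    if calculation == "max":
        return max(values)
    if calculation == "min":
        return min(values)
    if calculation == "avg":
        return sum(values) / len(values)
    if calculation == "sum":
        return sum(values)
    raise ValueError(f"unknown calculation: {calculation!r}")


if __name__ == "__main__":
    # Hypothetical values reported by two triggers.
    for mode in ("max", "min", "avg", "sum"):
        print(mode, combine_triggers([3, 5], mode))
```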

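2.3 Test and verification

hey -z 60s -c 10 'http://x.x.x.x:xxxx/api/v1/query?query=prometheus_http_requests_total{container="prometheus"}'

Note: this load-tests the endpoint that the Prometheus query in the trigger measures. Once the value reaches the threshold of 60, K8s Job objects are created and run, and they create Pods to execute the tasks.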
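The query string in the load-test URL contains braces and double quotes, which shells and some HTTP clients treat specially. As a hedged illustration, the following Python sketch builds a percent-encoded URL for the same /api/v1/query endpoint; the helper name instant_query_url and the host prometheus.example:9090 are placeholders, not values from this setup.

```python
from urllib.parse import urlencode


def instant_query_url(server_address: str, promql: str) -> str:
    """Build a Prometheus instant-query URL with the PromQL expression percent-encoded."""
    base = server_address.rstrip("/")
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Placeholder host; substitute the real Prometheus server address.
    url = instant_query_url(
        "http://prometheus.example:9090",
        'prometheus_http_requests_total{container="prometheus"}',
    )
    print(url)
```

The printed URL can be passed to the load-testing tool without further quoting concerns beyond wrapping it in single quotes.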

# Check the ScaledJob resource object
[keda /]# kubectl get sj
NAME                    MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE   AGE
scaled-job-prometheus   5     prometheus                    True    True     14s

# Status of the Pods created by the Jobs once the load test reaches the threshold
[keda /]# kubectl get pod
NAME                                      READY   STATUS              RESTARTS   AGE
scaled-job-prometheus-29798-7xn8g         0/1     ContainerCreating   0          17s
scaled-job-prometheus-29798-ghf2z         1/1     Running             0          17s
scaled-job-prometheus-2zcvv-44xkg         0/1     ContainerCreating   0          77s
scaled-job-prometheus-2zcvv-6fvqz         1/1     Running             0          77s
scaled-job-prometheus-6dl26-jkl8m         1/1     Running             0          77s
scaled-job-prometheus-6dl26-vmrd8         1/1     Running             0          77s
scaled-job-prometheus-9w2qf-rtlqg         1/1     Running             0          77s
scaled-job-prometheus-9w2qf-zrc4p         1/1     Running             0          77s
scaled-job-prometheus-dwrp4-h7qvn         1/1     Running             0          77s
scaled-job-prometheus-dwrp4-rbzn7         1/1     Running             0          77s
scaled-job-prometheus-ng6nj-6pmvk         1/1     Running             0          47s
scaled-job-prometheus-ng6nj-gkp8d         1/1     Running             0          47s
scaled-job-prometheus-qb7bk-p9699         0/1     ContainerCreating   0          47s
scaled-job-prometheus-qb7bk-zmlb7         1/1     Running             0          47s

# The same Pods after they have finished running
[keda /]# kubectl get pod
NAME                                      READY   STATUS      RESTARTS   AGE
scaled-job-prometheus-29798-7xn8g         0/1     Completed   0          3m8s
scaled-job-prometheus-29798-ghf2z         0/1     Completed   0          3m8s
scaled-job-prometheus-2zcvv-44xkg         0/1     Completed   0          4m8s
scaled-job-prometheus-2zcvv-6fvqz         0/1     Completed   0          4m8s
scaled-job-prometheus-6dl26-jkl8m         0/1     Completed   0          4m8s
scaled-job-prometheus-6dl26-vmrd8         0/1     Completed   0          4m8s
scaled-job-prometheus-lf27p-m7v2x         0/1     Completed   0          2m38s
scaled-job-prometheus-lf27p-r9lpb         0/1     Completed   0          2m38s
scaled-job-prometheus-qb7bk-p9699         0/1     Completed   0          3m38s
scaled-job-prometheus-qb7bk-zmlb7         0/1     Completed   0          3m38s

# Watching the K8s Job status continuously during the load test
[keda /]# kubectl get job -w
scaled-job-prometheus-9w2qf   0/1 of 2                 0s     # Pod created
scaled-job-prometheus-2zcvv   0/1 of 2                 0s
scaled-job-prometheus-9w2qf   0/1 of 2                 0s
scaled-job-prometheus-dwrp4   0/1 of 2                 0s
scaled-job-prometheus-2zcvv   0/1 of 2                 0s
scaled-job-prometheus-6dl26   0/1 of 2                 0s
scaled-job-prometheus-dwrp4   0/1 of 2                 0s
scaled-job-prometheus-6dl26   0/1 of 2                 0s
scaled-job-prometheus-9w2qf   0/1 of 2      0s         0s
scaled-job-prometheus-dwrp4   0/1 of 2      0s         0s
scaled-job-prometheus-6dl26   0/1 of 2      0s         0s
scaled-job-prometheus-2zcvv   0/1 of 2      0s         0s
scaled-job-prometheus-ng6nj   0/1 of 2                 0s
scaled-job-prometheus-ng6nj   0/1 of 2                 0s
scaled-job-prometheus-qb7bk   0/1 of 2                 0s
scaled-job-prometheus-qb7bk   0/1 of 2                 0s
scaled-job-prometheus-ng6nj   0/1 of 2      0s         0s
scaled-job-prometheus-qb7bk   0/1 of 2      0s         0s
scaled-job-prometheus-29798   0/1 of 2                 0s
scaled-job-prometheus-29798   0/1 of 2                 0s
scaled-job-prometheus-29798   0/1 of 2      0s         0s
scaled-job-prometheus-lf27p   0/1 of 2                 0s
scaled-job-prometheus-lf27p   0/1 of 2                 0s
scaled-job-prometheus-lf27p   0/1 of 2      0s         0s
scaled-job-prometheus-dwrp4   1/1 of 2      2m17s      2m17s # Pod 1 completed
scaled-job-prometheus-dwrp4   1/1 of 2      2m17s      2m17s
scaled-job-prometheus-9w2qf   1/1 of 2      2m18s      2m18s
scaled-job-prometheus-9w2qf   1/1 of 2      2m18s      2m18s
scaled-job-prometheus-2zcvv   1/1 of 2      2m18s      2m18s
scaled-job-prometheus-2zcvv   1/1 of 2      2m18s      2m18s
scaled-job-prometheus-dwrp4   2/1 of 2      2m33s      2m33s # Pod 2 completed
scaled-job-prometheus-dwrp4   2/1 of 2      2m33s      2m33s
scaled-job-prometheus-6dl26   1/1 of 2      2m34s      2m34s
scaled-job-prometheus-6dl26   1/1 of 2      2m34s      2m34s
scaled-job-prometheus-qb7bk   1/1 of 2      2m17s      2m17s
scaled-job-prometheus-qb7bk   1/1 of 2      2m17s      2m17s
scaled-job-prometheus-9w2qf   2/1 of 2      2m49s      2m49s
scaled-job-prometheus-9w2qf   2/1 of 2      2m49s      2m49s
scaled-job-prometheus-ng6nj   1/1 of 2      2m19s      2m19s
scaled-job-prometheus-ng6nj   1/1 of 2      2m19s      2m19s
scaled-job-prometheus-ng6nj   2/1 of 2      2m32s      2m32s
scaled-job-prometheus-ng6nj   2/1 of 2      2m32s      2m32s
scaled-job-prometheus-6dl26   2/1 of 2      3m4s       3m4s
scaled-job-prometheus-6dl26   2/1 of 2      3m4s       3m4s
scaled-job-prometheus-29798   1/1 of 2      2m17s      2m17s
scaled-job-prometheus-29798   1/1 of 2      2m17s      2m17s
scaled-job-prometheus-2zcvv   2/1 of 2      3m19s      3m19s
scaled-job-prometheus-2zcvv   2/1 of 2      3m19s      3m19s
scaled-job-prometheus-qb7bk   2/1 of 2      2m50s      2m50s
scaled-job-prometheus-qb7bk   2/1 of 2      2m50s      2m50s
scaled-job-prometheus-dwrp4   2/1 of 2      2m33s      3m30s
scaled-job-prometheus-29798   2/1 of 2      2m36s      2m36s
scaled-job-prometheus-29798   2/1 of 2      2m36s      2m36s
scaled-job-prometheus-lf27p   1/1 of 2      2m17s      2m17s
scaled-job-prometheus-lf27p   1/1 of 2      2m17s      2m17s
scaled-job-prometheus-lf27p   2/1 of 2      2m21s      2m21s
scaled-job-prometheus-lf27p   2/1 of 2      2m21s      2m21s
scaled-job-prometheus-9w2qf   2/1 of 2      2m49s      4m
scaled-job-prometheus-ng6nj   2/1 of 2      2m32s      3m30s

～～FINISH ～～