Stepping on a "Pitfall" with k8s Affinity Scheduling

2019-08-02  wu_sphinx

While checking the servers in our k8s cluster today, I noticed something that didn't look right: resource usage across the machines was noticeably unbalanced.
Top the nodes first:

➜  ~ kubectl  top no

NAME          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
10.2.3.199   190m         2%     3209Mi          20%
10.2.3.200   109m         1%     2083Mi          13%
10.2.3.201   106m         1%     2126Mi          13%
10.2.3.203   338m         4%     9644Mi          61%
10.2.3.204   333m         4%     9805Mi          62%
10.2.3.205   306m         3%     3475Mi          10%
10.2.3.206   1222m        15%    10155Mi         64%
10.2.3.207   285m         3%     9395Mi          59%
10.2.3.208   903m         11%    10317Mi         65%
10.2.3.209   533m         6%     10683Mi         33%
10.2.3.210   1251m        15%    8091Mi          25%
10.2.3.211   84m          4%     1995Mi          54%
10.2.3.212   102m         5%     2013Mi          54%
10.2.3.213   115m         5%     1737Mi          47%
10.2.3.214   97m          4%     1973Mi          53%
10.2.3.215   83m          4%     1891Mi          51%
10.2.3.216   91m          4%     1932Mi          52%

Now look at how the roles are distributed:

➜  ~ kubectl get no
NAME          STATUS   ROLES          AGE    VERSION
10.2.3.199   Ready    worker         58m    v1.14.3
10.2.3.200   Ready    worker         52m    v1.14.3
10.2.3.201   Ready    worker         48m    v1.14.3
10.2.3.203   Ready    worker         2d6h   v1.14.3
10.2.3.204   Ready    worker         2d6h   v1.14.3
10.2.3.205   Ready    worker         2d6h   v1.14.3
10.2.3.206   Ready    worker         2d6h   v1.14.3
10.2.3.207   Ready    worker         2d6h   v1.14.3
10.2.3.208   Ready    worker         2d6h   v1.14.3
10.2.3.209   Ready    worker         2d6h   v1.14.3
10.2.3.210   Ready    worker         2d8h   v1.14.3
10.2.3.211   Ready    controlplane   2d7h   v1.14.3
10.2.3.212   Ready    controlplane   2d8h   v1.14.3
10.2.3.213   Ready    controlplane   2d8h   v1.14.3
10.2.3.214   Ready    etcd           2d8h   v1.14.3
10.2.3.215   Ready    etcd           2d8h   v1.14.3
10.2.3.216   Ready    etcd           2d8h   v1.14.3

You can see that 10.2.3.205 is really slacking off on memory: it's using only a tiny bit even though it has a full 32G. Isn't that just hogging the spot without doing the work?
That won't do. First, label the 32G machines:

➜  ~ kubectl label no 10.2.3.205 mem=32
➜  ~ kubectl label no 10.2.3.209 mem=32
➜  ~ kubectl label no 10.2.3.210 mem=32
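
As a side note, kubectl label accepts several node names in one command, so the three lines above could be collapsed into one; a label can also be removed later with the key- form:

➜  ~ kubectl label no 10.2.3.205 10.2.3.209 10.2.3.210 mem=32
➜  ~ kubectl label no 10.2.3.205 mem-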

Check that the labels have indeed been applied:

➜  ~ kubectl get no -l mem
NAME          STATUS   ROLES    AGE    VERSION
10.2.3.205   Ready    worker   2d6h   v1.14.3
10.2.3.209   Ready    worker   2d6h   v1.14.3
10.2.3.210   Ready    worker   2d8h   v1.14.3

Next, use a node affinity policy so that pods are preferentially scheduled onto the nodes labeled mem=32. Generate a skeleton manifest with a dry run, then add the affinity section to busybox.yaml:

➜  ~ kubectl run busybox --image busybox --restart Never --dry-run -oyaml > busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: busybox
  name: busybox
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: mem
            operator: In
            values:
            - 32
  containers:
  - image: busybox
    name: busybox
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Never
status: {}

Deploy it:

➜  ~ kubectl  apply -f busybox.yaml
Error from server (BadRequest): error when creating "busybox.yaml": Pod in version "v1" cannot be handled as a Pod: v1.Pod.Spec: v1.PodSpec.Affinity: v1.Affinity.NodeAffinity: v1.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution: []v1.PreferredSchedulingTerm: v1.PreferredSchedulingTerm.Preference: v1.NodeSelectorTerm.MatchExpressions: []v1.NodeSelectorRequirement: v1.NodeSelectorRequirement.Values: []string: ReadString: expects " or n, but found 3, error found in #10 byte of ...|values":[32]}]},"wei|..., bigger context ...|essions":[{"key":"mem","operator":"In","values":[32]}]},"weight":50}]}},"containers":[{"image":"busy|...

An error? The message isn't particularly clear. After searching the community, it turns out others have indeed hit this problem:

The label values must be strings. In yaml, that means all numeric values must be quoted.

So label values must be strings: since mine happened to be a bare number (32), it has to be quoted in YAML. Beaten by the details once again.
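
The fix is simply to quote the value so that YAML parses it as a string; the matchExpressions part of busybox.yaml becomes:

          matchExpressions:
          - key: mem
            operator: In
            values:
            - "32"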
After fixing that, deploy again:

➜  ~ kubectl apply -f busybox.yaml
pod/busybox created
➜  ~ kubectl get po busybox -o wide
NAME      READY   STATUS      RESTARTS   AGE   IP           NODE          NOMINATED NODE   READINESS GATES
busybox   0/1     Completed   0          10s   10.2.67.84   10.2.3.210   <none>           <none>

The output shows the pod was indeed scheduled onto one of the large-memory machines.
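
If you want to double-check, list that node's labels and confirm mem=32 is among them:

➜  ~ kubectl get no 10.2.3.210 --show-labels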

Let's go over the field definitions once more:

➜  ~ kubectl explain pod.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution
KIND:     Pod
VERSION:  v1

RESOURCE: preferredDuringSchedulingIgnoredDuringExecution <[]Object>

DESCRIPTION:
     The scheduler will prefer to schedule pods to nodes that satisfy the
     affinity expressions specified by this field, but it may choose a node that
     violates one or more of the expressions. The node that is most preferred is
     the one with the greatest sum of weights, i.e. for each node that meets all
     of the scheduling requirements (resource request, requiredDuringScheduling
     affinity expressions, etc.), compute a sum by iterating through the
     elements of this field and adding "weight" to the sum if the node matches
     the corresponding matchExpressions; the node(s) with the highest sum are
     the most preferred.

     An empty preferred scheduling term matches all objects with implicit weight
     0 (i.e. it's a no-op). A null preferred scheduling term matches no objects
     (i.e. is also a no-op).

FIELDS:
   preference   <Object> -required-
     A node selector term, associated with the corresponding weight.

   weight   <integer> -required-
     Weight associated with matching the corresponding nodeSelectorTerm, in the
     range 1-100.

The larger the weight value, the higher the priority of the corresponding preference. Until now I had simply taken it for granted that the k8s scheduler filters and scores nodes purely by how much spare capacity they have; the facts prove otherwise, and it's time to properly dig into the scheduler's policies.
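
To make the weight mechanics concrete, here is a hypothetical snippet (not the manifest used above; the disk=ssd label is assumed purely for illustration). A node carrying both labels scores 80 + 20 = 100 and is preferred over a node that matches only one of the two terms:

  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80                 # strong preference: the mem=32 nodes labeled above
        preference:
          matchExpressions:
          - key: mem
            operator: In
            values:
            - "32"
      - weight: 20                 # weaker preference: hypothetical disk=ssd label
        preference:
          matchExpressions:
          - key: disk
            operator: In
            values:
            - ssd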
