Kubernetes scheduler: preemptive scheduling and the priority queue

2018-08-01  午时已呃啊

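This article focuses on preemptive scheduling and the priority queue in Kubernetes 1.10. Before Kubernetes 1.8 the scheduler had no preemption, and the scheduling flow was simple: pop a pod from a FIFO queue, try to schedule it, and on failure put it back into the queue (with a back-off mechanism to reduce pressure on the scheduler) to wait for the next attempt.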

This leads to the following problems:

Scheduling flow

The main scheduling flow is in kubernetes\cmd\kube-scheduler\scheduler.go

pod := sched.config.NextPod()

Key steps of preemption

In the scheduling flow above, if scheduling a pod fails, the scheduler enters the preemption flow.

Priority queue

In 1.10, when podQueue is initialized, the configuration decides whether a FIFO queue or a PriorityQueue is created.

func NewSchedulingQueue() SchedulingQueue {
    if util.PodPriorityEnabled() {
        return NewPriorityQueue()
    }
    return NewFIFO()
}

Only PriorityQueue is covered here. Its data structure is shown below.

type PriorityQueue struct {
    lock sync.RWMutex
    cond sync.Cond

    // activeQ is heap structure that scheduler actively looks at to find pods to
    // schedule. Head of heap is the highest priority pod.
    activeQ *Heap
    // unschedulableQ holds pods that have been tried and determined unschedulable.
    unschedulableQ *UnschedulablePodsMap
    // nominatedPods is a map keyed by a node name and the value is a list of
    // pods which are nominated to run on the node. These are pods which can be in
    // the activeQ or unschedulableQ.
    nominatedPods map[string][]*v1.Pod
    // receivedMoveRequest is set to true whenever we receive a request to move a
    // pod from the unschedulableQ to the activeQ, and is set to false, when we pop
    // a pod from the activeQ. It indicates if we received a move request when a
    // pod was in flight (we were trying to schedule it). In such a case, we put
    // the pod back into the activeQ if it is determined unschedulable.
    receivedMoveRequest bool
}

type UnschedulablePodsMap struct {
    // pods is a map key by a pod's full-name and the value is a pointer to the pod.
    pods    map[string]*v1.Pod
    keyFunc func(*v1.Pod) string
}

activeQ

activeQ is an ordered heap. Pop returns the pending pod with the highest priority.

unschedulableQ

unschedulableQ is essentially a map. The key is pod.Name + "_" + pod.Namespace, and the value is a pod that has already been tried and found unschedulable.

nominatedPods

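nominatedPods records, for each node, the pods that have been nominated for that node: they are expected to run there but have not yet been successfully scheduled onto it.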

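The purpose of nominatedPods is to close a gap during preemption: between the moment a high-priority pod evicts lower-priority pods and the moment it is scheduled again, the freed resources could otherwise be taken by other low-priority pods.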

receivedMoveRequest

    if !p.receivedMoveRequest && isPodUnschedulable(pod) {
        p.unschedulableQ.addOrUpdate(pod)
        p.addNominatedPodIfNeeded(pod)
        return nil
    }
    err := p.activeQ.Add(pod)
    if err == nil {
        p.addNominatedPodIfNeeded(pod)
        p.cond.Broadcast()
    }

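The purpose is this: a move of pods from unschedulableQ to activeQ implies that the cluster may have free resources again. If such a move happened while a pod was being scheduled, that pod is added straight back to activeQ when it turns out to be unschedulable, instead of going into unschedulableQ first. It is a small optimization.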

Queue operations triggered by each kind of event

scheduled pod cache

unscheduled pod queue

node event

service event

pvc event
