Volcano scheduling based on guarantee and capability
I. Background
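As business scenarios on a cloud-native Kubernetes platform grow more complex, the default Kubernetes scheduler eventually can no longer keep up. Many companies therefore replace it with a mature open-source scheduler, and Volcano is one of them. Volcano has many strengths, such as gang scheduling and backfill, which I won't go into here. In practice, however, it also has some inconveniences. Consider the following scenario: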
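The cluster has 20C (20 CPU cores) in total, shared by 3 teams. Each team must be guaranteed at least 5C and may use at most 10C. If we set each Volcano queue's guarantee to 5C and its capability to 10C, the queues end up over-allocated.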
II. Problem Analysis
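Let's dig into the cause of the problem described in Section I. To support scheduling policies across many scenarios, Volcano gives a Queue several attributes: weight, guarantee and capability. weight determines the queue's share of the cluster's remaining resources, and that remaining pool is divided dynamically over multiple rounds. guarantee is a reserved amount, i.e. a minimum resource guarantee. capability is the maximum resource limit. In addition, deserved is the amount of resources allocated to the queue in the current session. When a queue's guarantee exceeds the share of the cluster its weight entitles it to, the algorithm tries to honor every queue's weight and that queue's guarantee at the same time, so the sum of deserved ends up exceeding the cluster's total resources. In the loop below, every queue's weighted share in a round is computed from the same `remaining` value, and `attr.deserved` is raised to `attr.guarantee` only afterwards; the extra amount is subtracted from `remaining` at the end of the round, after the other queues have already taken their full shares.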
The official code (the deserved-calculation loop in Volcano's proportion plugin):
```go
for {
	totalWeight := int32(0)
	for _, attr := range pp.queueOpts {
		if _, found := meet[attr.queueID]; found {
			continue
		}
		totalWeight += attr.weight
	}

	// If no queues, break
	if totalWeight == 0 {
		klog.V(4).Infof("Exiting when total weight is 0")
		break
	}

	oldRemaining := remaining.Clone()
	// Calculates the deserved of each Queue.
	// increasedDeserved is the increased value for attr.deserved of processed queues
	// decreasedDeserved is the decreased value for attr.deserved of processed queues
	increasedDeserved := api.EmptyResource()
	decreasedDeserved := api.EmptyResource()
	for _, attr := range pp.queueOpts {
		klog.V(4).Infof("Considering Queue <%s>: weight <%d>, total weight <%d>.",
			attr.name, attr.weight, totalWeight)
		if _, found := meet[attr.queueID]; found {
			continue
		}

		oldDeserved := attr.deserved.Clone()
		// Every queue's share in this round is computed from the same `remaining`.
		attr.deserved.Add(remaining.Clone().Multi(float64(attr.weight) / float64(totalWeight)))

		if attr.realCapability != nil {
			attr.deserved.MinDimensionResource(attr.realCapability, api.Infinity)
		}
		attr.deserved.MinDimensionResource(attr.request, api.Zero)

		klog.V(4).Infof("Format queue <%s> deserved resource to <%v>", attr.name, attr.deserved)

		if attr.request.LessEqual(attr.deserved, api.Zero) {
			meet[attr.queueID] = struct{}{}
			klog.V(4).Infof("queue <%s> is meet", attr.name)
		} else if reflect.DeepEqual(attr.deserved, oldDeserved) {
			meet[attr.queueID] = struct{}{}
			klog.V(4).Infof("queue <%s> is meet cause of the capability", attr.name)
		}

		// deserved is raised to guarantee after the weighted split; the extra
		// amount only leaves `remaining` once the whole round has finished.
		attr.deserved = helpers.Max(attr.deserved, attr.guarantee)
		pp.updateShare(attr)

		klog.V(4).Infof("The attributes of queue <%s> in proportion: deserved <%v>, realCapability <%v>, allocate <%v>, request <%v>, share <%0.2f>",
			attr.name, attr.deserved, attr.realCapability, attr.allocated, attr.request, attr.share)

		increased, decreased := attr.deserved.Diff(oldDeserved, api.Zero)
		increasedDeserved.Add(increased)
		decreasedDeserved.Add(decreased)

		// Record metrics
		metrics.UpdateQueueDeserved(attr.name, attr.deserved.MilliCPU, attr.deserved.Memory)
	}

	remaining.Sub(increasedDeserved).Add(decreasedDeserved)
	klog.V(4).Infof("Remaining resource is <%s>", remaining)
	if remaining.IsEmpty() || reflect.DeepEqual(remaining, oldRemaining) {
		klog.V(4).Infof("Exiting when remaining is empty or no queue has more resource request: <%v>", remaining)
		break
	}
}
```
III. My Scenario and Design
Scenario: suppose the cluster has 20C in total and there are 3 teams. Each team must be guaranteed at least 5C and may use at most 10C.
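Design: in this scenario we only need to set each queue's guarantee to 5C and its capability to 10C; the weight attribute does not need to be considered at all. We therefore redesigned the allocation for this scenario as follows: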
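1. Allocate each queue's guarantee first. This requires the sum of all guarantees to be no greater than the cluster's total resources; otherwise we can simply panic;
2.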