[k8s source code analysis][kube-scheduler]schedul
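1. Preface
If you repost this article, please credit the original source and respect the author's work.
This article analyzes how the default scheduler is registered and how it is used. It mainly involves two files: pkg/scheduler/factory/plugins.go and pkg/scheduler/algorithmprovider/defaults/defaults.go.
Source code: https://github.com/nicktming/kubernetes
Branch: tming-v1.13 (based on v1.13)
2. Registering the default scheduler
You have probably seen a file like the one below, whether or not you fully understood it. The rest of this article should help in understanding it.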
{
  "kind" : "Policy",
  "apiVersion" : "v1",
  "predicates" : [
    {"name" : "PodFitsHostPorts"},
    {"name" : "PodFitsResources"},
    {"name" : "NoDiskConflict"},
    {"name" : "MatchNodeSelector"},
    {"name" : "HostName"}
  ],
  "priorities" : [
    {"name" : "LeastRequestedPriority", "weight" : 1},
    {"name" : "BalancedResourceAllocation", "weight" : 1},
    {"name" : "ServiceSpreadingPriority", "weight" : 1},
    {"name" : "EqualPriority", "weight" : 1}
  ]
}
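When kube-scheduler needs to schedule a pod and several nodes are available, how does it decide which node the pod is assigned to?
As is well known, kube-scheduler first runs the predicates (filtering) to select the nodes that can run the pod (some nodes cannot, for example because of insufficient resources or node affinity). It then runs the priorities (scoring) and picks the highest-scoring node among the filtered results as the node that will finally run the pod.
For predicates, a node must pass every predicate method that is configured, such as PodFitsHostPorts and PodFitsResources in the file above.
For priorities, each method has a weight, and the pod's score on a node is the sum of each method's score multiplied by its weight.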
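The filter-then-score flow described above can be sketched as follows. This is a minimal illustration, not the scheduler's actual code: Node, Predicate, Priority, pickNode, fitsCPU and leftoverCPU are all hypothetical names, and a node is reduced to a single free-CPU number.

```go
package main

import "fmt"

// Node is a hypothetical, simplified node: only free CPU (millicores) is modeled.
type Node struct {
	Name    string
	FreeCPU int
}

// Predicate reports whether a pod requesting podCPU fits on the node.
type Predicate func(podCPU int, n Node) bool

// Priority pairs a scoring function with the weight applied to its score.
type Priority struct {
	Score  func(podCPU int, n Node) int
	Weight int
}

// fitsCPU is a toy predicate: the node must have enough free CPU.
func fitsCPU(podCPU int, n Node) bool { return n.FreeCPU >= podCPU }

// leftoverCPU is a toy priority: more CPU left after placement scores higher.
func leftoverCPU(podCPU int, n Node) int { return n.FreeCPU - podCPU }

// pickNode keeps only nodes that pass every predicate, then returns the node
// with the highest weighted score. The bool is false when no node fits.
func pickNode(podCPU int, nodes []Node, preds []Predicate, pris []Priority) (string, bool) {
	best, bestScore, found := "", 0, false
	for _, n := range nodes {
		fits := true
		for _, p := range preds {
			if !p(podCPU, n) {
				fits = false
				break
			}
		}
		if !fits {
			continue
		}
		total := 0
		for _, pr := range pris {
			total += pr.Score(podCPU, n) * pr.Weight
		}
		if !found || total > bestScore {
			best, bestScore, found = n.Name, total, true
		}
	}
	return best, found
}

func main() {
	nodes := []Node{{Name: "a", FreeCPU: 500}, {Name: "b", FreeCPU: 2000}, {Name: "c", FreeCPU: 1000}}
	name, ok := pickNode(800, nodes, []Predicate{fitsCPU}, []Priority{{Score: leftoverCPU, Weight: 1}})
	fmt.Println(name, ok)
}
```

In this sketch a node that fails any predicate is never scored, which matches the two-phase description above.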
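Before describing how the default scheduler is registered, we first need to look at pkg/scheduler/factory/plugins.go, because that file exists precisely for registering schedulers.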
3. pkg/scheduler/factory/plugins.go
type PluginFactoryArgs struct {
	PodLister                      algorithm.PodLister
	ServiceLister                  algorithm.ServiceLister
	ControllerLister               algorithm.ControllerLister
	ReplicaSetLister               algorithm.ReplicaSetLister
	StatefulSetLister              algorithm.StatefulSetLister
	NodeLister                     algorithm.NodeLister
	PDBLister                      algorithm.PDBLister
	NodeInfo                       predicates.NodeInfo
	PVInfo                         predicates.PersistentVolumeInfo
	PVCInfo                        predicates.PersistentVolumeClaimInfo
	StorageClassInfo               predicates.StorageClassInfo
	VolumeBinder                   *volumebinder.VolumeBinder
	HardPodAffinitySymmetricWeight int32
}

type FitPredicateFactory func(PluginFactoryArgs) algorithm.FitPredicate
type PriorityFunctionFactory func(PluginFactoryArgs) algorithm.PriorityFunction
type PriorityFunctionFactory2 func(PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction)
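FitPredicateFactory: returns a predicate method built from PluginFactoryArgs.
PriorityFunctionFactory: returns a priority method built from PluginFactoryArgs (old style).
PriorityFunctionFactory2: returns a priority method built from PluginFactoryArgs (new style), as a pair of Map and Reduce functions.
3.1 Basic structure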
type PriorityConfigFactory struct {
	Function          PriorityFunctionFactory
	MapReduceFunction PriorityFunctionFactory2
	Weight            int
}

var (
	schedulerFactoryMutex sync.Mutex
	// maps that hold registered algorithm types
	fitPredicateMap        = make(map[string]FitPredicateFactory)
	mandatoryFitPredicates = sets.NewString()
	priorityFunctionMap    = make(map[string]PriorityConfigFactory)
	algorithmProviderMap   = make(map[string]AlgorithmProviderConfig)
	// Registered metadata producers
	priorityMetadataProducer  PriorityMetadataProducerFactory
	predicateMetadataProducer PredicateMetadataProducerFactory
)

const (
	// DefaultProvider defines the default algorithm provider name.
	DefaultProvider = "DefaultProvider"
)

type AlgorithmProviderConfig struct {
	FitPredicateKeys     sets.String
	PriorityFunctionKeys sets.String
}
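As you can see, the name of the default scheduler (algorithm provider) is DefaultProvider.
fitPredicateMap: a global variable that maps each predicate name to the FitPredicateFactory that produces the predicate method.
priorityFunctionMap: also a global variable. It maps each priority name to the PriorityConfigFactory that produces the priority method.
algorithmProviderMap: also a global variable. It maps a scheduler name (for example DefaultProvider) to all the predicate names and priority names that scheduler owns, since AlgorithmProviderConfig holds both sets of names.
mandatoryFitPredicates: a global variable that holds the names of the mandatory predicates.
3.2 Registering predicates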
// pkg/scheduler/factory/plugins.go
func RegisterFitPredicate(name string, predicate algorithm.FitPredicate) string {
	return RegisterFitPredicateFactory(name, func(PluginFactoryArgs) algorithm.FitPredicate { return predicate })
}

// Use a regular expression to check that the algorithm name is valid.
var validName = regexp.MustCompile("^[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])$")

func validateAlgorithmNameOrDie(name string) {
	if !validName.MatchString(name) {
		klog.Fatalf("Algorithm name %v does not match the name validation regexp \"%v\".", name, validName)
	}
}

func RegisterFitPredicateFactory(name string, predicateFactory FitPredicateFactory) string {
	schedulerFactoryMutex.Lock()
	defer schedulerFactoryMutex.Unlock()
	validateAlgorithmNameOrDie(name)
	fitPredicateMap[name] = predicateFactory
	return name
}
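This is simple: RegisterFitPredicate takes a predicate name and a predicate method, and the FitPredicateFactory it registers just returns the predicate that was passed in whenever it is asked to produce one. It then returns name.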
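To make the name check concrete, the following sketch runs a few candidate names against the same regular expression shown in the listing. The helper isValidAlgorithmName is hypothetical; it returns a bool instead of calling klog.Fatalf the way validateAlgorithmNameOrDie does. Note that, as written, the expression requires at least two characters because the second group is not optional.

```go
package main

import (
	"fmt"
	"regexp"
)

// validName is the expression from the plugins.go listing above.
var validName = regexp.MustCompile("^[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])$")

// isValidAlgorithmName is a hypothetical helper for illustration only.
// It reports whether the name would pass validation instead of exiting.
func isValidAlgorithmName(name string) bool {
	return validName.MatchString(name)
}

func main() {
	candidates := []string{
		"PodFitsResources", // letters only
		"my-predicate",     // hyphen in the middle
		"my_predicate",     // underscore is not allowed
		"-leading",         // must start with a letter or digit
		"trailing-",        // must end with a letter or digit
		"x",                // a single character does not match
	}
	for _, name := range candidates {
		fmt.Printf("%q valid=%v\n", name, isValidAlgorithmName(name))
	}
}
```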
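Next is the function for registering your own FitPredicateFactory. It does nothing extra: it simply puts the factory into the map and returns name. RegisterMandatoryFitPredicate takes one additional step, which is to add name to mandatoryFitPredicates.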
func RegisterFitPredicateFactory(name string, predicateFactory FitPredicateFactory) string {
	schedulerFactoryMutex.Lock()
	defer schedulerFactoryMutex.Unlock()
	validateAlgorithmNameOrDie(name)
	fitPredicateMap[name] = predicateFactory
	return name
}

func RegisterMandatoryFitPredicate(name string, predicate algorithm.FitPredicate) string {
	schedulerFactoryMutex.Lock()
	defer schedulerFactoryMutex.Unlock()
	validateAlgorithmNameOrDie(name)
	fitPredicateMap[name] = func(PluginFactoryArgs) algorithm.FitPredicate { return predicate }
	mandatoryFitPredicates.Insert(name)
	return name
}
Next, let's look at how the defaultPredicates function in pkg/scheduler/algorithmprovider/defaults/defaults.go performs the registration.
// pkg/scheduler/algorithmprovider/defaults/defaults.go
func defaultPredicates() sets.String {
	return sets.NewString(
		factory.RegisterFitPredicateFactory(
			predicates.NoVolumeZoneConflictPred,
			func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
				return predicates.NewVolumeZonePredicate(args.PVInfo, args.PVCInfo, args.StorageClassInfo)
			},
		),
		...
		factory.RegisterMandatoryFitPredicate(predicates.CheckNodeConditionPred, predicates.CheckNodeConditionPredicate),
		factory.RegisterFitPredicate(predicates.PodToleratesNodeTaintsPred, predicates.PodToleratesNodeTaints),
		...
	)
}
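As you can see, it calls RegisterFitPredicateFactory, RegisterMandatoryFitPredicate and RegisterFitPredicate. As a result, the global variable fitPredicateMap holds every registered predicate name together with the predicateFactory that produces the corresponding predicate method.
The return value of defaultPredicates() is the set of predicate names registered inside it, each of which is a key in fitPredicateMap.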
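The registration pattern above (a mutex-protected map of name to factory, where every register call returns the name so the caller can collect the names into a set) can be sketched in isolation. Everything in this example is hypothetical and simplified: Args stands in for PluginFactoryArgs, Predicate for algorithm.FitPredicate, and name validation is omitted.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Args, Predicate and PredicateFactory are hypothetical stand-ins for
// PluginFactoryArgs, algorithm.FitPredicate and FitPredicateFactory.
type Args struct{ MaxPods int }
type Predicate func(podCount int) bool
type PredicateFactory func(Args) Predicate

var (
	mu        sync.Mutex
	registry  = map[string]PredicateFactory{} // plays the role of fitPredicateMap
	mandatory = map[string]bool{}             // plays the role of mandatoryFitPredicates
)

// registerFactory stores a factory under name and returns the name.
func registerFactory(name string, f PredicateFactory) string {
	mu.Lock()
	defer mu.Unlock()
	registry[name] = f
	return name
}

// register wraps a ready-made predicate in a factory that ignores its Args.
func register(name string, p Predicate) string {
	return registerFactory(name, func(Args) Predicate { return p })
}

// registerMandatory does the same and also records the name as mandatory.
func registerMandatory(name string, p Predicate) string {
	mu.Lock()
	defer mu.Unlock()
	registry[name] = func(Args) Predicate { return p }
	mandatory[name] = true
	return name
}

// defaultKeys mirrors the shape of defaultPredicates: each call registers a
// predicate as a side effect and contributes its name to the returned list.
func defaultKeys() []string {
	keys := []string{
		registerFactory("MaxPods", func(a Args) Predicate {
			return func(podCount int) bool { return podCount < a.MaxPods }
		}),
		registerMandatory("AlwaysTrue", func(int) bool { return true }),
		register("NonNegative", func(n int) bool { return n >= 0 }),
	}
	sort.Strings(keys)
	return keys
}

func main() {
	keys := defaultKeys()
	fmt.Println(keys)
	// The factory is only invoked later, once the arguments are known.
	fits := registry["MaxPods"](Args{MaxPods: 3})
	fmt.Println(fits(2), fits(3), mandatory["AlwaysTrue"])
}
```

The point of storing factories rather than predicates is that some predicates, like the volume zone predicate in the listing, can only be built once the listers and other arguments exist.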
3.3 Registering priorities
// pkg/scheduler/factory/plugins.go
func RegisterPriorityConfigFactory(name string, pcf PriorityConfigFactory) string {
	schedulerFactoryMutex.Lock()
	defer schedulerFactoryMutex.Unlock()
	validateAlgorithmNameOrDie(name)
	priorityFunctionMap[name] = pcf
	return name
}

func RegisterPriorityFunction2(
	name string,
	mapFunction algorithm.PriorityMapFunction,
	reduceFunction algorithm.PriorityReduceFunction,
	weight int) string {
	return RegisterPriorityConfigFactory(name, PriorityConfigFactory{
		MapReduceFunction: func(PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction) {
			return mapFunction, reduceFunction
		},
		Weight: weight,
	})
}
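As you can see, RegisterPriorityFunction2 was developed in a later version and works with map-reduce functions. To stay compatible with earlier versions, what gets registered in both cases is a PriorityConfigFactory that produces the priority method. The function then returns name.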
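The map-reduce split can be illustrated with a toy example: a map function scores one node at a time, and an optional reduce function then adjusts the scores across all nodes. This is only a sketch under assumptions. NodeScore, MapFunc, ReduceFunc, runPriority, normalizeTo10 and nameLength are hypothetical names, the 0 to 10 normalization range is chosen for illustration, and applying the weight after the reduce step is an assumption about ordering rather than something shown in the listings.

```go
package main

import "fmt"

// NodeScore holds one node's score for a single priority (hypothetical type).
type NodeScore struct {
	Name  string
	Score int
}

// MapFunc scores a single node; ReduceFunc post-processes all scores in place.
type MapFunc func(node string) int
type ReduceFunc func(scores []NodeScore)

// runPriority applies the map function per node, then the reduce function
// (if any) over the whole slice, and finally multiplies by the weight.
func runPriority(nodes []string, m MapFunc, r ReduceFunc, weight int) []NodeScore {
	scores := make([]NodeScore, 0, len(nodes))
	for _, n := range nodes {
		scores = append(scores, NodeScore{Name: n, Score: m(n)})
	}
	if r != nil { // a nil reduce is allowed, as with ImageLocalityPriority in the listing
		r(scores)
	}
	for i := range scores {
		scores[i].Score *= weight
	}
	return scores
}

// normalizeTo10 rescales scores so that the highest one becomes 10.
func normalizeTo10(scores []NodeScore) {
	highest := 0
	for _, s := range scores {
		if s.Score > highest {
			highest = s.Score
		}
	}
	if highest == 0 {
		return
	}
	for i := range scores {
		scores[i].Score = scores[i].Score * 10 / highest
	}
}

// nameLength is a meaningless toy map function used only to produce numbers.
func nameLength(node string) int { return len(node) }

func main() {
	nodes := []string{"a", "bbbb", "cc"}
	fmt.Println(runPriority(nodes, nameLength, normalizeTo10, 2))
	fmt.Println(runPriority(nodes, nameLength, nil, 1))
}
```

Splitting the work this way lets the per-node part stay independent of other nodes, while anything that needs the full picture (such as normalization) lives in the reduce step.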
Next, let's look at how the defaultPriorities function in pkg/scheduler/algorithmprovider/defaults/defaults.go performs the registration.
// pkg/scheduler/algorithmprovider/defaults/defaults.go
func defaultPriorities() sets.String {
	return sets.NewString(
		// spreads pods by minimizing the number of pods (belonging to the same service or replication controller) on the same node.
		factory.RegisterPriorityConfigFactory(
			"SelectorSpreadPriority",
			factory.PriorityConfigFactory{
				MapReduceFunction: func(args factory.PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction) {
					return priorities.NewSelectorSpreadPriority(args.ServiceLister, args.ControllerLister, args.ReplicaSetLister, args.StatefulSetLister)
				},
				Weight: 1,
			},
		),
		...
		factory.RegisterPriorityFunction2("ImageLocalityPriority", priorities.ImageLocalityPriorityMap, nil, 1),
	)
}
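This works just like the predicates: the registered priorities all end up in the global variable priorityFunctionMap, and defaultPriorities() returns the names of the priorities it registers.
3.4 Registering a scheduler
Registering a scheduler requires the scheduler's name (name) together with the predicates it owns (predicateKeys) and the priorities it owns (priorityKeys).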
// pkg/scheduler/factory/plugins.go
func RegisterAlgorithmProvider(name string, predicateKeys, priorityKeys sets.String) string {
	schedulerFactoryMutex.Lock()
	defer schedulerFactoryMutex.Unlock()
	validateAlgorithmNameOrDie(name)
	algorithmProviderMap[name] = AlgorithmProviderConfig{
		FitPredicateKeys:     predicateKeys,
		PriorityFunctionKeys: priorityKeys,
	}
	return name
}
Next, let's look at how the registerAlgorithmProvider function in pkg/scheduler/algorithmprovider/defaults/defaults.go performs the registration.
// pkg/scheduler/algorithmprovider/defaults/defaults.go
func registerAlgorithmProvider(predSet, priSet sets.String) {
	factory.RegisterAlgorithmProvider(factory.DefaultProvider, predSet, priSet)
	...
}
As you can see, this function stores the default scheduler in the global variable algorithmProviderMap. In other words, the default scheduler can be obtained through algorithmProviderMap["DefaultProvider"].
// pkg/scheduler/factory/plugins.go
func GetAlgorithmProvider(name string) (*AlgorithmProviderConfig, error) {
	schedulerFactoryMutex.Lock()
	defer schedulerFactoryMutex.Unlock()
	provider, ok := algorithmProviderMap[name]
	if !ok {
		return nil, fmt.Errorf("plugin %q has not been registered", name)
	}
	return &provider, nil
}
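This function looks up a scheduler by name, so GetAlgorithmProvider("DefaultProvider") returns the default scheduler.
3.5 Registering the default scheduler
So when is the registerAlgorithmProvider function in defaults called?
The answer is in the init function of pkg/scheduler/algorithmprovider/defaults/defaults.go.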
// pkg/scheduler/algorithmprovider/defaults/defaults.go
func init() {
	...
	registerAlgorithmProvider(defaultPredicates(), defaultPriorities())
	...
}
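In other words, as soon as the package containing pkg/scheduler/algorithmprovider/defaults/defaults.go is imported, its init function runs and the default scheduler is registered in the global variable algorithmProviderMap.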
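The effect of registering inside init can be seen in a small self-contained sketch: Go runs a package's init functions before main, so by the time any lookup happens the entry is already in the map. The names ProviderConfig, providers, registerProvider and getProvider are hypothetical simplifications of the real types and functions, and the key lists are shortened to one entry each.

```go
package main

import (
	"fmt"
	"sync"
)

// ProviderConfig is a hypothetical, simplified AlgorithmProviderConfig.
type ProviderConfig struct {
	PredicateKeys []string
	PriorityKeys  []string
}

const defaultProvider = "DefaultProvider"

var (
	mu        sync.Mutex
	providers = map[string]ProviderConfig{} // plays the role of algorithmProviderMap
)

// registerProvider stores the key sets under the provider name.
func registerProvider(name string, preds, pris []string) string {
	mu.Lock()
	defer mu.Unlock()
	providers[name] = ProviderConfig{PredicateKeys: preds, PriorityKeys: pris}
	return name
}

// getProvider returns the registered config or an error for unknown names.
func getProvider(name string) (*ProviderConfig, error) {
	mu.Lock()
	defer mu.Unlock()
	p, ok := providers[name]
	if !ok {
		return nil, fmt.Errorf("plugin %q has not been registered", name)
	}
	return &p, nil
}

// init runs automatically before main, just as a package's init runs when
// the package is imported by the program.
func init() {
	registerProvider(defaultProvider,
		[]string{"PodFitsResources"},
		[]string{"LeastRequestedPriority"})
}

func main() {
	p, err := getProvider(defaultProvider)
	fmt.Println(p.PredicateKeys, p.PriorityKeys, err)

	_, err = getProvider("NoSuchProvider")
	fmt.Println(err)
}
```

No code ever calls init explicitly; the registration is purely a side effect of the package being part of the program.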
3.6 Using the default scheduler
When kube-scheduler starts, it enters the New function in pkg/scheduler/scheduler.go to create a Scheduler instance.
// pkg/scheduler/scheduler.go
// New returns a Scheduler
func New(client clientset.Interface,
	nodeInformer coreinformers.NodeInformer,
	podInformer coreinformers.PodInformer,
	pvInformer coreinformers.PersistentVolumeInformer,
	pvcInformer coreinformers.PersistentVolumeClaimInformer,
	replicationControllerInformer coreinformers.ReplicationControllerInformer,
	replicaSetInformer appsinformers.ReplicaSetInformer,
	statefulSetInformer appsinformers.StatefulSetInformer,
	serviceInformer coreinformers.ServiceInformer,
	pdbInformer policyinformers.PodDisruptionBudgetInformer,
	storageClassInformer storageinformers.StorageClassInformer,
	recorder record.EventRecorder,
	schedulerAlgorithmSource kubeschedulerconfig.SchedulerAlgorithmSource,
	stopCh <-chan struct{},
	opts ...func(o *schedulerOptions)) (*Scheduler, error) {
	...
	source := schedulerAlgorithmSource
	switch {
	case source.Provider != nil:
		// Create the config from a named algorithm provider.
		sc, err := configurator.CreateFromProvider(*source.Provider)
		if err != nil {
			return nil, fmt.Errorf("couldn't create scheduler using provider %q: %v", *source.Provider, err)
		}
		config = sc
	case source.Policy != nil:
		// Create the config from a user specified policy source.
		policy := &schedulerapi.Policy{}
		switch {
		case source.Policy.File != nil:
			if err := initPolicyFromFile(source.Policy.File.Path, policy); err != nil {
				return nil, err
			}
		case source.Policy.ConfigMap != nil:
			if err := initPolicyFromConfigMap(client, source.Policy.ConfigMap, policy); err != nil {
				return nil, err
			}
		}
		sc, err := configurator.CreateFromConfig(*policy)
		if err != nil {
			return nil, fmt.Errorf("couldn't create scheduler from policy: %v", err)
		}
		config = sc
	default:
		return nil, fmt.Errorf("unsupported algorithm source: %v", source)
	}
	...
}
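1. If the kube-scheduler startup command configures a policy through the config parameter, meaning the user specifies the predicates and priorities (loaded from a file or a ConfigMap, as the code shows), execution enters the case source.Policy != nil: branch. This part is analyzed in the article on custom schedulers.
2. If no policy is configured, execution enters the case source.Provider != nil: branch, because *source.Provider is then DefaultProvider. The call configurator.CreateFromProvider(*source.Provider) continues in pkg/scheduler/factory/factory.go, because configurator is a configFactory object at this point.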
// pkg/scheduler/factory/factory.go
func (c *configFactory) CreateFromProvider(providerName string) (*Config, error) {
	klog.V(2).Infof("Creating scheduler from algorithm provider '%v'", providerName)
	provider, err := GetAlgorithmProvider(providerName)
	if err != nil {
		return nil, err
	}
	return c.CreateFromKeys(provider.FitPredicateKeys, provider.PriorityFunctionKeys, []algorithm.SchedulerExtender{})
}
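As you can see, this method calls GetAlgorithmProvider from pkg/scheduler/factory/plugins.go and thereby obtains the configuration of the default scheduler (DefaultProvider), that is, its predicate and priority keys, which it then passes to CreateFromKeys.
4. Summary
This article analyzed how the default scheduler is registered and how it is used, mainly through two files: pkg/scheduler/factory/plugins.go and pkg/scheduler/algorithmprovider/defaults/defaults.go. It should also help with registering predicates and priorities for a custom scheduler, since a custom scheduler necessarily writes into the same global variables described above.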