I. Overview
The scheduler works in three broad phases: Predicate (filtering), Priority (scoring), and Select (final selection).
1. Predicate (filtering) policies
- CheckNodeCondition: checks whether the node itself is healthy;
- GeneralPredicates: a group of general-purpose predicates, including HostName (checks whether the Pod object defines pod.spec.hostname), PodFitsHostPorts (checks that the host ports the Pod requests via pods.spec.containers.ports.hostPort are free on the node), MatchNodeSelector (checks that pods.spec.nodeSelector matches the node's labels), and PodFitsResources (checks that the node can satisfy the Pod's resource requests);
- NoDiskConflict: checks whether the node can satisfy the Pod's volume requirements;
- PodToleratesNodeTaints: checks that the Pod's spec.tolerations cover all taints present on the node;
- PodToleratesNodeNoExecuteTaints: when the Pod's tolerations or the node's taints change such that the Pod no longer tolerates a NoExecute taint, the Pod is evicted from the node;
- CheckNodeLabelPresence: checks for the presence (or absence) of given node labels;
- CheckServiceAffinity: checks Service affinity;
- MaxEBSVolumeCount: checks the remaining number of attachable AWS EBS volumes (default maximum 39);
- MaxGCEPDVolumeCount: checks the number of attachable GCE persistent disks;
- MaxAzureDiskVolumeCount: checks the number of attached Azure disks (maximum 16);
- CheckVolumeBinding: checks the PVCs already bound on the node;
- NoVolumeZoneConflict: checks that the node's zone satisfies the zone constraints of the Pod's volumes;
- CheckNodeMemoryPressure: checks whether the node is under memory pressure;
- CheckNodePIDPressure: checks whether the node is under PID pressure (too many processes);
- CheckNodeDiskPressure: checks whether the node is under disk pressure (excessive disk I/O);
- MatchInterPodAffinity: checks inter-pod affinity;
2. Priority (scoring) functions
- least_requested: scores nodes by the ratio of remaining free resources; the higher the ratio, the higher the score;
- balancedResourceAllocation: scores nodes by how close their CPU and memory utilization ratios are to each other; the closer, the better;
- NodePreferAvoidPods: based on the node annotation scheduler.alpha.kubernetes.io/preferAvoidPods; if the annotation is present, the scheduler tends not to run Pods on that node;
- TaintToleration: checks how well the Pod's spec.tolerations matches the node's taint list;
- SelectorSpreading: spreads Pods matched by the same selector (e.g. of one Service or controller) across nodes, so nodes already running fewer matching Pods score higher;
- nodeAffinity: node affinity;
- interpod_affinity: computes inter-pod affinity; the higher the score, the more likely the Pod is scheduled onto that node;
- most_requested: the opposite of least_requested; disabled by default;
- node_label: scores nodes by whether they carry a given label; disabled by default;
- image_locality: prefers nodes that already hold the required image, scored by how ready the image is for use; disabled by default;
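In older Kubernetes releases, these predicates and priorities could be tuned through a scheduler policy file passed to kube-scheduler via `--policy-config-file`. A minimal sketch of that legacy Policy format (the particular predicate/priority names and weights chosen here are illustrative, not taken from this article; verify against your cluster version):

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsResources"},
    {"name": "PodToleratesNodeTaints"},
    {"name": "MatchNodeSelector"}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "BalancedResourceAllocation", "weight": 1},
    {"name": "ImageLocalityPriority", "weight": 1}
  ]
}
```

Predicates and priorities not listed in the file are disabled, which is how the optional scorers such as image_locality were switched on.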
II. Advanced scheduling
1) Node affinity examples
1. nodeSelector
Example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  nodeSelector:
    disktype: ssd
```
After creating the Pod, it stays in Pending, because no node carries the disktype label:
```
[root@master1 ~]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
pod-schedule   0/1     Pending   0          2m6s

Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.
```
If we now put this label on node2, the Pod starts running there, as shown below:
```
[root@master1 schedule]# kubectl label nodes node2 disktype=ssd
[root@master1 ~]# kubectl get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE     IP           NODE    NOMINATED NODE   READINESS GATES
pod-schedule   1/1     Running   0          6m45s   10.244.2.4   node2   <none>           <none>
```
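To double-check node labels, the standard kubectl commands below can be used (the node name node2 follows the example above):

```shell
kubectl get nodes --show-labels       # list every node with all of its labels
kubectl get nodes -l disktype=ssd     # list only the nodes carrying the label
kubectl label nodes node2 disktype-   # remove the label again (trailing "-")
```

Note the trailing dash syntax for removing a label, mirroring how it was added.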
2. nodeAffinity
Built-in help:
```
[root@master1 schedule]# kubectl explain pods.spec.affinity.nodeAffinity
KIND:     Pod
VERSION:  v1

RESOURCE: nodeAffinity <Object>

DESCRIPTION:
     Describes node affinity scheduling rules for the pod.

     Node affinity is a group of node affinity scheduling rules.

FIELDS:
   preferredDuringSchedulingIgnoredDuringExecution   <[]Object>
     The scheduler will prefer to schedule pods to nodes that satisfy the
     affinity expressions specified by this field, but it may choose a node
     that violates one or more of the expressions. The node that is most
     preferred is the one with the greatest sum of weights, i.e. for each node
     that meets all of the scheduling requirements (resource request,
     requiredDuringScheduling affinity expressions, etc.), compute a sum by
     iterating through the elements of this field and adding "weight" to the
     sum if the node matches the corresponding matchExpressions; the node(s)
     with the highest sum are the most preferred.

   requiredDuringSchedulingIgnoredDuringExecution   <Object>
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some
     point during pod execution (e.g. due to an update), the system may or may
     not try to eventually evict the pod from its node.
```
Example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule-nodeffinity
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
```
Because requiredDuringSchedulingIgnoredDuringExecution is a hard affinity and no node carries a matching zone label, the Pod stays Pending.
After changing it to the soft affinity preferredDuringSchedulingIgnoredDuringExecution and recreating the Pod, it runs:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule-nodeffinity
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
        weight: 60
```
2) Pod affinity examples
1. podAffinity
Example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command:
    - "sh"
    - "-c"
    - "sleep 3600"
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - myapp
        topologyKey: kubernetes.io/hostname
```
The result: both Pods land on the same node, as required by the affinity rule.
```
[root@master1 schedule]# kubectl apply -f pod-required-affinity.yaml
pod/pod-first created
pod/pod-second created
[root@master1 schedule]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          9s    10.244.2.12   node2   <none>           <none>
pod-second   1/1     Running   0          9s    10.244.2.11   node2   <none>           <none>
```
2. podAntiAffinity
The opposite of podAffinity: simply change podAffinity to podAntiAffinity in the manifest above, and the two Pods are forced onto different nodes:
```
[root@master1 schedule]# kubectl apply -f pod-required-antiAffinity.yaml
pod/pod-first created
pod/pod-second created
[root@master1 schedule]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          11s   10.244.2.13   node2   <none>           <none>
pod-second   1/1     Running   0          11s   10.244.1.9    node1   <none>           <none>
```
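For reference, the changed stanza in pod-second would read as follows (a sketch of the single edit described above; the rest of the manifest is unchanged):

```yaml
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - myapp
        topologyKey: kubernetes.io/hostname
```

With kubernetes.io/hostname as the topologyKey, "different topology domain" means "different node", which is why pod-second ends up on node1.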
III. Taint-based scheduling
Taints are generally defined on nodes; a Pod declares tolerations to state which taints it can tolerate, which together decide whether the Pod may run there. A taint's effect defines how Pods that do not tolerate it are repelled:
- NoSchedule: affects scheduling only; existing Pods are not touched;
- NoExecute: affects both scheduling and existing Pods; Pods that do not tolerate the taint are evicted;
- PreferNoSchedule: preferably do not schedule there, but scheduling is still allowed.
Syntax for adding taints:
kubectl taint nodes NODE_NAME KEY_1=VAL_1:TAINT_EFFECT_1 ... KEY_N=VAL_N:TAINT_EFFECT_N [options]
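For example (the node name and the key/value pair here are placeholders, not taken from the cluster above):

```shell
kubectl taint nodes node1 node-type=production:NoSchedule   # add a taint
kubectl describe node node1 | grep Taints                   # inspect a node's taints
kubectl taint nodes node1 node-type:NoSchedule-             # remove it (trailing "-")
```

As with labels, a trailing dash on the key(:effect) removes the taint.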
```
[root@master1 yaml]# kubectl apply -f toleration-deploy-pod.yaml
deployment.apps/myapp-deploy created
[root@master1 yaml]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
myapp-deploy-5f86f6ffdd-knmvk   1/1     Running   0          10s   10.244.1.10   node1   <none>           <none>
myapp-deploy-5f86f6ffdd-pftbz   1/1     Running   0          10s   10.244.1.12   node1   <none>           <none>
myapp-deploy-5f86f6ffdd-xnbqk   1/1     Running   0          10s   10.244.1.11   node1   <none>           <none>
```
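The toleration-deploy-pod.yaml manifest itself is not reproduced here; a minimal sketch of the tolerations stanza such a Deployment's Pod template would carry (the taint key/value node-type=production is a hypothetical example, not taken from this article):

```yaml
spec:
  template:
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
      tolerations:
      - key: "node-type"        # hypothetical taint key
        operator: "Equal"       # value must match exactly; "Exists" ignores value
        value: "production"
        effect: "NoSchedule"
```

A toleration matches a taint when key, value (for operator Equal), and effect all line up; an empty effect tolerates that key with any effect.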