Kubernetes PodGC Controller Source Code Analysis
Posted WaltonWang
Author: xidianwangtao@gmail.com
PodGC Controller Configuration
Only two kube-controller-manager flags concern the PodGC Controller:
flag | default value | comments |
---|---|---|
--controllers stringSlice | * | The list of controllers to enable. podgc can be enabled or disabled here; it is in the enabled list by default. |
--terminated-pod-gc-threshold int32 | 12500 | Number of terminated pods that can exist before the terminated pod garbage collector starts deleting terminated pods. If <= 0, the terminated pod garbage collector is disabled. (default 12500) |
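To make the two flags concrete, here is a minimal sketch of how their values could be interpreted. It is not the kube-controller-manager code: the helper names isControllerEnabled and terminatedGCEnabled, the onByDefault parameter, and the "-name" form for disabling a single controller are assumptions made for illustration.

```go
package main

import "fmt"

// isControllerEnabled is a hypothetical helper that interprets a
// --controllers value. "*" enables every controller that is on by default;
// "name" enables one controller and "-name" disables it (assumed syntax).
func isControllerEnabled(name string, onByDefault bool, controllers []string) bool {
	hasStar := false
	for _, c := range controllers {
		if c == name {
			return true
		}
		if c == "-"+name {
			return false
		}
		if c == "*" {
			hasStar = true
		}
	}
	return hasStar && onByDefault
}

// terminatedGCEnabled mirrors the documented rule for
// --terminated-pod-gc-threshold: a value <= 0 disables the collector.
func terminatedGCEnabled(threshold int32) bool {
	return threshold > 0
}

func main() {
	fmt.Println(isControllerEnabled("podgc", true, []string{"*"}))
	fmt.Println(isControllerEnabled("podgc", true, []string{"*", "-podgc"}))
	fmt.Println(terminatedGCEnabled(12500))
	fmt.Println(terminatedGCEnabled(0))
}
```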
PodGC Controller Entry Point
The PodGC Controller is started when kube-controller-manager runs. CMServer's Run invokes StartControllers, which iterates over the pre-registered, enabled controllers and starts them one by one.
cmd/kube-controller-manager/app/controllermanager.go:180

    func Run(s *options.CMServer) error {
        ...
        err := StartControllers(newControllerInitializers(), s, rootClientBuilder, clientBuilder, stop)
        ...
    }

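newControllerInitializers registers the regular controllers together with their start functions. I call them "regular" because some controllers are not registered here, for example the very important service controller and node controller; I refer to those as non-regular controllers.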

    func newControllerInitializers() map[string]InitFunc {
        controllers := map[string]InitFunc{}
        controllers["endpoint"] = startEndpointController
        ...
        controllers["podgc"] = startPodGCController
        ...
        return controllers
    }

CMServer therefore ends up invoking startPodGCController to start the PodGC Controller.
cmd/kube-controller-manager/app/core.go:66

    func startPodGCController(ctx ControllerContext) (bool, error) {
        go podgc.NewPodGC(
            ctx.ClientBuilder.ClientOrDie("pod-garbage-collector"),
            ctx.InformerFactory.Core().V1().Pods(),
            int(ctx.Options.TerminatedPodGCThreshold),
        ).Run(ctx.Stop)
        return true, nil
    }

startPodGCController is very simple: it creates the PodGC controller and runs it in a new goroutine.
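The registration-then-start pattern described in this section can be sketched in isolation. Everything below is a simplified stand-in, not the real kube-controller-manager types: controllerContext, initFunc, startAll, and the enabled callback are hypothetical names, and the controller's Run loop is replaced by a goroutine that just waits for the stop channel.

```go
package main

import (
	"fmt"
	"sort"
)

// controllerContext is a hypothetical, stripped-down stand-in for the
// context handed to each start function.
type controllerContext struct {
	stop <-chan struct{}
}

// initFunc reports whether the controller was started.
type initFunc func(ctx controllerContext) (bool, error)

// startPodGC launches the controller in its own goroutine and returns
// immediately, as startPodGCController does.
func startPodGC(ctx controllerContext) (bool, error) {
	go func() {
		<-ctx.stop // placeholder for the controller's Run loop
	}()
	return true, nil
}

// startAll walks the registry and starts every enabled controller.
// It returns the sorted names of the controllers that were started.
func startAll(inits map[string]initFunc, enabled func(name string) bool, ctx controllerContext) ([]string, error) {
	var started []string
	for name, start := range inits {
		if !enabled(name) {
			continue
		}
		ok, err := start(ctx)
		if err != nil {
			return started, fmt.Errorf("starting %q: %w", name, err)
		}
		if ok {
			started = append(started, name)
		}
	}
	sort.Strings(started)
	return started, nil
}

func main() {
	stop := make(chan struct{})
	defer close(stop)

	inits := map[string]initFunc{
		"podgc":    startPodGC,
		"endpoint": func(controllerContext) (bool, error) { return true, nil },
	}
	onlyPodGC := func(name string) bool { return name == "podgc" }

	started, err := startAll(inits, onlyPodGC, controllerContext{stop: stop})
	fmt.Println(started, err)
}
```

Keeping the start functions in a map keyed by name is what lets a single flag value decide, per controller, whether its start function is called at all.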
Creating the PodGC Controller
Let's first look at the definition of PodGCController.
pkg/controller/podgc/gc_controller.go:44

    type PodGCController struct {
        kubeClient clientset.Interface

        podLister       corelisters.PodLister
        podListerSynced cache.InformerSynced

        deletePod              func(namespace, name string) error
        terminatedPodThreshold int
    }

- kubeClient: the client used to talk to the API server.
- podLister: helps list Pods.
- podListerSynced: reports whether the PodLister has synced.
- deletePod: the function that calls the API server to delete a given pod.
- terminatedPodThreshold: the value of --terminated-pod-gc-threshold, 12500 by default.
pkg/controller/podgc/gc_controller.go:54

    func NewPodGC(kubeClient clientset.Interface, podInformer coreinformers.PodInformer, terminatedPodThreshold int) *PodGCController {
        if kubeClient != nil && kubeClient.Core().RESTClient().GetRateLimiter() != nil {
            metrics.RegisterMetricAndTrackRateLimiterUsage("gc_controller", kubeClient.Core().RESTClient().GetRateLimiter())
        }
        gcc := &PodGCController{
            kubeClient:             kubeClient,
            terminatedPodThreshold: terminatedPodThreshold,
            deletePod: func(namespace, name string) error {
                glog.Infof("PodGC is force deleting Pod: %v:%v", namespace, name)
                return kubeClient.Core().Pods(namespace).Delete(name, metav1.NewDeleteOptions(0))
            },
        }
        gcc.podLister = podInformer.Lister()
        gcc.podListerSynced = podInformer.Informer().HasSynced
        return gcc
    }

Creating the PodGC Controller simply populates the fields of PodGCController. Note the metav1.NewDeleteOptions(0) argument in the definition of deletePod: it means the pod is deleted immediately, with no grace period.
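One consequence of storing deletePod as a function field is that the deletion behavior can be swapped out without touching the rest of the controller. The sketch below illustrates that idea with hypothetical types (podGC, recorder); treating this as a motivation for the design is an assumption on my part, not something stated in the source code shown above.

```go
package main

import (
	"errors"
	"fmt"
)

// podGC is a stripped-down, hypothetical stand-in that keeps only the
// injectable deletePod field from PodGCController.
type podGC struct {
	deletePod func(namespace, name string) error
}

// recorder is a fake deleter that records what it was asked to delete
// instead of calling an API server.
type recorder struct {
	deleted []string
}

func (r *recorder) deletePod(namespace, name string) error {
	if name == "" {
		return errors.New("empty pod name")
	}
	r.deleted = append(r.deleted, namespace+"/"+name)
	return nil
}

func main() {
	rec := &recorder{}
	gc := &podGC{deletePod: rec.deletePod} // inject the fake

	if err := gc.deletePod("default", "job-1"); err != nil {
		fmt.Println("delete failed:", err)
	}
	fmt.Println(rec.deleted)
}
```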
Running the PodGC Controller
Once the PodGC Controller has been created, its Run method is called to start it.
pkg/controller/podgc/gc_controller.go:73

    func (gcc *PodGCController) Run(stop <-chan struct{}) {
        if !cache.WaitForCacheSync(stop, gcc.podListerSynced) {
            utilruntime.HandleError(fmt.Errorf("timed out waiting for caches to sync"))
            return
        }
        go wait.Until(gcc.gc, gcCheckPeriod, stop)
        <-stop
    }

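- Every 100ms, check whether the PodLister has synced, until it has.
- Start a goroutine that runs gcc.gc to collect pods; after each run finishes it waits 20s (gcCheckPeriod) and runs gcc.gc again, until the stop signal is received.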
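The "run, then wait, until stopped" behavior described above can be sketched with the standard library alone. The until function below is a simplified stand-in written for illustration, not the implementation of wait.Until; demoPeriod is a hypothetical short period used so the example finishes quickly (the controller uses 20s).

```go
package main

import (
	"fmt"
	"time"
)

// demoPeriod is a hypothetical, short stand-in for gcCheckPeriod.
const demoPeriod = time.Millisecond

// until calls f, waits for period after f returns, and repeats.
// It returns as soon as stop is closed.
func until(f func(), period time.Duration, stop <-chan struct{}) {
	for {
		// Do not start another run once stop has been signalled.
		select {
		case <-stop:
			return
		default:
		}

		f()

		// The wait starts only after f has finished, so a slow run
		// pushes the next run back instead of overlapping with it.
		select {
		case <-stop:
			return
		case <-time.After(period):
		}
	}
}

func main() {
	stop := make(chan struct{})
	runs := 0
	until(func() {
		runs++
		if runs == 3 {
			close(stop) // simulate the stop signal after three GC passes
		}
	}, demoPeriod, stop)
	fmt.Println("runs:", runs)
}
```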
pkg/controller/podgc/gc_controller.go:83

    func (gcc *PodGCController) gc() {
        pods, err := gcc.podLister.List(labels.Everything())
        if err != nil {
            glog.Errorf("Error while listing all Pods: %v", err)
            return
        }
        if gcc.terminatedPodThreshold > 0 {
            gcc.gcTerminated(pods)
        }
        gcc.gcOrphaned(pods)
        gcc.gcUnscheduledTerminating(pods)
    }

gcc.gc holds the actual pod collection logic:
- List all pods from the PodLister (no filtering).
- If terminatedPodThreshold is greater than 0, call gcc.gcTerminated(pods) to collect the pods that exceed the threshold.
- Call gcc.gcOrphaned(pods) to collect orphaned pods.
- Call gcc.gcUnscheduledTerminating(pods) to collect unscheduled terminating pods.
Notes:
1. gcTerminated, gcOrphaned, and gcUnscheduledTerminating run serially, one after another.
2. Inside gcTerminated, the pods over the threshold are deleted in parallel. A sync.WaitGroup waits for all of those deletions to finish before gcTerminated returns, and only then does gcOrphaned start.
3. gcOrphaned and gcUnscheduledTerminating each delete their pods serially.
Collecting Terminated Pods

    func (gcc *PodGCController) gcTerminated(pods []*v1.Pod) {
        terminatedPods := []*v1.Pod{}
        for _, pod := range pods {
            if isPodTerminated(pod) {
                terminatedPods = append(terminatedPods, pod)
            }
        }
        terminatedPodCount := len(terminatedPods)
        sort.Sort(byCreationTimestamp(terminatedPods))
        deleteCount := terminatedPodCount - gcc.terminatedPodThreshold
        if deleteCount > terminatedPodCount {
            deleteCount = terminatedPodCount
        }
        if deleteCount > 0 {
            glog.Infof("garbage collecting %v pods", deleteCount)
        }
        var wait sync.WaitGroup
        for i := 0; i < deleteCount; i++ {
            wait.Add(1)
            go func(namespace string, name string) {
                defer wait.Done()
                if err := gcc.deletePod(namespace, name); err != nil {
                    // ignore not founds
                    defer utilruntime.HandleError(err)
                }
            }(terminatedPods[i].Namespace, terminatedPods[i].Name)
        }
        wait.Wait()
    }

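- Iterate over all pods and keep the terminated ones (pods whose Pod.Status.Phase is not Pending, Running, or Unknown).
- Sort the terminated pods by creation timestamp and compute deleteCount, the number of terminated pods in excess of terminatedPodThreshold.
- Start deleteCount goroutines that call gcc.deletePod (which invokes the API server) in parallel to delete the first deleteCount pods in that order immediately.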
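The three steps above can be reproduced on plain data. The sketch below uses a hypothetical pod struct with a string phase and an integer creation time instead of v1.Pod, and it assumes the sort puts the oldest pod first; excessTerminated and deleteAll are illustrative names, not functions from the controller.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// pod is a hypothetical, minimal stand-in for v1.Pod.
type pod struct {
	namespace, name string
	phase           string
	created         int64 // creation time as Unix seconds
}

// isTerminated follows the rule in the text: any phase other than
// Pending, Running, or Unknown counts as terminated.
func isTerminated(p pod) bool {
	return p.phase != "Pending" && p.phase != "Running" && p.phase != "Unknown"
}

// excessTerminated returns the terminated pods beyond threshold,
// oldest first (the ordering is an assumption of this sketch).
func excessTerminated(pods []pod, threshold int) []pod {
	var terminated []pod
	for _, p := range pods {
		if isTerminated(p) {
			terminated = append(terminated, p)
		}
	}
	sort.Slice(terminated, func(i, j int) bool {
		return terminated[i].created < terminated[j].created
	})
	deleteCount := len(terminated) - threshold
	if deleteCount <= 0 {
		return nil
	}
	return terminated[:deleteCount]
}

// deleteAll deletes the victims in parallel, one goroutine per pod, and
// returns only after every deletion has finished.
func deleteAll(victims []pod, del func(namespace, name string) error) []error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, v := range victims {
		wg.Add(1)
		// Pass the fields as arguments so each goroutine gets its own copy.
		go func(namespace, name string) {
			defer wg.Done()
			if err := del(namespace, name); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(v.namespace, v.name)
	}
	wg.Wait()
	return errs
}

func main() {
	pods := []pod{
		{"default", "c", "Succeeded", 30},
		{"default", "a", "Failed", 10},
		{"default", "r", "Running", 5},
		{"default", "b", "Succeeded", 20},
	}
	victims := excessTerminated(pods, 1)
	errs := deleteAll(victims, func(namespace, name string) error { return nil })
	fmt.Println(len(victims), "pods selected,", len(errs), "errors")
}
```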
Collecting Pods Bound to Nodes That No Longer Exist

    // gcOrphaned deletes pods that are bound to nodes that don't exist.
    func (gcc *PodGCController) gcOrphaned(pods []*v1.Pod) {
        glog.V(4).Infof("GC'ing orphaned")
        // We want to get list of Nodes from the etcd, to make sure that it's as fresh as possible.
        nodes, err := gcc.kubeClient.Core().Nodes().List(metav1.ListOptions{})
        if err != nil {
            return
        }
        nodeNames := sets.NewString()
        for i := range nodes.Items {
            nodeNames.Insert(nodes.Items[i].Name)
        }
        for _, pod := range pods {
            if pod.Spec.NodeName == "" {
                continue
            }
            if nodeNames.Has(pod.Spec.NodeName) {
                continue
            }
            glog.V(2).Infof("Found orphaned Pod %v assigned to the Node %v. Deleting.", pod.Name, pod.Spec.NodeName)
            if err := gcc.deletePod(pod.Namespace, pod.Name); err != nil {
                utilruntime.HandleError(err)
            } else {
                glog.V(0).Infof("Forced deletion of orphaned Pod %s succeeded", pod.Name)
            }
        }
    }

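gcOrphaned deletes pods whose bound node no longer exists.
- Call the API server to fetch all Nodes.
- Iterate over all pods; if a pod's NodeName is non-empty and is not among the Nodes just fetched, call gcc.deletePod to delete it. These deletions happen serially, one pod at a time.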
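The orphan check reduces to a set-membership test. A minimal sketch on plain data follows; the pod struct and the orphaned function are hypothetical stand-ins, and a map is used in place of the string set from the controller code.

```go
package main

import "fmt"

// pod is a hypothetical, minimal stand-in for v1.Pod.
type pod struct {
	name     string
	nodeName string
}

// orphaned returns the pods that are bound to a node name that is not
// in the list of existing nodes. Unscheduled pods are skipped.
func orphaned(pods []pod, existingNodes []string) []pod {
	known := make(map[string]struct{}, len(existingNodes))
	for _, n := range existingNodes {
		known[n] = struct{}{}
	}

	var out []pod
	for _, p := range pods {
		if p.nodeName == "" {
			continue // not scheduled yet, so it cannot be orphaned
		}
		if _, ok := known[p.nodeName]; ok {
			continue // its node still exists
		}
		out = append(out, p)
	}
	return out
}

func main() {
	pods := []pod{
		{name: "web-1", nodeName: "node-a"},
		{name: "web-2", nodeName: "node-gone"},
		{name: "web-3", nodeName: ""},
	}
	for _, p := range orphaned(pods, []string{"node-a", "node-b"}) {
		fmt.Println("orphaned:", p.name, "on", p.nodeName)
	}
}
```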
Collecting Unscheduled Terminating Pods
pkg/controller/podgc/gc_controller.go:167

    // gcUnscheduledTerminating deletes pods that are terminating and haven't been scheduled to a particular node.
    func (gcc *PodGCController) gcUnscheduledTerminating(pods []*v1.Pod) {
        glog.V(4).Infof("GC'ing unscheduled pods which are terminating.")
        for _, pod := range pods {
            if pod.DeletionTimestamp == nil || len(pod.Spec.NodeName) > 0 {
                continue
            }
            glog.V(2).Infof("Found unscheduled terminating Pod %v not assigned to any Node. Deleting.", pod.Name)
            if err := gcc.deletePod(pod.Namespace, pod.Name); err != nil {
                utilruntime.HandleError(err)
            } else {
                glog.V(0).Infof("Forced deletion of unscheduled terminating Pod %s succeeded", pod.Name)
            }
        }
    }

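gcUnscheduledTerminating deletes pods that are terminating and have not yet been scheduled to any node.
- Iterate over all pods and select those that are terminating (pod.DeletionTimestamp != nil) and unscheduled (pod.Spec.NodeName is empty).
- Call gcc.deletePod for each selected pod, serially, one at a time.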
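The selection rule is a two-part predicate, which the following sketch applies to plain data. The pod struct, the deletedAt helper, and unscheduledTerminating are hypothetical names for illustration; a nil deletionTimestamp plays the role of an unset pod.DeletionTimestamp.

```go
package main

import (
	"fmt"
	"time"
)

// pod is a hypothetical, minimal stand-in for v1.Pod.
type pod struct {
	name              string
	nodeName          string
	deletionTimestamp *time.Time // nil means the pod is not being deleted
}

// deletedAt is a small helper that builds a non-nil deletion timestamp.
func deletedAt(unixSeconds int64) *time.Time {
	t := time.Unix(unixSeconds, 0)
	return &t
}

// unscheduledTerminating keeps the pods that are terminating and have
// no node assigned, using the same skip condition as the controller.
func unscheduledTerminating(pods []pod) []pod {
	var out []pod
	for _, p := range pods {
		if p.deletionTimestamp == nil || len(p.nodeName) > 0 {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	pods := []pod{
		{name: "pending-deleted", deletionTimestamp: deletedAt(100)},
		{name: "running-deleted", nodeName: "node-a", deletionTimestamp: deletedAt(100)},
		{name: "pending-alive"},
	}
	for _, p := range unscheduledTerminating(pods) {
		fmt.Println("would force delete:", p.name)
	}
}
```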
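Summary
The PodGC Controller is one of the controllers Kubernetes starts by default. It runs in the background on the master and performs a pod GC pass, waiting 20s between passes.
- --controllers switches the PodGC Controller on or off.
- --terminated-pod-gc-threshold sets the threshold for gcTerminated.
- The PodGC Controller runs the following three GC sub-steps serially:
  - Collect the terminated pods in excess of the threshold (pods whose Pod.Status.Phase is not Pending, Running, or Unknown).
  - Collect pods whose bound node no longer exists (is not in etcd).
  - Collect pods that are terminating and have not been scheduled to any node.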