Reasons and countermeasures for KEDA ScaledJob pods not being automatically deleted

In KEDA's ScaledJob, once the number of completed Jobs exceeds the value specified by `successfulJobsHistoryLimit`, the excess Jobs are deleted, but for some reason the pods created by those Jobs are not deleted.

// deleteJobsWithHistoryLimit removes the oldest finished Jobs so that at most
// historyLimit Jobs are kept.
func (e *scaleExecutor) deleteJobsWithHistoryLimit(logger logr.Logger, jobs []batchv1.Job, historyLimit int32) error {
	if len(jobs) <= int(historyLimit) {
		return nil
	}

	deleteJobLength := len(jobs) - int(historyLimit)
	for _, j := range (jobs)[0:deleteJobLength] {
		err := e.client.Delete(context.TODO(), j.DeepCopyObject())
		if err != nil {
			return err
		}
		logger.Info("Remove a job by reaching the historyLimit", "job.Name", j.ObjectMeta.Name, "historyLimit", historyLimit)
	}
	return nil
}

After a lot of searching, I came across the following article. Curiously, it uses the very same setting name. Well, considering the resources I referred to when I thought about this setting, it is no surprise the names match.

What was written there boils down to two points: whether the `ownerReferences` metadata exists on the pod, and whether your cluster includes the PR that resolves the issue. That PR was added to the v1.13 milestone on Dec 6, 2018, so it appears to have been fixed in that release; you have to make sure your Kubernetes version is at least that recent.

  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: <job-name>
    uid: 94ff084e-1a5b-11e9-b123-52540098c2e3

What are ownerReferences?

As the documentation explains, when a resource carries a reference to its parent like this, cascading deletion takes effect once you set the delete propagation policy. Other conditions were listed on the original page, so those may need to be set as well.
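
As a quick check, you can confirm the controlling owner of a pod programmatically. The following is only a minimal sketch, assuming a controller-runtime client similar to the one KEDA uses, with a placeholder namespace and pod name:

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
	// Build a client from the local kubeconfig (or in-cluster config).
	cfg, err := config.GetConfig()
	if err != nil {
		panic(err)
	}
	c, err := client.New(cfg, client.Options{})
	if err != nil {
		panic(err)
	}

	// Fetch one of the consumer pods (the name here is a placeholder).
	pod := &corev1.Pod{}
	key := types.NamespacedName{Namespace: "default", Name: "azure-servicebus-queue-consumer-7x629-8kc2j"}
	if err := c.Get(context.TODO(), key, pod); err != nil {
		panic(err)
	}

	// GetControllerOf returns the ownerReference with controller=true, if there is one.
	if owner := metav1.GetControllerOf(pod); owner != nil {
		fmt.Printf("pod %s is owned by %s/%s\n", pod.Name, owner.Kind, owner.Name)
	} else {
		fmt.Println("no controller ownerReference found")
	}
}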

Current behavior

Let's check the current behavior first. Here are the Jobs:

$ kubectl get jobs
NAME                                    COMPLETIONS   DURATION   AGE
azure-servicebus-queue-consumer-7x629   1/1           8s         114s
azure-servicebus-queue-consumer-9m7v6   0/1           3s         3s
azure-servicebus-queue-consumer-lqftl   1/1           6s         66s
azure-servicebus-queue-consumer-qjfhr   1/1           10s        2m25s
azure-servicebus-queue-consumer-qs5rb   1/1           20s        49s
azure-servicebus-queue-consumer-v8n7m   1/1           8s         60s

And here are the pods created by those Jobs, with matching name prefixes:

$ kubectl get pods
NAME                                          READY   STATUS      RESTARTS   AGE
azure-servicebus-queue-consumer-7x629-8kc2j   0/1     Completed   0          2m28s
azure-servicebus-queue-consumer-9m7v6-zgzsf   0/1     Completed   0          37s
azure-servicebus-queue-consumer-lqftl-67pnv   0/1     Completed   0          99s
azure-servicebus-queue-consumer-qjfhr-dhbgt   0/1     Completed   0          2m59s
azure-servicebus-queue-consumer-qs5rb-8qsr7   0/1     Error       0          83s
azure-servicebus-queue-consumer-qs5rb-v2d7d   0/1     Completed   0          70s
azure-servicebus-queue-consumer-v8n7m-dczp8   0/1     Completed   0          94s

Let's take a look at the contents using kubectl edit pod.

(Screenshot: the pod manifest, showing the automatically added `ownerReferences` block.)

I had never consciously set it, but `ownerReferences` is added automatically. Now, let's set the deletion policy.

PropagationPolicy

Looking at the Job v1 batch Delete API, it accepts a PropagationPolicy. There are three values to choose from: Orphan, Background, and Foreground. What the default resolves to seems to depend on the finalizers set on the resource, but mine comes from a custom resource and nothing is defined there, so I will explicitly choose either Background or Foreground.
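
For reference, the three values correspond to the DeletionPropagation constants in k8s.io/apimachinery; this tiny program (not part of KEDA) just prints them:

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// The three DeletionPropagation values accepted by the delete API.
	fmt.Println(metav1.DeletePropagationOrphan)     // leave the dependent pods in place
	fmt.Println(metav1.DeletePropagationBackground) // delete the Job right away, the garbage collector removes the pods afterwards
	fmt.Println(metav1.DeletePropagationForeground) // delete the dependent pods first, then the Job itself
}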

Writing it in the client code

The fix was easy, because all I had to do was set the policy. When I actually tried it, both DeletePropagationBackground and DeletePropagationForeground worked fine, and background deletion removed the pods quickly enough, so I made Background the default instead of waiting for the pods to be deleted.

// Request cascading deletion so that the pods owned by the Job are removed as well.
deletePolicy := metav1.DeletePropagationBackground
deleteOptions := &client.DeleteOptions{
    PropagationPolicy: &deletePolicy,
}
err := e.client.Delete(context.TODO(), j.DeepCopyObject(), deleteOptions)
if err != nil {
    return err
}
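
Incidentally, controller-runtime also offers a client.PropagationPolicy delete option, so the same intent can be written a bit more compactly. This is just an equivalent sketch, not the actual KEDA change, and the exact Delete signature depends on your controller-runtime version:

// Request background cascading deletion via a delete option instead of building DeleteOptions by hand.
if err := e.client.Delete(context.TODO(), &j, client.PropagationPolicy(metav1.DeletePropagationBackground)); err != nil {
    return err
}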

Old custom resource cannot be deleted

Separately, the old KEDA ScaledJob had a problem where the old custom resource could not be deleted. A finalizer was set on it, and that prevented the removal. I wondered why it would not disappear no matter how many times I deleted it, but I solved it by following the issue here: the resource can be deleted once the finalizer definition is removed.

kubectl patch scaledjob <scaledjob-name> -p '{"metadata":{"finalizers":[]}}' --type=merge
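
If you prefer to do the same thing from Go, a merge patch with the controller-runtime client looks roughly like the sketch below. It is only a sketch: it assumes the old keda.k8s.io/v1alpha1 API group of KEDA v1, a client c built as in the earlier example, and a placeholder namespace and name.

// Assumed imports: metav1 unstructured, k8s.io/apimachinery/pkg/runtime/schema,
// k8s.io/apimachinery/pkg/types and sigs.k8s.io/controller-runtime/pkg/client.
scaledJob := &unstructured.Unstructured{}
scaledJob.SetGroupVersionKind(schema.GroupVersionKind{
	Group:   "keda.k8s.io", // old KEDA v1 API group (assumption)
	Version: "v1alpha1",
	Kind:    "ScaledJob",
})
if err := c.Get(context.TODO(), types.NamespacedName{Namespace: "default", Name: "my-scaledjob"}, scaledJob); err != nil {
	return err
}
// Build a merge patch that drops the finalizers so the API server can finish the deletion.
patch := client.MergeFrom(scaledJob.DeepCopy())
scaledJob.SetFinalizers(nil)
if err := c.Patch(context.TODO(), scaledJob, patch); err != nil {
	return err
}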

In conclusion

The interface of k8s is straightforward, easy to understand and fun!
