Back up and restore Kubernetes manifests with their status

25 Nov 2024

A bit of context

At my current gig, I'm working on automating Kubernetes cluster provisioning in a hybrid environment that mixes on-premise and cloud resources. I gave the Kubernetes Cluster API a try, but it did not provide the flexibility and level of integration I needed to work within this bank. After weeks of testing and designing, the team I work with decided to go for an in-house cluster operator.

Principle

When using the reconciliation pattern in an operator, as described in Cloud Native Infrastructure, one has to create Custom Resources to manage the lifecycle of the external dependencies needed by Kubernetes clusters. One key feature is to use the status key of Custom Resources to store their actual state. For example:


apiVersion: xyz/v1alpha1
kind: KubernetesCluster
metadata:
  name: my-custom-cluster-for-backup
spec:
  type: iks
  env: test
  kubeVersion: "1.28"
  osVersion: UBUNTU_20_64
  product:
    name: this is my solo experiment
    shortName: experiment
  uuid:
    value: aaa692e5-e6ce-4f19-ad20-6491ca74067c
  addons:
    - name: cluster-autoscaler
      version: "1.2.2"
    - name: vpc-block-csi-driver
      version: "5.2"
status:
  apiServerURL: https://my-private-endpoint.tld
  masterVersion: 1.28.23

This is a powerful design, especially when you manage hundreds or thousands of objects. It's like a structured database, only accessible through the Kubernetes API server. This pattern works best when every state of every object can be reached in an idempotent manner.
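
Since everything sits behind the API server, the recorded state can be queried like any other field. A quick sketch, assuming the CRD registers the plural name kubernetesclusters:

# Read the actual state the operator recorded under the status key
kubectl get kubernetesclusters my-custom-cluster-for-backup \
  -o jsonpath='{.status.apiServerURL}'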

The problem

The status key is obviously not part of the custom resource manifest kept in git, so it is not recoverable from there; it only lives in the objects inside the Kubernetes clusters. In the example from the previous section, both apiServerURL and masterVersion are recoverable through the API, so the operator can reconcile those states.
But what if certain states are unrecoverable once the custom resource is deleted, or if some transitive states are needed to reach the expected state? It means the states of the custom resources must be saved somewhere safe.
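
To make the problem concrete: re-creating the custom resource from the manifest kept in git brings back the spec, but not the status. A small sketch, reusing the example above (the plural name kubernetesclusters is an assumption):

# Re-apply the spec from git; the API server drops any status key it contains
kubectl apply -f my-custom-cluster-for-backup.yaml

# The status comes back empty: apiServerURL and masterVersion are gone
kubectl get kubernetesclusters my-custom-cluster-for-backup -o jsonpath='{.status}'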

Velero to the rescue

The Kubernetes API completely ignores the status key when you create or update an object; status is only writable through a dedicated subresource. Getting an object back together with its saved status is therefore a two-step process. I played around with Velero and discovered a very interesting feature: it is possible to restore an object's status during a Velero restore.
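
Here is roughly what that two-step process looks like when done by hand. This is a minimal sketch, assuming a hypothetical my-service Service in a my-namespace namespace, and a kubectl recent enough (v1.24+) to expose the --subresource flag:

# Step 1: create the object; any status key in the manifest is ignored
kubectl apply -f my-service.yaml

# Step 2: write the saved status through the dedicated /status subresource
# (for a LoadBalancer Service, a controller may later overwrite this)
kubectl -n my-namespace patch svc my-service --subresource=status --type=merge \
  -p '{"status":{"loadBalancer":{"ingress":[{"ip":"172.18.0.7"}]}}}'

Velero automates exactly this kind of two-step restore.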

Let's try to save and restore the load balancer status of a Kubernetes Service.
First, run the backup:
velero backup create argocd --include-namespaces argocd
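
Before going further, you can check that the backup completed:

velero backup describe argocd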

ArgoCD server's service:


apiVersion: v1
kind: Service
metadata:
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app.kubernetes.io/component: server
    app.kubernetes.io/name: argocd-server
    app.kubernetes.io/part-of: argocd
  name: argocd-server
  namespace: argocd
spec:
...
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 172.18.0.7
      ipMode: Proxy
      ports:
      - port: 80
        protocol: TCP
      - port: 443
        protocol: TCP

Now delete the service:
kubectl -n argocd delete svc argocd-server

Finally, restore the service from the Velero backup using the magic parameter:
velero restore create argocd --from-backup argocd --status-include-resources Service

As you may have noticed, just by adding --status-include-resources Service, Velero restored the status only for the resource kinds named on the command line. You can find further explanation in the Velero documentation.
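
To double-check, read back the restored load balancer status:

kubectl -n argocd get svc argocd-server -o jsonpath='{.status.loadBalancer.ingress}'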

Now that my team and I are safe and sound, we will continue to dive deeper into the operator pattern to manage even more resources outside of Kubernetes, with Kubernetes!