At my current gig, I'm working on automating Kubernetes cluster provisioning in a hybrid environment that mixes on-premises and cloud resources. I gave the Kubernetes Cluster API a try, but it did not provide the flexibility and level of integration I needed to work within this bank. After weeks of testing and designing, the team I am working with decided to go for an in-house cluster operator.
When using the reconciliation pattern in an operator, as described in Cloud Native Infrastructure, one has to create Custom Resources to manage the lifecycle of the external dependencies needed by Kubernetes clusters.
One key feature is to use the status key of Custom Resources to store their actual state. For example:
apiVersion: xyz/v1alpha1
kind: KubernetesCluster
metadata:
  name: my-custom-cluster-for-backup
spec:
  type: iks
  env: test
  kubeVersion: "1.28"
  osVersion: UBUNTU_20_64
  product:
    name: this is my solo experiment
    shortName: experiment
    uuid:
      value: aaa692e5-e6ce-4f19-ad20-6491ca74067c
  addons:
    - name: cluster-autoscaler
      version: "1.2.2"
    - name: vpc-block-csi-driver
      version: "5.2"
status:
  apiServerURL: https://my-private-endpoint.tld
  masterVersion: 1.28.23
This is a powerful design, especially when you manage hundreds or thousands of objects. It's like a structured database, only accessible through the Kubernetes API server. This pattern works best when every state of every object can be reached in an idempotent manner.
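Since everything sits behind the API server, you can query the whole fleet like a database. A minimal sketch, assuming the KubernetesCluster CRD from the example above is installed:
# List every cluster with its desired and actual versions,
# straight from the API server.
kubectl get kubernetesclusters --all-namespaces \
  -o custom-columns='NAME:.metadata.name,WANT:.spec.kubeVersion,HAVE:.status.masterVersion'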
The status key is obviously not in the custom resource manifest, so it is not recoverable from Git: it only lives in the objects stored inside the Kubernetes clusters.
In the example in the previous section, both apiServerURL and masterVersion are recoverable through the API, so their states can be reconciled by the operator.
What if some states are unrecoverable once the custom resource is deleted, or if some transitive states are needed to reach the expected state? It means that the states of the custom resources must be saved somewhere safe.
The implementation of the Kubernetes API completely ignores the status key when you create or update a Kubernetes object: status lives behind its own subresource endpoint.
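You can see this for yourself, assuming the CRD has the status subresource enabled and a kubectl recent enough to support --subresource (the file name below is just a placeholder):
# Re-applying the full manifest, status included, silently drops the status key.
kubectl apply -f my-custom-cluster-for-backup.yaml

# The status stays empty until something writes the subresource explicitly:
kubectl get kubernetescluster my-custom-cluster-for-backup -o jsonpath='{.status}'

# Only a write to the /status endpoint can set it back:
kubectl patch kubernetescluster my-custom-cluster-for-backup \
  --subresource=status --type=merge \
  -p '{"status":{"apiServerURL":"https://my-private-endpoint.tld"}}'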
Therefore, getting an object back along with its saved status is a two-step process. I played around with Velero and discovered a very interesting feature.
It's possible to restore an object's status during a Velero restore. Let's try to save and restore the load balancer status of a Kubernetes service.
First, run the backup:
velero backup create argocd --include-namespaces argocd
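Before going further, you can check that the backup completed:
velero backup describe argocd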
Here is the Argo CD server's service:
apiVersion: v1
kind: Service
metadata:
  finalizers:
    - service.kubernetes.io/load-balancer-cleanup
  labels:
    app.kubernetes.io/component: server
    app.kubernetes.io/name: argocd-server
    app.kubernetes.io/part-of: argocd
  name: argocd-server
  namespace: argocd
spec:
  ...
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
      - ip: 172.18.0.7
        ipMode: Proxy
        ports:
          - port: 80
            protocol: TCP
          - port: 443
            protocol: TCP
Now delete the service:
kubectl -n argocd delete svc argocd-server
Finally, restore the service by restoring the Velero backup using the magic parameter:
velero restore create argocd --from-backup argocd --status-include-resources Service
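If everything went well, the service comes back with its load balancer status already populated, which you can verify:
kubectl -n argocd get svc argocd-server -o jsonpath='{.status.loadBalancer}'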
As you probably noticed, just by adding --status-include-resources Service, Velero restored the status only for the resource types named on the command line. You can find further explanation by reading the documentation.
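The flag also has a declarative equivalent. Here is a sketch of the corresponding Restore object, assuming Velero runs in its default velero namespace (the flag maps to spec.restoreStatus.includedResources):
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: argocd
  namespace: velero
spec:
  backupName: argocd
  # Restore the status key only for these resource types.
  restoreStatus:
    includedResources:
      - services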
Now that my team and I are safe and sound, we will continue to dive deep into the operator pattern to manage even more resources outside of Kubernetes, with Kubernetes!