
Platform Engineering in K8S: dependencies

An imaginary nightmare

Now, let's imagine we have created our shared K8S platform, where we issue a "slice of the platform" (a namespace, or something similar) to each of our developer teams.

The developer teams are very excited: they can finally package their applications with Helm or Kustomize (I will stick with Helm for this example), deploy the manifests, and run their workloads successfully on the Platform.

As we all know, Kubernetes is very dynamic: its APIs mature, change versions, and get deprecated quickly.

Nine months after the platform's production day 1, the Ingress API your teams were using has in the meantime graduated from v1beta1 to v1, and the old version has been removed in the new Kubernetes release.
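To make the breaking change concrete: Ingress graduated from networking.k8s.io/v1beta1 to networking.k8s.io/v1, and the old version was removed in Kubernetes 1.22. It is not just a version bump in the manifest; the shape of the backend fields changed too:

```yaml
# Old manifest: rejected by clusters >= 1.22
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - http:
        paths:
          - path: /
            backend:
              serviceName: my-app
              servicePort: 80
---
# The same Ingress in the graduated v1 API
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix        # mandatory in v1
            backend:
              service:
                name: my-app
                port:
                  number: 80
```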

You - the Platform team - want to plan the upgrade of all the clusters in the Platform, but you quickly notice your 50 development teams are still using the old API version in their manifests. You try to alert them to update their code. You quickly realise your priorities don't match the product development teams' priorities.
You are stuck. That's because you just created a nasty external dependency, one that takes away your autonomy to upgrade your own product (the Platform is the internal product consumed by the engineers).

You wake up. Luckily it was just a dream.

Dependencies between Developer and Platform teams are a foreseeable challenge you have to take into account when you plan and design a Platform Engineering layer in your company.

How do you solve it? Well, other success stories are out there; let's check how they solved it.

Option 1: Strict contracts between Platform and Product Development teams

This is the most common model. You onboard the Development teams and establish a clear contract stating each party's responsibilities: development teams are in charge of keeping their manifests up to date, otherwise their workloads will be impacted in case of a cluster upgrade.

You might additionally set a sort of "grace period", during which you send out high-priority notifications when nothing is proactively happening in the Development teams. But still, if there is no action, their workloads will be cut off from the cluster.

I personally don't find this approach reasonable or actionable in a company. After all, you don't want to cause any disruption of service for your end users, so you will end up postponing the upgrades and dealing with a lot of coordination activities.

Option 2: Developer teams are kept blind to K8S manifests

In this case you magically produce the manifests in your special CI/CD process, and delegate to the Development teams only the maintenance of some specific metadata in a custom descriptor that the Platform team designs and maintains (it might be YAML as well, usually in the root of the repository).
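A minimal sketch of what such a custom descriptor might look like; every field name here is purely hypothetical, since each Platform team designs its own schema:

```yaml
# deploy-descriptor.yaml (hypothetical schema), kept at the repo root.
# The Platform CI/CD turns this into the actual K8S manifests.
app:
  name: orders-service
  team: payments
runtime:
  replicas: 3
  port: 8080
  healthcheck: /healthz
expose:
  host: orders.internal.example.com
  path: /
```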

This gives a lot more flexibility: the Platform team owns the final templated manifests, and it will produce the kind of K8S resources aligned with the target version of the cluster.

This approach is pretty common; I have seen it in various companies.

Whilst I like how it takes ownership and control, I feel that, this way, the code developers see in their repo is not a first-class citizen for local development.

In other words, if I'm a developer and I want to try my code in my own KinD cluster on my laptop, I can't, because the manifests are generated by the Platform at a later stage.

You - the Platform team - then need to maintain and provide tools/APIs for generating the manifests on the fly for the Developer. OK, it's doable, but what about the immutability of my manifests?
After all, I don't know if the code in my repository reflects what is deployed in the Platform.

Option 3: Base Helm chart maintained by Platform

Helm has the nice capability of depending on other charts.

So, if the Platform team maintains a base Helm chart, each Developer can own their own Helm chart, declaring a dependency on the base chart.
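Declaring that dependency is a few lines in the application chart's Chart.yaml; the chart name and repository URL below are placeholders:

```yaml
# Chart.yaml of the developer-owned chart
apiVersion: v2
name: orders-service
version: 0.1.0
dependencies:
  - name: platform-base              # base chart maintained by the Platform team
    version: "1.x.x"                 # a range, so base-chart fixes roll in automatically
    repository: "https://charts.platform.example.com"
```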

This way, whenever I - the developer - want to deploy my workload, I can just use the well-known Helm commands, get the manifests as expected, and still oversee the entire artefact production process.

Additionally, Helm provides nice features like pre-install hooks, with which the Platform team can embed validations in the base chart that will run both in local development and when the actual Helm chart is pushed/pulled into the Platform.
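A pre-install hook is just a regular manifest carrying the helm.sh/hook annotation. As a sketch, the base chart could ship a Job that runs some validation before anything else is installed (the image and arguments are illustrative, not a real validator):

```yaml
# templates/validate-hook.yaml in the base chart
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-validate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded   # clean up the Job on success
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: validate
          image: platform/validator:latest          # illustrative image
          args: ["--check-values"]                  # illustrative flag
```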

Also, Helm exposes the capabilities of the target cluster to templates, so the base chart can be smart enough to set the proper API versions depending on their availability on the target cluster.
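For instance, a base-chart template can branch on the APIs the target cluster actually serves, using Helm's built-in Capabilities object (when rendering offline with helm template, you can simulate the cluster with the --api-versions flag):

```yaml
{{- if .Capabilities.APIVersions.Has "networking.k8s.io/v1/Ingress" }}
apiVersion: networking.k8s.io/v1
{{- else }}
apiVersion: networking.k8s.io/v1beta1
{{- end }}
kind: Ingress
# ... rest of the Ingress template, shaped accordingly
```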

When it's time to upgrade (or even way before), the Platform team issues a new version of the base chart, and the Developer teams will transparently build and deploy new versions of their workloads, compatible with the upcoming Kubernetes versions.

This approach seems much more reasonable: it avoids abstracting too much away from the Development teams, while keeping control over the final artefact and allowing a high level of autonomy between Developer and Platform teams.

If I put myself in the developer's shoes, I still see a simple YAML file (values.yaml) in my Helm chart, and that's it. Templates are maintained by the Platform team in the base chart, so when I check, my /templates folder is empty.
I only have to fill in the (hopefully) well-commented and simple configuration in values.yaml, and I can try out the final manifest locally (with helm template), in my KinD cluster, and eventually in my remote environment.
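The local developer loop is then just standard Helm usage (release and chart names below are illustrative):

```shell
# pull the base chart declared in Chart.yaml
helm dependency update ./chart

# render the final manifests locally, without any cluster
helm template my-app ./chart -f ./chart/values.yaml

# or install straight into a local KinD cluster
helm upgrade --install my-app ./chart
```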

Bonus: Always add proactive checks

There are also nice tools you can include in the Developer teams' build pipelines to validate the compatibility of their artefacts with upcoming Kubernetes versions, and to fail the pipeline fast when you want them to onboard a new version of your base chart.

We, for example, use Pluto targeting the next versions of K8S to identify incompatible APIs at build time.
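A minimal sketch of such a check: render the chart and pipe the output through Pluto, targeting the Kubernetes version you plan to upgrade to (v1.22.0 here is just an example target):

```shell
# scan the rendered manifests for deprecated/removed APIs
helm template my-app ./chart \
  | pluto detect - --target-versions k8s=v1.22.0
```

A non-zero exit code then fails the pipeline before an incompatible manifest ever reaches the Platform.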