
Are we reinventing the cloud?

TL;DR: this post is a rant about the unnecessary complexity and duplication that, so far, I haven't found beneficial while running workloads on managed Kubernetes.

Incipit

Everything started from this comment on LinkedIn, a comment that I support 100%.

[Image: screenshot of a LinkedIn comment about Kubernetes]

original comment here

Delusional behaviour: Kubernetes everywhere

[Image: "Kubernetes everywhere" meme]

The more I use Kubernetes, the more I'm puzzled about the direction we are going.

Kubernetes is an excellent abstraction layer: it can be installed over pre-existing infrastructure, and it implements a robust framework on top of which you can deploy components that make the underlying infrastructure "software-defined".

But, there is a but

When you run on-premise infrastructure, you most likely have a small part of it software-defined (VMware, OpenStack) but not the entire set of services your workload needs. Hence the need for Kubernetes.

Software-defined is the core principle of every major cloud provider. And a robust framework is exactly what AWS, Azure, and GCP already offer via their own APIs, SDKs, and CLIs.

The Duplication

Now let's consider the components you need (yes, you must) to install on top of Kubernetes to offer additional services: access control, DNS management, load balancing and traffic routing, deployment automation, secrets management, to name a few, but the list keeps growing.

Actually, let's make this list more pragmatic. As an engineer, in order to install and manage a workload on Kubernetes on AWS, I have to provision both cloud resources and K8S components. For fun, I will try to group them together to showcase what I believe are unnecessary duplications.

IAM

| on AWS | self-managed on K8S |
| --- | --- |
| IAM Roles | Roles |
| Policies | Role Bindings |
| Trust Relationships | Service Accounts |
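
To make the duplication concrete, here is a minimal sketch (all names and the account/role ARN are hypothetical) of a single application identity on EKS with IRSA: the IAM role, policy, and trust relationship live on the AWS side, while the ServiceAccount and RBAC binding that mirror them live in the cluster, each maintained separately.

```yaml
# K8S side of one application identity (hypothetical names).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: default
  annotations:
    # Points at an IAM role whose trust relationship must separately
    # allow this exact namespace/ServiceAccount pair (IRSA).
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-app-role
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-read-pods
  namespace: default
subjects:
  - kind: ServiceAccount
    name: my-app
    namespace: default
roleRef:
  kind: Role
  name: pod-reader        # a Role granting the in-cluster permissions
  apiGroup: rbac.authorization.k8s.io
```

Note the split: the AWS-side policies govern what the pod can do against cloud APIs, the RBAC objects govern what it can do inside the cluster, and keeping the two in sync is entirely on you.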

Networking

| on AWS | self-managed on K8S |
| --- | --- |
| Security Groups | Network Policies |
| Load Balancer, Target Groups | Ingress component, Routes |
| DNS Hosted Zone (Route53) | External-DNS component |
| ACM Certificates | Cert-Manager component |
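
As an illustration, a hedged sketch of a single Ingress on EKS (hostname, names, and the certificate ARN are hypothetical): the AWS Load Balancer Controller turns it into an ALB with Target Groups, external-dns writes the Route53 record, the ACM certificate still has to exist, and all of those controllers are yours to run.

```yaml
# One Ingress, several self-managed controllers behind it (hypothetical names).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    # Hypothetical ACM certificate ARN, provisioned on the AWS side.
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:eu-west-1:123456789012:certificate/abc123
    # Picked up by the external-dns component to create the Route53 record.
    external-dns.alpha.kubernetes.io/hostname: app.example.com
spec:
  ingressClassName: alb   # handled by the AWS Load Balancer Controller
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```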


Scaling & Elasticity

| on AWS | self-managed on K8S |
| --- | --- |
| Autoscaling Groups, Launch Templates | Cluster Autoscaler / Karpenter component |
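
A minimal Karpenter NodePool sketch, assuming the v1beta1 API (names are illustrative), expressing in-cluster what an Autoscaling Group and Launch Template already express on the AWS side:

```yaml
# Hypothetical NodePool; the referenced EC2NodeClass is yet another
# CRD you have to manage, roughly mirroring a Launch Template.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        name: default          # EC2NodeClass with AMI/subnet/SG details
  limits:
    cpu: "100"                 # cap the fleet, as ASG max size would
  disruption:
    consolidationPolicy: WhenUnderutilized
```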

Secrets Management

| on AWS | self-managed on K8S |
| --- | --- |
| Secrets Manager entities | External Secrets Operator component, etcd secrets |
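
For example, a minimal ExternalSecret sketch (hypothetical names, assuming the External Secrets Operator v1beta1 API): the secret already lives in Secrets Manager, but a self-managed operator has to sync it into an etcd-backed K8S Secret.

```yaml
# Hypothetical names throughout.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager   # a SecretStore you also have to manage
    kind: SecretStore
  target:
    name: db-credentials        # the resulting K8S Secret in etcd
  data:
    - secretKey: password
      remoteRef:
        key: prod/db            # the Secrets Manager entry
        property: password
```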

Observability

| on AWS | self-managed on K8S |
| --- | --- |
| CloudWatch agent | Kube Prometheus Stack, Grafana, Metrics Server components, logging agent (Fluentd, Promtail, ...) |
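
As a sketch (hypothetical names), scraping a single application with the Kube Prometheus Stack means owning yet another CRD, a ServiceMonitor, where CloudWatch would just be an agent:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: kube-prometheus-stack   # must match the Prometheus selector
spec:
  selector:
    matchLabels:
      app: my-app                    # Service exposing the metrics
  endpoints:
    - port: metrics                  # named port on that Service
      interval: 30s
```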

Additionally, on K8S, to make life easier:

| self-managed on K8S | purpose |
| --- | --- |
| Reloader | ConfigMap/Secret rotation |
| Trivy | keeping an eye on potential vulnerabilities popping up in the running containers |
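
For instance, a hedged Deployment sketch (hypothetical names and image) wired up for Stakater Reloader; the reloader.stakater.com/auto annotation tells the operator to roll the Deployment whenever a referenced ConfigMap or Secret changes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    reloader.stakater.com/auto: "true"   # watched by the Reloader operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest           # hypothetical image
          envFrom:
            - configMapRef:
                name: my-app-config      # the ConfigMap Reloader watches
```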

Day 2

Now we have to keep the infrastructure up to date with the new functionality coming from the cloud provider (AWS SDK, deprecations, innovations), but we additionally need to:

  1. Periodically update the Kubernetes control plane
  2. Carefully rotate the Kubernetes nodes when the time comes (hopefully the workload has a Pod Disruption Budget; see the sketch after this list)
  3. Check each of the self-managed components' images for vulnerabilities
  4. Periodically check the K8S manifests for API versions incompatible with upcoming K8S releases
  5. Periodically update the self-managed components (all of the above, at least; the more you add, the more release notes you need to check, each with its own cadence)
  6. Ensure we maintain proper compatibility between components and K8S versions
  7. When there is a problem, open a bug upstream or, better, a Pull Request, wait for approval, and only then proceed with updating the impacted component.
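
A minimal Pod Disruption Budget sketch for point 2 (hypothetical names): without one, draining a node for rotation can evict every replica of an application at once.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1        # keep at least one replica up during node drains
  selector:
    matchLabels:
      app: my-app
```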

That list is just the basic hygiene of your infrastructure, without factoring in the additional effort of modernising the K8S realm as well.

The Questions

Is it fu@*@! worth it?

Wouldn't it be better to just offload everything to the cloud provider and use the commodities it offers to run the only thing we actually care about in our company, that is, the final workload?

Why don't we use ECS + Fargate? Or Azure Container Apps, or GCP Cloud Run? The underlying network and the rest we still need to provision in any case.
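
To give a feel for the other side of the trade, a hedged CloudFormation sketch of the same container on ECS + Fargate (hypothetical names; the service, networking, and IAM details are omitted): no control plane to upgrade, no nodes to rotate, no add-on components to patch.

```yaml
# Hypothetical task definition; Fargate owns the underlying hosts.
Resources:
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: my-app
      RequiresCompatibilities: [FARGATE]
      NetworkMode: awsvpc
      Cpu: "256"
      Memory: "512"
      ContainerDefinitions:
        - Name: my-app
          Image: my-app:latest    # hypothetical public image
          PortMappings:
            - ContainerPort: 80
```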

AWS, Azure, and GCP already offer, via their robust frameworks, all the services we need. The main difference is that when you install a component on top of Kubernetes you don't have an SLA: it's managed by you, and you need to come up with proper maintenance, day-2 operations, and upgrade patterns, and take care of the Kubernetes upgrades as well.

The Bet

If you are a multi-cloud company (are you really?), I'm more and more convinced that it would be less costly and less frustrating (incident-wise) to set up small teams of cloud experts to maintain a simpler infrastructure, with reduced maintenance operations, and run the workload in each cloud in its vendor-locked-in flavour.