
Logo for g-c.dev

From DevOps to Platform Engineering

Platform Engineering looks like it's becoming the new buzzword in our world. But what is Platform Engineering?

This note describes what I see changing as DevOps and SRE evolve into Platform Engineering.

Until recently there was (and there still is) the so-called "DevOps engineer". The role, initially filled largely by evolved system engineers or retrained sysadmins, was later extended to developers, with the aim of empowering them to build and run their own software.

While this initially seemed like a perfect idea, a problem has recently started to emerge.

Developers are heavily distracted by undifferentiated work, namely maintaining infrastructure.

What is undifferentiated work?

It's the kind of activity that doesn't give immediate business value to your product. In other words, developers are busy maintaining infrastructure and, as a result, have less capacity for building new features in the application.

On top of that, I would argue that most developers do infrastructure work because they are forced to, due to the "you build it, you run it!" ideology.

They don't like it, because infrastructure comes with a very different set of challenges. The most widely recognised one is that you cannot easily swap infrastructure, or change some of its layers, without careful planning, multiple releases, and hoping for the best.

This is inherently because infrastructure as code does not follow the same patterns as software. The two rely on radically different implementation paradigms: infrastructure is declarative, while application software is largely imperative.
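To make the declarative side concrete: you describe the end state you want, and the tool works out the steps by comparing it with what already exists. The toy sketch below shows that idea with plain dictionaries. The resource names and the dict-based state model are hypothetical, and real tools such as Terraform or Kubernetes controllers are far more involved.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Compute the actions that move `actual` towards `desired`.

    Both arguments map a resource name to its configuration. The caller only
    states WHAT should exist; the ordered list of actions is derived here.
    """
    actions: list[tuple[str, str]] = []
    for name in sorted(desired.keys() - actual.keys()):
        actions.append(("create", name))
    for name in sorted(desired.keys() & actual.keys()):
        if desired[name] != actual[name]:
            actions.append(("update", name))
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> None:
    """Apply the plan to an in-memory stand-in for a cloud account."""
    for action, name in plan(desired, actual):
        if action == "delete":
            del actual[name]
        else:  # create or update: converge to the desired configuration
            actual[name] = dict(desired[name])


# Hypothetical current state and desired spec.
cloud = {"db": {"engine": "postgres", "size": "small"}, "old-cache": {"engine": "redis"}}
spec = {"db": {"engine": "postgres", "size": "large"}, "web": {"replicas": 2}}
print(plan(spec, cloud))  # [('create', 'web'), ('update', 'db'), ('delete', 'old-cache')]
reconcile(spec, cloud)
print(plan(spec, cloud))  # [] (running it again changes nothing)
```

An imperative script would instead encode the individual steps ("create web", "resize db") and would have to handle re-runs and partial failures itself; the declarative form gets idempotence from the diff.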

You can easily apply the SOLID principles to software; in infrastructure you can't. Or perhaps you can apply some of them at the code level, but once you are in production (day-2 operations), you cannot easily swap "interfaces" that are already deployed without carefully planning backward-compatible deployments and complex rollbacks.
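In application code, swapping the implementation behind an interface is usually a local change. With deployed infrastructure, the key question becomes whether a change can be applied to a live resource in place or forces it to be destroyed and recreated. The toy classifier below illustrates that distinction. The resource kinds and the set of immutable attributes are hypothetical, since each real provider defines its own (Terraform's plan output, for example, flags such attributes as forcing replacement).

```python
# Hypothetical: attributes an imaginary provider cannot change on a live resource.
IMMUTABLE_ATTRS: dict[str, set[str]] = {
    "database": {"engine", "region"},
    "bucket": {"name", "region"},
}


def classify_change(kind: str, current: dict, proposed: dict) -> str:
    """Return 'no-op', 'update-in-place', or 'replace' for a resource change."""
    changed = {
        key
        for key in current.keys() | proposed.keys()
        if current.get(key) != proposed.get(key)
    }
    if not changed:
        return "no-op"
    # Touching any immutable attribute means destroy + recreate, which for a
    # stateful resource implies a data migration and a rollback plan.
    if changed & IMMUTABLE_ATTRS.get(kind, set()):
        return "replace"
    return "update-in-place"


print(classify_change("database", {"engine": "postgres", "size": "small"},
                      {"engine": "postgres", "size": "large"}))  # update-in-place
print(classify_change("database", {"engine": "mysql", "size": "small"},
                      {"engine": "postgres", "size": "small"}))  # replace
```

Swapping a MySQL repository for a PostgreSQL one behind an interface in code is a constructor change; the equivalent "swap" of a deployed database lands in the `replace` branch, with everything that implies for data and rollbacks.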

Additionally, cloud providers don't offer the same flexibility for infrastructure programming that you can get in your favorite programming language.

I also want to point out that when developers implement and maintain infrastructure and see these tasks as an additional burden, they will put less effort into achieving operational excellence and security. The former affects cost and reliability; the latter is a general risk for the company.

What is happening now?

Companies have started to realise that this pattern doesn't work at scale, and that their infrastructure is accumulating large amounts of technical debt and security risk. This is because the subject matter experts (SMEs) are not properly involved.

It's time to rethink the pattern. It's important that developers still oversee their product's journey from laptop to production but, once it leaves the laptop, the code/product should proceed to production via abstractions.

These should be controlled abstractions in which best practices are embedded rather than exposed (so no opt-out is possible), and the production infrastructure should be maintained by the proper SME role: Site Reliability Engineers (SREs).
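A minimal sketch of such a controlled abstraction, assuming a platform team exposes a small request type to developers and renders it into a full resource spec. The field names (`encryption`, `versioning`, `public_access`, and so on) and the naming rule are hypothetical and not any cloud provider's schema; the point is that the security settings are not parameters at all, so there is nothing to opt out of.

```python
import re
from dataclasses import dataclass

# Hypothetical naming rule chosen by the platform team.
_NAME_RE = re.compile(r"^[a-z][a-z0-9-]{2,40}$")


@dataclass(frozen=True)
class BucketRequest:
    """The only knobs a developer is allowed to turn."""
    name: str
    team: str
    retention_days: int = 30


def render_bucket(req: BucketRequest) -> dict:
    """Turn a developer request into a complete (hypothetical) bucket spec.

    Best practices are set here by the platform team and are not exposed in
    BucketRequest, so callers cannot disable them.
    """
    if not _NAME_RE.match(req.name):
        raise ValueError(f"invalid bucket name: {req.name!r}")
    if not 1 <= req.retention_days <= 365:
        raise ValueError("retention_days must be between 1 and 365")
    return {
        "name": f"{req.team}-{req.name}",
        "tags": {"team": req.team, "managed-by": "platform"},
        "lifecycle": {"expire_after_days": req.retention_days},
        # Embedded, non-negotiable settings.
        "encryption": "enabled",
        "versioning": True,
        "public_access": "blocked",
    }


spec = render_bucket(BucketRequest(name="invoices", team="billing"))
print(spec["name"], spec["encryption"], spec["public_access"])  # billing-invoices enabled blocked
```

Trying `BucketRequest(name="invoices", team="billing", encryption=False)` fails with a `TypeError`, because the abstraction simply has no such field.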

Platform Engineering is the new SRE

Proper SREs (à la Google) combine a software engineering mindset with systems engineering knowledge (sorry, old sysadmins, time to evolve!), and they implement software (not scripts, but real implementations) that offers these abstractions to developers.

Proper SREs should also have a background in systems engineering and experience solving production incidents, so that they have built up expertise and experience over time (the latter is very important). With these skills they can properly design these abstractions and embed operational excellence and security at every stage.

Thanks to their software engineering mindset (they don't need to be development gurus, but the mindset matters), they can offer developers suitable APIs that enable and accelerate them in delivering their code to production safely and with confidence. On top of this, they also give developers the tools to interact with production when problems arise in the product's application (incidents always happen, everywhere and at every layer), so that developers and Platform Engineers can collaborate on maintaining the product and the infrastructure from day-2 onwards.
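As one illustration of an API that lets developers ship "safely and with confidence", here is a minimal sketch of a staged rollout with automatic rollback. The two hooks are assumptions: `set_traffic` stands in for whatever routes a percentage of traffic to the new version (load balancer, service mesh), and `is_healthy` for a platform-defined health check such as an error-rate threshold. The step percentages are illustrative defaults, not a recommendation.

```python
from typing import Callable


def progressive_rollout(
    set_traffic: Callable[[int], None],
    is_healthy: Callable[[], bool],
    steps: tuple[int, ...] = (5, 25, 50, 100),
) -> bool:
    """Shift traffic to a new version in stages, rolling back on failure.

    set_traffic(p): hypothetical hook sending p percent of traffic to the new version.
    is_healthy():   hypothetical platform-defined health check.
    Returns True if the rollout completed, False if it was rolled back.
    """
    for percent in steps:
        set_traffic(percent)
        if not is_healthy():
            set_traffic(0)  # roll back: all traffic returns to the previous version
            return False
    return True


# Usage with fake hooks: the check passes at 5% and fails at 25%.
history: list[int] = []
checks = iter([True, False])
ok = progressive_rollout(history.append, lambda: next(checks))
print(ok, history)  # False [5, 25, 0]
```

The developer only calls `progressive_rollout`; how traffic is shifted and what counts as healthy stay with the platform team, which is where that operational knowledge lives.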

Warm welcome to Platform Engineering.