The Missing Revolution in Operations: Scaling Expertise in a Cloud-First World
In the early 2000s, software engineering had its revolution. We collectively figured out how to scale not just our code but ourselves as engineers. Ideas were tested, refined, and shared: principles like SOLID design, methodologies like the 12-factor app, and a deeper understanding of collaboration. It wasn’t just about the technical side; we also tackled the human side of software development. These advancements made us better as an industry, though there’s still much more to learn.
Operations hasn’t had that moment yet.
What Ops Is Missing
Today, many ops engineers are transitioning from managing physical data centers to managing the cloud. They’re learning tools like Terraform, often without traditional software engineering backgrounds, and that’s okay. Reskilling is necessary and valuable. But there’s a significant gap: too many ops engineers aren’t scaling their expertise.
The lessons software engineers learned decades ago about scaling systems and practices haven’t fully made their way into operations. Instead of encoding expertise into tools and processes, many ops teams rely on manual workflows or undocumented institutional knowledge. This often leaves teams struggling to keep up with the pace of development, creating unnecessary friction.
This gap isn’t just inefficient—it’s unsustainable. And the problem is magnified by the fact that there are fewer and fewer operations engineers every year, especially relative to the number of developers entering the workforce. As noted in The Elephant in the Cloud, the demand for infrastructure engineers continues to grow, but the supply hasn’t kept up.
Why Scaling Expertise Matters
An operations engineer’s job isn’t just about keeping systems running; it’s about making teams faster, more efficient, and more reliable. That doesn’t happen by hoarding expertise. It happens by packaging your knowledge into reusable systems and automating as much as possible.
Let’s be clear: operations isn’t exempt from automation. The idea that what you do is too complex or specialized to encode into a system is absurd. The entire industry’s purpose is to replace inefficient processes, and ops is no exception.
If you’re an operations engineer, you have a choice:
- Hold onto your expertise and make your team dependent on you, slowing them down in the process.
- Or package your expertise into reusable tools, scaling yourself and enabling everyone around you.
The second path is where true value lies.
What About DevOps?
DevOps could have been the beginning of this revolution for operations, but it came about five years too early. At the time, the cloud was still largely about foundational resources that you deployed on, not the managed services your app is composed of today. DevOps didn’t adjust to that shift. It fell by the wayside as a confusing term: sometimes referring to teams, sometimes to culture, and often just to “the stuff developers don’t have time to do.” For many folks, DevOps has become synonymous with CI/CD and little more.
That said, DevOps laid important groundwork. It highlighted the importance of collaboration between development and operations, and its principles are still valuable. The challenge now is to evolve those principles to fit the modern cloud ecosystem, where automation and composability are key.
Becoming a 10X Multiplier
Here’s the reality: operations engineers have the potential to be the only true 10X engineers in an organization. Why? Because their work directly impacts the productivity of every other engineer on the team. By building better tools, creating reusable modules, and embedding knowledge into systems, ops can multiply the effectiveness of the entire engineering organization.
Now, I hear the complaints: “Systems are too different, you can’t do this.” (Motions toward Kubernetes.) Let’s be real: this isn’t about perfection. Not everything has to fit a golden standard or conform to a single rigid framework. What matters is creating golden paths that go beyond just “how to deploy your app” and extend into how to manage commodity infrastructure. Golden paths aren’t about locking down every decision. They’re about enabling speed and consistency for the things that don’t need to be reinvented every time.
At the same time, it’s important to acknowledge that this shift isn’t easy. Ops engineers are often under pressure to keep things running and may not have the time or resources to invest in these changes. Recognizing these challenges is essential to fostering a culture where improvement feels achievable rather than overwhelming.
The Cost of Falling Behind
The need to scale expertise isn’t just about efficiency—it’s about survival. There are fewer and fewer operations engineers every year, and the demands on those who remain are only increasing. Organizations can’t afford to have their ops teams moving slowly or bottlenecking progress.
If you’re resisting automation or refusing to package your expertise into reusable systems, you’re holding the entire industry back. You’re making it harder for businesses to move forward, and you’re putting your own relevance at risk.
The Path Forward
When thinking about scaling expertise, reflect on lessons from software engineering's revolution during the 2000s and 2010s: API design, clean abstractions, and composability. These principles allowed developers to encode their knowledge into reusable, modular systems that others could build upon. Ops needs to adopt the same mindset—to design processes and systems that empower teams to work faster and more consistently, without unnecessary complexity.
It’s time for operations to have its revolution. Embrace the principles that transformed software engineering: focus on designing APIs for your tools and workflows, creating clear abstractions, and ensuring composability. Think beyond immediate problems and encode your expertise in ways that allow others to easily use and extend your work. Apply these principles to infrastructure and empower your organization to move faster.
If you’re an ops engineer, here’s the challenge: start small. Pick one piece of expertise you use every day, whether it’s spinning up infrastructure, managing a resource, or resolving an issue, and encode it into a reusable module. Document it, automate it, and make it accessible to your team. Then do it again.
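As one possible illustration of "start small", here is a minimal sketch of what encoding a single piece of expertise might look like. Everything in it is hypothetical: the `BucketSpec` type, the `golden_bucket` function, the naming pattern, and the retention numbers are stand-ins for whatever conventions your team already follows by hand, and nothing here calls a real cloud provider.

```python
import re
from dataclasses import dataclass, field

# Hypothetical team conventions, used for illustration only.
ALLOWED_ENVS = {"dev", "staging", "prod"}
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{2,30}$")


@dataclass(frozen=True)
class BucketSpec:
    """A storage bucket described the way the team has agreed to run them."""
    name: str
    env: str
    owner: str
    encrypted: bool = True    # encoded expertise: encryption is on by default
    versioning: bool = True   # encoded expertise: keep object history
    retention_days: int = 30
    tags: dict = field(default_factory=dict)


def golden_bucket(name: str, env: str, owner: str) -> BucketSpec:
    """Return a bucket spec with the team's defaults applied and validated."""
    if env not in ALLOWED_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    if not NAME_PATTERN.match(name):
        raise ValueError(f"bucket name {name!r} breaks the naming convention")
    # Assumed policy: production data is retained longer than everything else.
    retention = 365 if env == "prod" else 30
    return BucketSpec(
        name=f"{env}-{name}",
        env=env,
        owner=owner,
        retention_days=retention,
        tags={"env": env, "owner": owner, "managed-by": "golden-path"},
    )


spec = golden_bucket("invoices", "prod", "billing-team")
print(spec.name, spec.retention_days)
```

A developer calling `golden_bucket` supplies three values and gets the naming, encryption, retention, and tagging decisions for free. The same idea carries over to a Terraform module or any other tool: the knowledge lives in the module rather than in one person's head.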
This is how you scale yourself. This is how you grow your value. And this is how you move the industry forward.
If you’re not ready for that, you’re not just falling behind. You’re making it harder for everyone else to move forward.