How will System Design evolve in the AI era?
Explaining 20 years of distributed systems in ~5 minutes
Note: We’re celebrating a pretty special milestone this month: Educative is turning 10! Over the next couple of weeks, I’ll be sharing lessons I’ve learned while scaling a tech company, and how the state of development has evolved during some of tech’s most volatile years.
I’ll kick it off today with a topic that is near and dear to my heart: System Design.
I’ve been building distributed systems since before it was called "System Design."
Or at least, I was trying.
At Microsoft in the early 2000s, I was part of Project Red Dog (i.e. the team working on the predecessor to Azure). Back then, there were no established “best practices” for System Design. We were just stressed-out engineers scrambling to make services talk to each other without melting down in production.
Today, System Design has become a staple skill for developers, product managers, and engineering leaders alike. And thanks to AI, what it means to design a scalable system is changing once again.
So what does the future look like, and what does it mean for you?
(Spoiler: It likely means new expectations and challenges ahead. But if you didn't love change, then you probably wouldn't have chosen this career, right?)
To understand where we’re heading next, let’s first look to the past: starting with why System Design evolved from an obscure corner of architecture into a core skill expected of every developer.
Evolution of System Design (Early 2000s to Present)
Phase 1: Monolithic Foundations
In the early 2000s, monolithic systems were the default.
One big codebase, one deployment, one database.
From Salesforce to massive banking systems, this was the status quo. Systems were essentially giant applications: self-contained, single-deployable units that held everything from business logic to database access.
For a while, companies and users made do. But as internet usage exploded, with millions of people now expecting seamless experiences, monoliths became a bottleneck.
Everything hinged on scalability. Companies like eBay and Amazon quickly found that what worked for a few thousand users didn’t work when global demand surged overnight.
Here's the core issue:
Scaling a monolith meant scaling the entire application. No chance to isolate or fine-tune just the hot spots.
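To make that cost concrete, here is a rough back-of-the-envelope sketch in Python. The component names, memory footprints, and replica counts are hypothetical numbers chosen purely for illustration, not measurements from any real system.

```python
# Hypothetical components and the memory each needs per running instance (GB).
COMPONENT_MEMORY_GB = {"checkout": 2, "search": 4, "catalog": 3, "reporting": 6}

# Hypothetical number of instances each component needs at peak load.
REPLICAS_NEEDED = {"checkout": 10, "search": 2, "catalog": 2, "reporting": 1}


def monolith_memory_gb(memory, replicas):
    """Every monolith replica carries all components, so the hottest one sets the replica count."""
    replica_count = max(replicas.values())
    return replica_count * sum(memory.values())


def per_component_memory_gb(memory, replicas):
    """If components could scale independently, each is replicated only as much as it needs."""
    return sum(memory[name] * replicas[name] for name in memory)


if __name__ == "__main__":
    print("monolith:", monolith_memory_gb(COMPONENT_MEMORY_GB, REPLICAS_NEEDED), "GB")
    print("independent:", per_component_memory_gb(COMPONENT_MEMORY_GB, REPLICAS_NEEDED), "GB")
```

With these made-up numbers, one hot component (checkout) forces ten copies of everything, including the rarely used reporting code.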
This made deployment a high-stakes affair. I remember spending long nights poring over massive codebases, praying that the next commit wouldn’t take the entire system down. Teams grew cautiously because adding features meant adding more weight to an already-bulky monolith.
For my team at Microsoft, and for everyone else in that era, the challenge was clear: how do you make monoliths safer and easier to evolve?
We started pushing for clean layers, better modularity within the monolith, and clearer ownership boundaries. These were early hints of what would come next.
At this stage, System Design was still a niche skill, primarily the domain of senior engineers and architects working at a handful of companies. Most developers focused on writing code within their modules, with little exposure to overarching system architecture.
👉 Essential read: Amazon played a huge role in steering the transition out of the monolithic era. Learn more about how Amazon redefined System Design.
Phase 2: The Microservices Boom
The 2010s ushered in a new era. With the rise of the cloud, CI/CD, and global scale, monoliths couldn’t keep up.
Enter microservices.
The idea was simple: split your application into smaller, independently deployable services. Let teams own what they build, move faster, and deploy without waiting for the whole company. This approach mirrored team structure, where Amazon’s “two-pizza teams” became a rallying cry for autonomy and speed.
By this time I was at Facebook (now Meta). We leaned hard into microservices, and they let us ship features at a dizzying pace. We took the ‘own your service’ mantra so seriously that one of my teammates named his deploy script after himself (not a best practice, so that didn’t last long).
Microservices gave us modularity and let us scale globally while staying nimble. For big companies like Netflix and Uber, microservices delivered real gains:
Netflix could keep streaming even when one part failed; Uber could launch new features for different cities without slowing the rest of the platform.
But in practice, microservices also brought hidden costs:
Operational overhead: Dozens of services meant dozens of pipelines, environments, and alert fatigue.
Performance hits: Synchronous network calls added latency, and minor issues could cascade into major outages.
Cognitive overload: Developers had to track dozens of APIs and interactions. More mental overhead, less time building features.
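The performance and reliability costs above can be approximated with simple arithmetic. The sketch below is a simplification: it assumes calls happen one after another and that services fail independently, which real systems only roughly satisfy. The hop latencies and availability figures are hypothetical.

```python
def chain_latency_ms(latencies_ms):
    """Sequential synchronous calls: the caller waits for each hop in turn, so latencies add up."""
    return sum(latencies_ms)


def chain_availability(availabilities):
    """A request succeeds only if every service in the chain succeeds (independence assumed)."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


if __name__ == "__main__":
    hops = 5
    # Five hypothetical services, each taking 20 ms and each available 99.9% of the time.
    print("total latency (ms):", chain_latency_ms([20] * hops))
    print("end-to-end availability:", round(chain_availability([0.999] * hops), 5))
```

Five services that are each 99.9% available yield a request path that is only about 99.5% available, which is one way small issues add up as call chains grow.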
It was during this time that Netflix famously created their "Chaos Monkey" tool to test how their microservices would behave under random failures, and chaos engineering became a staple in System Design.
And here’s where System Design’s role in interviews started to shift.
Companies like Facebook and Amazon began using System Design Interviews not just for senior engineers, but to differentiate strong mid-level developers too.
Unfortunately, many developers underestimated the importance of System Design skills. As an interviewer at Facebook, I saw even highly experienced developers aiming for E5 roles get downleveled to E4 when their design skills weren’t up to par.
System Design had become a key differentiator for anyone trying to move up the ladder.
Phase 3: The Rise of Modular Simplicity
By the late 2010s and early 2020s, a new trend emerged: modular monoliths.
For many teams, microservices demanded more overhead than they were worth. And they certainly weren't the best answer for small teams and products still finding their market fit.
Instead of scattering logic across countless services, developers began grouping related functionality into coherent, single-deployable units with clear internal boundaries. We also saw coarser-grained services gain popularity (i.e. fewer, larger services instead of an explosion of fine-grained ones).
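As a minimal sketch of what "clear internal boundaries" can look like inside a single deployable unit, the Python example below keeps two modules in one process but lets them interact only through a narrow interface. The module names (`InventoryModule`, `OrdersModule`) and the `InventoryAPI` interface are hypothetical, invented for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class InventoryAPI(Protocol):
    """The only surface other modules are allowed to use to reach inventory."""

    def reserve(self, sku: str, quantity: int) -> bool: ...


class InventoryModule:
    """Owns stock data; nothing outside this module touches _stock directly."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = dict(stock)

    def reserve(self, sku: str, quantity: int) -> bool:
        available = self._stock.get(sku, 0)
        if quantity <= 0 or quantity > available:
            return False
        self._stock[sku] = available - quantity
        return True


@dataclass
class OrderResult:
    accepted: bool
    reason: str


class OrdersModule:
    """Depends on the InventoryAPI interface, not on inventory's internals."""

    def __init__(self, inventory: InventoryAPI) -> None:
        self._inventory = inventory

    def place_order(self, sku: str, quantity: int) -> OrderResult:
        # An in-process function call: no network hop, no separate deployment.
        if self._inventory.reserve(sku, quantity):
            return OrderResult(True, "reserved")
        return OrderResult(False, "insufficient stock")


if __name__ == "__main__":
    orders = OrdersModule(InventoryModule({"widget": 3}))
    print(orders.place_order("widget", 2))
    print(orders.place_order("widget", 2))
```

Because `OrdersModule` only knows the interface, the inventory code could later be extracted into its own service without rewriting its callers, if that ever proves worthwhile.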
Several companies made changes:
Shopify famously moved back from a microservices sprawl to a modular monolith to improve developer velocity and reduce deployment anxiety (and happened to achieve remarkable uptime as a result).
Netflix began consolidating where microservices had proliferated too far, grouping related functions to avoid unnecessary fragmentation.
At Educative, we embraced this too. We found that keeping tight-knit modules together in the same deployment let us ship faster without being buried in dependency hell.
That’s when I really understood: splitting services is easy. Knowing what to keep together is the real art of architecture.
This wasn’t a step backward. It was a realization that the best systems are modular, but they’re also cohesive. The simpler your architecture, the easier it is to evolve, debug, and onboard new engineers.
As these modular approaches spread, System Design Interviews became even more prevalent. Developers across many levels were expected to understand how their components fit into the larger system. As a result, developers at more junior levels began to face System Design rounds in their interview process.
Phase 4: AI-Agentic Architectures
We’re now entering a new architectural chapter: AI-Agentic Architectures.
Unlike deterministic microservices, these architectures embrace probabilistic execution, where outputs vary based on nuanced context, not just fixed APIs.
We’re already seeing this play out around the industry:
Shopify leverages AI agents through tools like Shopify Magic to automate tasks such as generating product descriptions and managing discounts.
Microsoft launched Copilot Studio, making it easier for developers to build AI agents that automate workflows and plug directly into apps.
Google Cloud is exploring an agentic security operations center (SOC), where connected agents team up to handle security workflows with minimal human involvement.
Despite this early momentum, there are still some real challenges associated with scaling these technologies:
Unpredictable behavior: The same input can lead an agent to different actions from one run to the next, and interactions between agents can lead to surprising outcomes (both good and bad).
Observability: Traditional log-based tracking isn’t enough. We need to understand an agent’s reasoning paths and decision histories.
Testing: Deterministic unit tests fall short. Testing now involves sandboxing agents and evaluating probabilistic outcomes.
Security and guardrails: Agents acting off-script require strong oversight (e.g. fallback mechanisms, human-in-the-loop design, and clear escalation paths).
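Two of the ideas above, evaluating probabilistic outcomes and adding a fallback with an escalation path, can be sketched in a few lines of Python. This is an illustrative sketch only: `make_flaky_agent` is a hypothetical stand-in for a real LLM-backed agent, and the `DISCOUNT:` output format, the retry count, and the escalation message are all assumptions made up for the example.

```python
import random


def make_flaky_agent(seed, failure_probability):
    """Hypothetical stand-in for an LLM-backed agent: sometimes returns an off-script answer."""
    rng = random.Random(seed)

    def agent(prompt):
        if rng.random() < failure_probability:
            return "I'm not sure"
        return f"DISCOUNT:10 for {prompt}"

    return agent


def is_valid_discount(output):
    """Assumed output contract for this example: answers must start with 'DISCOUNT:'."""
    return output.startswith("DISCOUNT:")


def pass_rate(agent, prompt, check, trials):
    """Instead of asserting on one output, run many trials and measure how often the check passes."""
    passed = sum(1 for _ in range(trials) if check(agent(prompt)))
    return passed / trials


def with_guardrail(agent, check, fallback, max_attempts=3):
    """Retry when the output fails the check, then hand off to a fallback (e.g. a human queue)."""

    def guarded(prompt):
        for _ in range(max_attempts):
            output = agent(prompt)
            if check(output):
                return output
        return fallback(prompt)

    return guarded


def escalate_to_human(prompt):
    """Hypothetical escalation path: flag the request for manual review."""
    return f"ESCALATED:{prompt}"


if __name__ == "__main__":
    agent = make_flaky_agent(seed=42, failure_probability=0.3)
    print("pass rate:", pass_rate(agent, "sku-1", is_valid_discount, trials=200))
    guarded = with_guardrail(agent, is_valid_discount, escalate_to_human)
    print(guarded("sku-1"))
```

A test suite built this way would assert that the pass rate stays above an agreed threshold rather than that any single run matches an exact string.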
These systems also lean heavily on expensive LLMs, so until open-source or local LLMs mature, cost will be a major hurdle to how quickly we see agentic systems evolve.
👉 Check out our newsletter for a deep dive on agentic System Design: Rethinking Microservices with the Rise of AI Agents.
However long it takes agentic architecture to mature, System Design is already adapting to an AI-centric future.
As we see increasing integration of AI in our systems, Generative AI System Design questions are already popping up in interviews.
Generative AI System Design builds upon known System Design principles, and goes beyond them to address challenges unique to AI-powered systems, like:
Probabilistic behaviors
Unpredictable outputs
Weird failure modes like hallucinations and token cliffs
Whether you're building AI-powered features, integrating APIs, or scaling infrastructure, GenAI System Design is how you ensure your work can scale into reliable AI-powered services.
The future depends on developers who know how to build resilient systems around AI — I hope you step up to help build it.
6 System Design Lessons I Learned the Hard Way
You can learn a lot from whitepapers. But you learn a lot more from broken deployments and painful realizations that what you built yesterday doesn’t serve you today.
Here are a few System Design lessons that I hope you don't need to learn the hard way yourself:
Don’t over-optimize too soon. Let complexity emerge organically; don’t force it.
Services that talk a lot should live together. Don’t split them unless you’re sure they can thrive apart.
Architectural fixes can’t solve team misalignment. If ownership and incentives are messy, your code will be too.
Watch out for silent complexity. The cost of a fragmented system isn’t just in dollars. It’s in time, cognitive load, and your team’s ability to ship.
Prioritize changeability over trendiness. The best systems are the ones you can still evolve when the landscape shifts.
Clarity and cohesion matter more than ever. AI agents (like humans) do better in simple, well-bounded systems.
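One way to act on the "services that talk a lot should live together" lesson is to count how often services call each other and flag the chattiest pairs as candidates for co-location. The Python sketch below is a rough heuristic under stated assumptions: the call log format (a list of caller/callee pairs), the service names, and the threshold are all hypothetical, and in practice the data would come from whatever tracing or metrics a team already collects.

```python
from collections import Counter


def chatty_pairs(call_log, threshold):
    """Return service pairs whose combined call count (in either direction) meets the threshold."""
    counts = Counter()
    for caller, callee in call_log:
        if caller != callee:
            # Sort the pair so calls in both directions are counted together.
            counts[tuple(sorted((caller, callee)))] += 1
    return sorted(pair for pair, count in counts.items() if count >= threshold)


if __name__ == "__main__":
    # Hypothetical call log: cart and pricing talk constantly, email is called rarely.
    log = [("cart", "pricing")] * 5 + [("pricing", "cart")] * 4 + [("cart", "email")]
    print(chatty_pairs(log, threshold=8))
```

A high call count alone doesn't prove two services belong together, but it is a cheap signal for where to look first.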
From Niche to Non-negotiable
System Design has gone from a niche skill to a must-have for every developer.
And with AI and agents in the mix, it’s more essential than ever.
In the next few years, System Design skills are going to be expected of everybody in software. As AI takes on more of the repetitive work, even juniors will have to think at architecture scale sooner in order to guide agents effectively.
Whether you’re designing classic APIs or orchestrating AI-powered features, understanding how to build resilient, adaptable systems is the new baseline.
You can learn patterns, proven frameworks, and real-world case studies to stay ahead of the AI-oriented System Design wave with these popular Educative courses:
You can access these courses and more with an Educative subscription — now 50% off for our Anniversary Sale.
Next up, I’ll share how tech hiring has shifted over the years (and what you can do to stay ahead of the competition). Stay tuned.
Happy learning!
– Fahim