Systems Engineering
What level of involvement is too much for LLMs?
IBM is probably one of the more disciplined organizations when it comes to building systems and managing their lifecycle.
Indeed, in school we learned about many systems and software engineering methodologies and tools from professors who either had worked for IBM or were currently working for IBM. They were partnered with the institution, and it was interesting learning about engineering for enterprise at scale early on. I think it shaped a lot of how I view the engineering discipline in general.
I learned about the Unified Modeling Language (UML) and systems design through an IBM lens, and while some of those philosophies and methodologies can, at times, be too rigid to apply at the startup level, it is very much a sane approach for enterprise organizations.
SYSTEMS ENGINEERING
Keep in mind systems engineering is a broad term and doesn’t just mean software or computer systems.
The Apollo spacecraft was a system.
The water treatment plant and the pipes that take sewage away from your home are systems.
Roadways, waterways, air traffic lanes.
All systems engineered by people.
There is an IBM Limited Edition of ‘Systems Engineering for Dummies’ which you can find for free at this IBM-sponsored link. I like these books as a gentle overview or skim of technical topics, and this one was actually written by a skilled technical writer. It provides a decent overview of engineering products as abstract, modeled systems and touches on parallel topics like collaboration, iteration, and quality control.
The reason I bring this book up is because I like its simple definitions.
Systems engineering is an interdisciplinary approach to creating large, complex systems that meet a defined set of business and technical requirements. - Shamieh, Systems Engineering for Dummies (2011), p.13
There are many steps in systems engineering. We can pull together a nice canonically-ordered list using sources like the INCOSE and NASA systems engineering handbooks and then summarize it in a few words:
- stakeholder expectation and problem statement
- requirements gathering and definition
- logical / functional decomposition
- architecture and design
- ‘trade studies’: build vs. buy, technology selection, vendor selection, algorithm/model choices
>>>>> BUILD <<<<<
- verification and validation
- systems integration
- operations and sustainment (deployment, maintenance, lifecycle management)
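One way I internalize the list above: each step going down the V pairs with a check coming back up. A toy sketch of that pairing (phase names are my paraphrase, loosely following INCOSE/NASA terms, not an official mapping):

```python
# Pair each decomposition phase (left side of the V) with the
# verification activity that checks it (right side of the V).
# Names are illustrative paraphrases, not canonical handbook terms.
V_MODEL = [
    ("stakeholder expectations", "operations & sustainment"),
    ("requirements definition",  "validation (did we build the right thing?)"),
    ("architecture & design",    "verification (did we build it right?)"),
    ("component design",         "systems integration"),
]

def right_side_for(left_phase: str) -> str:
    """Look up the verification activity paired with a decomposition phase."""
    for left, right in V_MODEL:
        if left == left_phase:
            return right
    raise KeyError(left_phase)
```

The point of the pairing is traceability: skipping a left-side phase means the matching right-side check has nothing to verify against.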
I’m using the mdsvex library to include Svelte components in Markdown, so let me drop the way I sometimes think about this using the ‘V-model’.
The V-model is still used, even here in 2026, by industries where safety, high reliability / availability, and strict regulatory compliance are paramount.
LLM and Systems Engineering
It’s no secret LLMs are great at some things, and not so great at others.
That’s not often apparent given the messaging from the large AI vendors in their marketing, which varies from ‘use it for everything’ to ‘it might be sentient’ 🙄.
One thing I’ve always struggled with in terms of ‘AI’ and its usage in projects I’ve worked on is its lack of context and weak grasp of causality. People will say, “well, there are solutions for that”, and granted there are, but over time the context fills up and the returns on keeping all of it available to the LLM diminish.
If you use a tool like Claude Code on smaller projects, it does very well most of the time as a pair-coder. It can even help you ideate swiftly on design and architecture. It’s also quite useful in terms of working with cloud resources on AWS, GCP, or Azure.
As your project grows and you have multiple services, deployments, virtual machines, serverless functions, and data stores, you start to see the point where an LLM “got you only so far”.
Back to the V-model
The V-model is actually a smart one in the sense that you can break it up into three main activities.
Decomposition, building, and integration.
What if we just had a requirement of ‘do the thing’ and gave that to an LLM?
It would try to build a single highly coupled mechanism and jam almost all of the logic into it. It would run on your machine and work fine. You might be skilled enough to deploy it somewhere like Vercel or Firebase.
But… there is a reason frameworks and methodologies exist for systems engineering.
Systems engineering doesn’t end when you skip down the left side of the V-model and speedrun the ‘build’ section at the bottom. Refer back to the right side of the elbow…
People without a background in engineering will often try to leverage an LLM to do everything in short order. It’s often a lack of experience or time, and many times just the desire for the low-effort, quick-reward dopamine gain.
For trivial projects, however, you can do quite a bit with this mentality.
Experimentation (via gifslop)

what you see when your ship full of cats is nearing the event horizon of a singularity*
One reason I built gifslop was to see how far I could get with an LLM when building an over-engineered and overly-complex system.
It’s basically a way to allow an LLM to create a GIF using code it generated (mostly coordinates, color maps). The GIFs are typically terrible, but hilarious.
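To make that concrete, here’s a hypothetical sketch of the core idea, not gifslop’s actual schema: the LLM emits pure data (per-frame coordinates plus a color map), and deterministic code rasterizes it into frames.

```python
# Hypothetical sketch of the gifslop idea: the LLM outputs only data
# (pixel coordinates per frame plus a color map), and deterministic
# code turns that data into frames. The format here is my invention.
def render_frame(width, height, points, color_map):
    """Rasterize one frame: points is [(x, y, color_key), ...]."""
    frame = [[(0, 0, 0)] * width for _ in range(height)]
    for x, y, key in points:
        if 0 <= x < width and 0 <= y < height:
            frame[y][x] = color_map[key]  # out-of-bounds points are dropped
    return frame

# Example 'LLM output': two frames of a tiny 4x4 animation.
color_map = {"cat": (255, 128, 0), "star": (255, 255, 255)}
frames = [
    [(0, 0, "cat"), (3, 3, "star")],
    [(1, 1, "cat"), (2, 2, "star")],
]
rendered = [render_frame(4, 4, pts, color_map) for pts in frames]
```

From there, frames like these can be handed to an encoder (e.g. Pillow’s `Image.save` with `save_all=True`) to produce the actual GIF. Since the LLM only generates coordinates and colors, the worst it can do is a terrible-looking GIF, which is the whole joke.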
It’s satire, really.
The system comprises a few pieces:
- SvelteKit static site frontend code
- Image service backend code
- Firebase hosting and analytics
- Namecheap domain registrar & Advanced DNS service
- Google Cloud Run Service (image service backend)
- Google Cloud Storage Bucket (image serving & caching)
- Claude agent & poller
- GitHub project repository (Issues are the events that trigger GIF creation)
The poller and agent run locally on my machine and riff off of GitHub issues for now, since Firebase hosting, the Cloud Run image service, and the GCS bucket are essentially free at low volume.
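The real poller is messier, but the shape is roughly this (the repo name and the `handle` hand-off are placeholders, not gifslop’s actual code):

```python
# Sketch of an issue poller: fetch open issues, diff against ones
# already handled, hand new ones to the agent. Repo name and the
# handle() hand-off are placeholders.
import json
import urllib.request

def fetch_open_issues(repo: str) -> list[dict]:
    """Fetch open issues via the public GitHub REST API (unauthenticated)."""
    url = f"https://api.github.com/repos/{repo}/issues?state=open"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def unhandled(issues: list[dict], seen: set[int]) -> list[dict]:
    """Return issues the agent hasn't processed yet, oldest first."""
    fresh = [i for i in issues if i["number"] not in seen]
    return sorted(fresh, key=lambda i: i["number"])

# Poll loop (sketch): each new issue becomes one GIF-creation event.
# seen: set[int] = set()
# while True:
#     for issue in unhandled(fetch_open_issues("someuser/gifslop"), seen):
#         handle(issue)              # hand off to the Claude agent
#         seen.add(issue["number"])
#     time.sleep(60)
```

Keeping the diffing logic (`unhandled`) separate from the network call makes the event-trigger behavior trivially testable without hitting GitHub.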
One would hope a model as powerful as Opus 4.7 could help build a solid framework and project here, but is that the level of involvement we want from a non-deterministic and stochastic system?
Letting Claude take a crack
I gave a rather lengthy, but rigidly prescribed plan to Claude Code to try and ‘vibe-code’ it all together.
The resulting code for the frontend was simple, no issues there.
The code for the agent and image service was hard to read and reason about.
Once I added the Terraform for the Google Cloud infrastructure, plus things like the LLM prompt, the requests to Cloud Run, and the GCS integration, it became clear that Claude Code was becoming more of a burden than a helper for a solo engineer.
That’s because I tried to ‘use it for everything’.
Remember, most models’ training knowledge is a year old or more, unless they proactively bring in fresh information by searching the web or ‘ingesting’ other content.
In many instances, Claude was using SDKs incorrectly, building incomplete types or classes, or missing the relationship between services. Without the ability to intuitively grasp cause and effect, LLMs can only get you so far.
Granted, eventually they get it right. After a while I was able to get Claude Code to connect the dots between deployed services, authentication layers, data stores, etc. But at what cost?
The answer is measured in hundreds of dollars. Given a very complex system, Claude Code can easily burn through that much usage and eat up the limits on most plans pretty quickly.
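Back-of-the-envelope, with placeholder per-million-token prices (not Anthropic’s actual rates, which vary by model and change often):

```python
# Back-of-the-envelope token cost for a long agentic session.
# Prices are placeholders, not real vendor rates.
def session_cost(input_tokens, output_tokens,
                 usd_per_m_input=5.0, usd_per_m_output=25.0):
    return (input_tokens / 1e6) * usd_per_m_input + \
           (output_tokens / 1e6) * usd_per_m_output

# A long agentic session re-reads a large context on every turn:
# say 200 turns x 150k input tokens + 2k output tokens each.
total = session_cost(200 * 150_000, 200 * 2_000)  # about $160 at these rates
```

The killer isn’t output generation, it’s the input side: every turn re-feeds the accumulated context, so cost grows with both session length and project size.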
The other cost: if this were a system that other systems or people relied on, it would be neither resilient nor scalable.
Using LLMs in Systems Engineering
LLMs can be a good way to organize thoughts, challenge assumptions, and generally ideate rapidly. I don’t like the idea of ‘thought-gen’, but using LLMs as a way to organize your own original thoughts isn’t insane.
Letting an LLM mostly engineer the system is not a great idea. Using it to help with syntax recall or boilerplate code generation, and even as a pair-coder to speed up development (within reason) is fair.
Letting the LLM build the entire thing is the wrong way to use the tool. It will end up costing more in failures, poor availability, people hours (to fix it), and cognition. From experience, I can tell you that reviewing AI-generated code for large projects is brutal. You end up spending more tokens to bring in an LLM to try and search through the poorly designed haystack for one of many needles.
remember
Tools that seem like magic get applied to everything.
Cost and entropy will remind us why that wasn't a great idea.