I spent much of February and March leaning heavily on AI agents to get a virtual escape room built. We wanted to do something at KubeCon in Amsterdam that was fun while also teaching people a little bit about Linkerd, an ultralight service mesh for Kubernetes, and we settled on an escape room party… but we weren’t fully greenlit until about six weeks beforehand.

That left far too little development time for me to do everything by hand, so I turned to AI agents working under my direction, usually having four or five work streams proceeding in parallel. I’ve taken to calling this “wrangling shoggoths” to emphasize the non-human nature of the agents, which is both their biggest strength and their biggest weakness.

Looking back, there are three critical things that I think will be important for anyone trying to do serious work this way:

  1. Start with the human experience.

  2. Play to everyone’s strengths.

  3. Have a human own the project.

The Human Experience

Escape rooms have puzzles to solve, but they also need an engaging narrative to tie everything together. We came up with the narrative first and let it drive all the early design decisions about the game. Where would the player be? What would they be doing? Why would they care?

This technique isn’t specific to games: when creating technology, you should always start with the human experience. Whether we call it “user stories” or “UX” or “DevX,” it’s always about making sure your technology is a good fit for the human users, rather than the other way around.

Non-humans can be effective tools here, especially in terms of offering avenues for brainstorming that we might be skipping over, but the human experience is fundamentally about human concerns rather than technological concerns. Keeping responsibility for these human concerns in the hands of a human instead of a non-human is likely to be the right answer for a long time to come.

Playing to Strengths

Shoggoths are very good at certain things. For example, they are (literally) inhumanly good at summarizing and at digging into a codebase to understand where things happen, and Claude Sonnet 4.5 is much better than I am at many coding tasks. This makes shoggoths incredible force multipliers: instead of spending hours carefully writing out code, I can carefully write out a goal and let the shoggoths blaze through writing the code.

If I’m not careful in how I describe what I want, though, the shoggoths can go off the deep end and ruin what’s already there. This is very characteristic of the shoggoths’ weaknesses:

  • They’re not good at generalizing. An example: several times I asked them to fix a problem best solved with a single parameterized function, and instead they wrote four slightly different functions.

  • They don’t do a good job of thinking about other work streams. To avoid merge conflict hell, I had to be very careful to set up work streams that wouldn’t overlap.

  • They’re not good at unsupervised work. It’s very easy for the shoggoths to go charging down a path that’s not a good idea, get stuck there, and refuse to come out.
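
To make the first bullet concrete, here is a minimal Python sketch of the pattern. Everything in it is hypothetical: the puzzle names, the answers, and the functions check_door_code, check_safe_code, check_answer, and check_puzzle are invented for illustration and are not taken from the escape room’s code.

```python
# Hypothetical illustration only: names and answers are invented,
# not taken from the real escape room code.

# The pattern the agents tended to produce: near-identical copies,
# one per puzzle, each differing only in the expected answer.
def check_door_code(guess: str) -> bool:
    return guess.strip().lower() == "canal"


def check_safe_code(guess: str) -> bool:
    return guess.strip().lower() == "mesh"


# The generalized version: one function, parameterized by the answer.
def check_answer(guess: str, expected: str) -> bool:
    """Compare a guess to the expected answer, ignoring case and padding."""
    return guess.strip().lower() == expected.strip().lower()


# Per-puzzle differences live in data rather than in duplicated code.
ANSWERS = {
    "door": "canal",
    "safe": "mesh",
}


def check_puzzle(puzzle: str, guess: str) -> bool:
    """Look up the expected answer for a puzzle and check the guess."""
    return check_answer(guess, ANSWERS[puzzle])
```

With the parameterized version, adding a puzzle means adding one entry to ANSWERS, and a fix to the comparison logic happens in one place instead of four.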

To be fair, these are all things that inexperienced human developers can have a tough time with as well! What’s different is that shoggoths are trained to be extremely confident, even in situations where a junior human developer will ask for help because they’re unsure. It’s very important to look carefully at what the shoggoth actually produces.

One shoggoth weakness that’s very different from anything we see with humans involves security. On our development machines, we give user accounts certain permissions that implicitly rely on the idea that whatever the human developer does within the scope of those permissions is “safe” because it only affects the human. This entire concept breaks down with shoggoths: if I run (say) Claude locally, I now have an autonomous non-human pretending to be me. Since the OS has no way to know that it’s not me, the OS can’t protect me if the shoggoth tries to do something it shouldn’t. The controls that the tools try to enforce help, but they can’t solve everything.

This is a reason I really like the GitHub Agents model: I’m much more comfortable running a shoggoth as a GitHub agent, because it only has access to a sandbox environment with code I’ve already checked into GitHub.

Might the shoggoths eventually get better at all of this? Sure – but the biggest trick is going to be managing that extreme overconfidence, and it’s tricky to see how we fix that while the AI companies keep engineering in the sycophantic nature that they think people want.

Why the Human Owner Is Non-Negotiable

The final critical thing is that a human needs to own the project. By this, I mean both that the human needs to manage the work that all the shoggoths are doing and also that there needs to be a human with responsibility for getting everything delivered.

Managing the work is the simpler thing to discuss here. Dealing with four or five work streams running in parallel while working on the escape room was very much like managing a small team of extremely confident junior developers: I needed to give them tasks, check their results, make sure that everything could get landed smoothly, etc. It turns out that this kind of management doesn’t change much when some or all of the workers aren’t human, or when some human developers are using shoggoths themselves.

Responsibility for delivering the project is a little subtler. Shoggoths are tools, and tools don’t get to accept responsibility for anything. In the end, the responsibility must lie with the human using the tool, and saying that a project failed because the AIs didn’t deliver is simply a nonstarter: if your tools aren’t working, you need to choose different tools, and you need to do it early enough to still deliver the project.

Of course, this isn’t unique to development. If you’re using AI for documents, image generation, brainstorming, or anything else, ultimately you as a human will have to own what you’re presenting to others. We as a society have no provision for assigning responsibility to non-humans, and there are a lot of things we’ll need to work through to change that.

The Power and Perils of AI Agents

Shoggoths are amazing force multipliers and accelerators. They work many times faster than we humans can, and some things become possible with that acceleration that wouldn’t have been possible without it. However, they can also make mistakes many times faster, and their supreme confidence makes it hard for them to realize when they’re going off the rails and ask for help. The most successful ways we’ve found to use these tools start by recognizing that they are not humans, that we cannot trust them to act like humans, and that there must be a human responsible for the results they’re being used to deliver.

This can be hard. We’ve seen as far back as ELIZA in the 1960s that we humans find it scarily easy to anthropomorphize anything that uses human language – and when we’re dealing with the current crop of non-humans, it’s very easy indeed to fall into the trap of assuming that they’ll always act in our best interests and those of our projects. Taking advantage of the acceleration while steering clear of the traps requires a good sense of the strengths and weaknesses of the shoggoths, and careful attention to where you really need to stick with a human.

(Want to learn more? “Kube-napping! Phippy’s Abduction on the Canals” has all the details about the party, and you can also try the escape room yourself.)