PromptOS: A Field Reference for the Failures Nobody Documents
In my last post, I toured LeanAgileScript and was upfront about how much of it is still scaffolding. This post is about a different kind of work-in-progress: a book called PromptOS, and the debugging sessions that made me start writing it down in the first place.
The Problem That Started It
The failure that stuck with me wasn't dramatic. A prompt that had worked fine in testing started producing malformed output in production — not every time, just often enough to be a support ticket instead of a bug report. I went looking for a reference on why prompts fail this way, the same kind of reference I'd reach for if this were a null pointer exception instead. There wasn't one. Plenty of material on how to write a good prompt. Almost nothing systematic on how a previously-good prompt goes wrong, or how to diagnose it once it has.
So I started keeping notes. The notes became a catalog. The catalog became a book.
What's In It
PromptOS is organized into six layers, progressing from vocabulary to production operations:
| Layer | Focus |
|---|---|
| 1 — Foundations | Core principles, reusable patterns, a starter template |
| 2 — Patterns & Techniques | Advanced and expert-level prompt design strategies |
| 3 — Anti-Patterns & Debugging | Failure identification, root-cause analysis, fixes |
| 4 — System & Multi-Agent Architecture | System prompts, orchestration, handoffs, arbitration |
| 5 — Security | Defensive hardening and adversarial red-teaming |
| 6 — Quality & Operations | Testing, benchmarking, lifecycle, observability, drift |
Twenty-three documents in total. It reads less like a single narrative and more like a reference you keep open in another tab — which is intentional. I wanted the thing I couldn't find during that production debugging session.
Three Things You Can Use Today
The five core principles, from the foundations layer, are the ones I actually check a prompt against before shipping it: clarity over cleverness, one goal per prompt, context is fuel, structure drives quality, and requirements act as acceptance criteria. That last one does the most work — if a prompt's requirements aren't specific enough to fail a review, they're not specific enough to succeed one either.
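One way to read "requirements act as acceptance criteria" is that each requirement should be checkable against an output. The sketch below is only an illustration of that idea: the requirement names, the `v2.1` version string, and the `failed_requirements` helper are hypothetical and not taken from PromptOS.

```python
from typing import Callable

# Each requirement is a (name, check) pair, so a review can fail it by name.
# These three requirements are made-up examples for a release-note prompt.
Requirement = tuple[str, Callable[[str], bool]]

REQUIREMENTS: list[Requirement] = [
    ("at most 120 words", lambda out: len(out.split()) <= 120),
    ("mentions the version", lambda out: "v2.1" in out),
    ("no marketing adjectives", lambda out: "revolutionary" not in out.lower()),
]


def failed_requirements(output: str, requirements: list[Requirement]) -> list[str]:
    """Return the names of the requirements the output does not meet."""
    return [name for name, check in requirements if not check(output)]


if __name__ == "__main__":
    sample = "Release v2.1 fixes the login timeout and speeds up search."
    print(failed_requirements(sample, REQUIREMENTS))
```

If a requirement cannot be written as a check like this, even informally, that is usually a hint it is not yet specific enough to fail a review.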
An anti-pattern worth naming: "Overloaded Prompts." The shape is familiar to anyone who's padded out a single prompt with everything the task might need:
"Write documentation, generate tests, and create a deployment plan."
It fails for the same reason an overloaded ticket fails: conflicting priorities produce shallow, mixed output because the model has no signal for which goal to optimize when they trade off against each other. The fix in the book isn't "write shorter prompts," it's scoping explicitly: naming the one goal, and then naming what's deliberately out of scope. A line like "Do not address monitoring, testing, or architecture — only performance" does more work than deleting words.
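The scoping idea above can be sketched as a small prompt builder. This is a minimal illustration under stated assumptions: `build_scoped_prompt`, the "Goal:" and "Out of scope:" labels, and the example goal text are hypothetical and not a format the book prescribes.

```python
def build_scoped_prompt(goal: str, out_of_scope: list[str]) -> str:
    """Compose a prompt that names one goal and what is deliberately excluded."""
    if not goal.strip():
        raise ValueError("a scoped prompt needs one stated goal")
    lines = [f"Goal: {goal.strip()}"]
    if out_of_scope:
        # Naming exclusions gives the model a signal for what not to optimize.
        excluded = ", ".join(out_of_scope)
        lines.append(f"Out of scope: do not address {excluded}.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_scoped_prompt(
            "Identify performance bottlenecks.",
            ["monitoring", "testing", "architecture"],
        )
    )
```

The overloaded example would then become three separate calls, one per goal, rather than one prompt carrying all three.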
A debugging ritual: before asking why an output is wrong, list the assumptions the prompt is making. "List any assumptions you are making before answering" is a single line you can append to almost anything, and it turns a silent wrong guess into a visible one you can correct. In one worked example in the book, a model kept ignoring a required JSON schema. The diagnosis wasn't a model problem; it was that no example schema had actually been given. The fix was one line: "Output Format: Return ONLY valid JSON matching this schema: {...}". Most of the debugging guide reads like that: small, specific, reproducible.
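Both one-line fixes are easy to wire into code. The sketch below uses only Python's standard `json` module; the helper names (`with_assumptions`, `with_output_format`, `diagnose_reply`) and the example keys are hypothetical, and the key check is a deliberately simple stand-in for real schema validation. The two prompt helpers are shown separately because asking for a list of assumptions and for ONLY valid JSON in the same prompt would pull in opposite directions.

```python
import json
from typing import Optional

ASSUMPTIONS_LINE = "List any assumptions you are making before answering."


def with_assumptions(prompt: str) -> str:
    """Append the assumptions line unless the prompt already carries it."""
    if ASSUMPTIONS_LINE in prompt:
        return prompt
    return f"{prompt.rstrip()}\n\n{ASSUMPTIONS_LINE}"


def with_output_format(prompt: str, example: dict) -> str:
    """Append an explicit output format that includes an example schema."""
    schema_text = json.dumps(example, indent=2)
    return (
        f"{prompt.rstrip()}\n\n"
        f"Output Format: Return ONLY valid JSON matching this schema:\n{schema_text}"
    )


def diagnose_reply(reply: str, required_keys: set[str]) -> Optional[str]:
    """Return a short diagnosis, or None if the reply looks well-formed."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        return f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return "valid JSON, but not an object"
    missing = sorted(required_keys - set(data))
    if missing:
        return f"missing keys: {', '.join(missing)}"
    return None
```

Running `diagnose_reply` over a batch of production replies turns "malformed often enough to be a support ticket" into a count per failure type, which is easier to act on than a single bad sample.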
Where It Actually Stands
Consistent with how I've written about the DSL work, I'd rather tell you the real state than a tidy one. The 23-document manuscript is content-complete — every chapter is drafted, not stubbed. What isn't finished is the narrative layer: the book opens with quotes and framing from people actually shaping this technology — Andrew Ng, Harrison Chase, Dario Amodei, and others — meant to be woven through the individual chapters. Right now those live together in a separate preface document instead of interspersed where the preface itself says they belong. Content is done. Structural editing — making it read as one book instead of 23 well-organized documents — is the work still ahead of me.
Why This Matters Beyond the Book
The instinct behind PromptOS is the same one behind everything else in this series: treat a category of failure as something to catalog and systematize, not something to relearn from scratch every time it bites you. LeanAgileScript tries to close the gap between an agreed specification and a verified one. PromptOS tries to close the gap between "the prompt worked in testing" and "I understand why it stopped working in production." Different surface, same underlying discipline.
Next up in this series is the other book I'm writing — one aimed less at debugging individual prompts and more at the process failures that sink early-stage products before the product itself is even the problem.
If you've hit a prompt failure mode that isn't in a catalog anywhere — something that took you longer to diagnose than it should have — I'd genuinely like to hear about it. That's exactly the kind of lesson this book exists to save someone else from learning the hard way.