Coding as a Game of Probabilities

Recently I’ve been thinking about coding with AI as a process of navigating a tree of probabilistic outcomes. Most people using LLMs, especially with code, have a basic understanding of what they do (“bro, it just predicts the next token”), but in practice I’ve found it more useful to think in terms of the relationship between input and output, if nothing else to work out how it best fits, or doesn’t fit, my day-to-day work. Given what I provide, what is the probability of getting the output I need? Or what percentage of the probable outcomes will work?

This framing, I think, gives a better perspective on the AI coding examples you see online, from the one-shot game demos to the vibe-code horror stories.

If I ask an AI to “make an HTML page with a black background and a 50px white square in the top left corner,” any decent model will produce this easily. The outcome is highly specific, and the path to get there is pretty narrow. Essentially there’s only one right answer.
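For a prompt that narrow, almost every correct answer collapses to the same few lines. One plausible output:

```html
<!-- One plausible rendering: there isn't much room for variation -->
<!DOCTYPE html>
<html>
  <head>
    <style>
      body { margin: 0; background: black; }
      .square {
        position: absolute;
        top: 0;
        left: 0;
        width: 50px;
        height: 50px;
        background: white;
      }
    </style>
  </head>
  <body>
    <div class="square"></div>
  </body>
</html>
```

Class names and minor details might differ between runs, but functionally the probability mass is concentrated on a single outcome.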

In some ways, the inverse is true when a request is highly non-specific. If I say “make me a game like Snake,” there’s a huge range of outcomes that would satisfy that brief. The AI can produce something that works partly because the target is so large. This is why some of the AI demos of one-shotted games and the like are both impressive and underwhelming at the same time. The fact that a model is able to produce working code, or an output that even satisfies that kind of input is impressive. But it’s also underwhelming in the sense that (in my experience at least), rarely do I need output with such wide goal posts.

In my typical professional work, neither of these conditions tends to hold. The job is usually about translating highly specific but abstract ideas into highly specific outcomes. Abstract in the sense that you are dealing with other people’s thoughts, existing knowledge, or contextual information that isn’t explicitly defined. It’s a precise destination, but the route to get there is rarely mapped out in advance in a way that pinpoints one specific outcome.

It can also be very difficult to know up front exactly how to get from the abstract to the concrete, which is why the job isn’t just writing code: it’s discovery, an iterative clarification of the goal, often using code itself as the mechanism. I don’t think this is defensive developer posturing; it’s just the nature of converting ideas into things.

This is why, for most real-world applications, the ‘one shot’ approach is never likely to work. Unless the required output is highly probable given your input, you are unlikely to land on ‘the’ solution, unless ‘the’ solution is any of the possible solutions.

I think this is why the path from idea to outcome is essentially a process of navigating a tree of probable answers to inputs. Either you need to have answered every conceivable forking decision of the tree up front in a spec or brief, or you follow an iterative process of decision-making as you go. I think this is true whether you are vibe coding or writing every line by hand. Either way, you are asking questions, finding answers, and problem-solving at each of these forks to direct the code to the necessary outcome.

From experience, the success of AI-assisted coding seems heavily tied to the ratio of input to output. If I give an LLM a large body of text and ask for a summary, the likelihood of a good result is high. The output is necessarily smaller and more constrained than the input. The inverse seems equally true: the smaller the input relative to the required output, the more variance in the result.

With code, this dynamic plays out not just at the project level but at the task level. I’ve recently been working on an ERP system, and two different parts of the job highlighted this for me.

Task one was adding a new API route: a controller, a model, validation, and so on. The project had established patterns and sat on a well-documented MVC framework. The “input” wasn’t just my prompt; it was the existing codebase, the framework conventions, and the documentation (most of which probably sits in training data too).

It often feels like the prompt is entirely responsible for the output, but it’s really just the tip of the iceberg. The mass of existing code, patterns, and conventions beneath your prompt constrains the probability space enormously. The AI was essentially synthesising existing patterns into a new instance. It’s akin to a summary.

I wrote a tight-ish spec that detailed the uniqueness of this specific endpoint, let the AI implement almost all the code in one go, and the output was more or less exactly what I expected. Because it matched my expectations, it was straightforward to review and verify; the code looked the same as the lines I’d written myself.
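For illustration (the real project used a different stack, and every name here is invented), task-one code is almost a template fill-in. The surrounding codebase already dictates the shape: validate first, delegate to the model, return a conventional response.

```python
# Hypothetical, framework-free sketch of "pattern synthesis" code.
# Every choice below mimics a convention the codebase already
# establishes, so the probability space is tiny.

def validate(payload, required):
    """Return error messages for missing required fields."""
    return [f"{field} is required" for field in required if field not in payload]

class Invoice:
    """Stand-in for the model layer."""
    _store = []

    def __init__(self, customer_id, amount):
        self.customer_id = customer_id
        self.amount = amount

    @classmethod
    def create(cls, **fields):
        invoice = cls(**fields)
        cls._store.append(invoice)
        return invoice

    def to_dict(self):
        return {"customer_id": self.customer_id, "amount": self.amount}

def create_invoice(payload):
    """Controller: validate, call the model, return a response dict."""
    errors = validate(payload, required=["customer_id", "amount"])
    if errors:
        return {"status": 422, "errors": errors}
    invoice = Invoice.create(**payload)
    return {"status": 201, "body": invoice.to_dict()}
```

Given an established codebase full of controllers exactly like this, there is very little the model can get creatively wrong.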

Task two was entirely different: implementing an input field for user-defined expressions, requiring a state machine, a fairly unique UI, and implementation details tied to abstract, project-specific concepts.
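To give a concrete flavour (all details invented; the real feature was project-specific), the heart of it was the kind of character-by-character state machine where every transition is a fork in the tree:

```python
# Invented, much-simplified version of the task-two core: a tiny
# character-level state machine that tokenises user-typed expressions.
# Each branch below is a decision point; the real version had far more
# states, and every one was a potential edge-case bug.

def tokenize(text):
    tokens, state, buf = [], "start", ""
    for ch in text + "\0":  # sentinel character flushes the final token
        if state == "number":
            if ch.isdigit() or ch == ".":
                buf += ch
                continue
            tokens.append(("number", buf))   # number ended: flush it
            state, buf = "start", ""
        elif state == "ident":
            if ch.isalnum() or ch == "_":
                buf += ch
                continue
            tokens.append(("ident", buf))    # identifier ended: flush it
            state, buf = "start", ""
        # state is now "start": decide what the next token is
        if ch.isdigit():
            state, buf = "number", ch
        elif ch.isalpha() or ch == "_":
            state, buf = "ident", ch
        elif ch in "+-*/()":
            tokens.append(("op", ch))
        elif ch in " \0":
            pass
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return tokens
```

Even in this toy version, questions pile up fast: what counts as an identifier, what happens on a stray `.`, how errors surface in the UI. None of those answers were in my prompt, so the model had to guess each one.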

I tried the same approach of writing a spec and letting the LLM build, but the sheer volume of possible outcomes was too high. The code it produced worked to an extent, but the implementation felt fragile, with numerous edge-case bugs. My prompt was a much larger portion of the total input relative to the necessary output. In terms of probability, the chance that my input would derive the specific output I required was low. Steering every aspect of the output via a spec alone seemed unrealistic.

To solve this, I needed to alter the input to output ratio. Throwing a massive prompt at the AI and hoping it successfully navigated the entire probability tree in one go hadn’t worked.

Instead, I opted to build it “together”. I wrote a detailed markdown file describing the specific functions, types, and data flow I would otherwise have written myself. Then I had the AI implement the functions just one or two at a time, reviewing and editing as the code developed.
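An invented excerpt to show the shape of that markdown file (none of these names are from the real project): the types and data flow are fixed up front, and only small, well-bounded bodies are left for the AI to fill in.

```markdown
## Expression input: implementation plan (excerpt)

### Types
- `Token { kind: "number" | "ident" | "op", text: string }`

### Functions (implement one or two at a time, in this order)
1. `tokenize(source) -> Token[]`: character-level state machine, no lookahead
2. `parse(tokens) -> Ast`: recursive descent; operators `+ - * / ( )`
3. `validate(ast, knownFields) -> Error[]`: unknown identifiers, bad arity

### Data flow
input field -> tokenize -> parse -> validate -> inline errors / highlighting
```

Each numbered item becomes one prompt, and each answer is small enough to review line by line before moving on.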

By breaking the problem down, the possible outputs at every step were tightly constrained. The outcome became predictable because the input was highly specific and the required output was bite-sized. It was closer to pair programming than code generation. And because the architecture was effectively my design, built at a manageable pace, I didn’t have to waste time retroactively trying to reverse engineer and understand it.

In the end, it felt like a smaller speed boost over simply writing the code by hand. But the real value wasn’t purely in output generation. Using the AI’s implementation as a thinking aid, a kind of step-by-step draft I could reason about, made the design process faster and more iterative. It felt like the most effective way of deliberately navigating down the branches of possible outputs to land exactly where it needed to be. In some ways it felt more fun too.

It’s worth noting that there are numerous other ways to approach the goal of narrowing the range of probabilistic outcomes. I’m sure some people would advocate simply going back and forth with the AI up front to preemptively narrow the scope before actually getting it to “code”. For me, though, the ergonomics of this approach fit this particular situation.

This is where I currently feel comfortable: use the LLM for high-probability tasks, but own the code it produces. Understand it well enough to know which branch of the tree you’re on and whether it actually leads where you need to go. How you ensure the predictability of the output is also highly dependent on the nature of the task. Context does seem to be king, but not simply because you’ve got 10 random MCP servers and a folder full of skills filling the window.

When you frame coding this way, things like clean code, architectural patterns, and simply understanding your codebase remain important in my opinion. They constrain the probability space. They make AI output more predictable and more correct. And they give you the ability to steer when things drift. Not only that, but you retain the necessary level of confidence in the output, which in some projects is essential.

On complex projects, this feels less like a stylistic preference and more like a practical necessity. Software development is partly a process of discovery. You can’t specify every pitfall up front, which means you need to reason about what’s been built so far in order to navigate what comes next. And when you need to be absolutely sure things work as needed, I don’t think I’m currently willing to trust vibes alone.

I’m sure models will continue to get better, context windows will grow and their ability to extract relevant project context themselves will improve. I expect this will reduce the likelihood of reaching completely wrong branches or getting implementations that completely miss the mark.

But I don’t think it solves the deeper problem. You’re still the one who’s responsible for the outcomes of the code, whether you wrote it or not. No matter how good models get at writing code, software development is, at its core, the translation of abstract ideas into specific outcomes. Until an AI can extract those ideas directly and knows exactly what you’re thinking (or, more specifically, what your client or boss is thinking), with all the nuance, context, and half-formed intuitions that entails, it’s still probability traversal.