LLMs and bon bons
GenAI is like a box of chocolates - you never know what you’re going to get. And just like that box of chocolates, sometimes you get the perfect piece, and sometimes you bite into that bon bon that makes you question your life choices.

I wish I could claim this analogy as my own brilliant insight, but a quick search revealed it was already used in a 2023 research paper. Great minds think alike, I suppose.
The Problem with Non-Determinism
Software engineers hate non-deterministic results. When I write code to calculate 1.5 × 1.5, I know I’ll get 2.25, every single time. But ask an LLM a question? You’ll get an answer, but not necessarily the same answer twice. When I’m running a critical function or business query, I can’t tolerate that kind of uncertainty.
LLMs are non-deterministic by design. There’s a lot of science behind why this happens (at its core, the model samples from a probability distribution over possible next tokens), and much smarter people than me can explain the technical details.
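To make that concrete, here’s a toy sketch in plain Python - not how any real model is implemented, just the mechanism in miniature. The model scores possible next tokens; decoding at temperature 0 always takes the top token, while a higher temperature samples from the distribution, so repeated runs can legitimately diverge.

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float) -> str:
    if temperature == 0:
        # Greedy decoding: always pick the highest-scoring token (deterministic).
        return max(logits, key=logits.get)
    # Scale by temperature, softmax into probabilities, then draw at random.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    weights = {tok: math.exp(score - top) for tok, score in scaled.items()}
    tokens = list(weights)
    return random.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

# A made-up next-token distribution for the prompt "1.5 * 1.5 = "
logits = {"2.25": 4.0, "2.3": 2.5, "around 2": 1.0}
print([sample_next_token(logits, 0.0) for _ in range(3)])   # same answer every run
print([sample_next_token(logits, 1.0) for _ in range(3)])   # answers can differ
```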
But here’s what strikes me as odd: we’re essentially taking a highly intelligent entity and telling it to dial down the creativity. We’re tying one hand behind its back and asking for a dumbed-down version of what it’s capable of.
Today, we don’t fully trust these tools - not like we trust a highly robust and mature programming language.
Our Current Workarounds
I’m seeing this tension more frequently, and honestly, I’m not sure if we’re doing the right thing. Is it crucial that the answers I get from an LLM are the same each time? In many cases, probably not. The non-deterministic nature is something we can live with. But there are certain use cases where this kind of behavior can be catastrophic and completely unacceptable.
So how are we dealing with this today? We structure our ever-growing prompts and provide very specific instructions on what the model can do, exactly what it should return, and exactly what it should not return - all to get a predictable result.
Here’s a typical example of what I mean:
- You MUST follow the strict TDD cycle: RED → GREEN → REFACTOR
- You MUST implement only what is needed to make the current test(s) pass
- You MUST follow the coding style and conventions of the existing codebase
- You MUST keep code comments concise and focused on key decisions
- You MUST follow YAGNI, KISS, and SOLID principles
- You MUST present implementation options to the user before proceeding
- You MUST explain the reasoning behind implementation choices
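If you want to see how this plays out in code, here’s a minimal sketch assuming the OpenAI Python SDK; the model name and the trimmed-down constraint text are illustrative, and even temperature 0 only nudges the model toward repeatability rather than guaranteeing it.

```python
# A minimal sketch of the "pile on constraints" workaround, assuming the
# OpenAI Python SDK. Model name and constraint text are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = """You are a pair-programming assistant.
- You MUST follow the strict TDD cycle: RED -> GREEN -> REFACTOR
- You MUST implement only what is needed to make the current test(s) pass
- You MUST follow the coding style and conventions of the existing codebase
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",   # illustrative model choice
    temperature=0,          # nudges toward repeatable output; not a guarantee
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Add a failing test for the discount calculator."},
    ],
)
print(response.choices[0].message.content)
```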
And most of the time, it works. But there’s a more elegant approach than simply piling more and more constraints into our prompts.
A Better Way Forward
One mechanism that I really like and have been using for a while is Strands Agent SOPs.
Agent SOPs (Standard Operating Procedures) are markdown-based instruction sets that guide AI agents through sophisticated workflows using natural language, parameterized inputs, and constraint-based execution. They transform complex processes into reusable, shareable workflows that work across different AI systems and teams.
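I won’t reproduce the actual Strands format or API here, but the underlying pattern is simple enough to sketch: a markdown playbook with parameter placeholders, rendered at run time and handed to the agent as its operating instructions. The file name, placeholders, and SOP steps below are hypothetical.

```python
# A generic sketch of the SOP idea - NOT the actual Strands Agent SOPs API.
# A markdown playbook with ${...} placeholders is filled in at run time and
# becomes the agent's instructions, whatever framework sits underneath.
from pathlib import Path
from string import Template

def load_sop(path: str, **params: str) -> str:
    """Read a markdown SOP and substitute its ${...} parameters."""
    return Template(Path(path).read_text(encoding="utf-8")).substitute(**params)

# code_review_sop.md (hypothetical) might contain steps like:
#   1. Check out branch ${branch} and run the test suite.
#   2. Review the diff against the team's style guide.
#   3. You MUST summarize findings as a bulleted list.
instructions = load_sop("code_review_sop.md", branch="feature/login")

# The rendered SOP is then passed to the agent as its system prompt /
# operating procedure, regardless of which model runs underneath.
```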
This doesn’t solve the problem of ever-growing prompts and context, but it does standardize them in such a way that they become “determin-ish-tic” (a phrase that was coined internally - personally, I prefer “determi-nish-tic”, but let’s not split hairs over this).
What I love about this approach is that it’s structured enough to ensure consistent outcomes, but flexible enough to leverage the intelligence that makes AI agents valuable. I’ve adopted this on a regular basis in my daily workflow to help me produce consistent results - in my code, in my documents, and even in my blog posts and presentations.
The beauty is that you’re not fighting the non-deterministic nature of LLMs; you’re channeling it in a productive direction. It’s like having a creative colleague who needs clear guidelines but can still surprise you with brilliant insights within those boundaries.
Looking Ahead
Will we see deterministic large language models in the future? Maybe. There seems to be enough demand to make it worth pursuing before too long.
The question isn’t whether we can build a model that gives us consistent results while remaining creative enough to be extremely valuable - it’s whether we should. Sometimes the best insights come from that unexpected piece of chocolate you didn’t plan to bite into.
What’s your take? Are you team deterministic, or are you comfortable with the uncertainty? Have you found your own ways to tame the chocolate box? I’d love to hear your thoughts. You can reach out to me on Twitter or through the contact form.