Why Grinding One Model Is the Wrong Abstraction
The founding essay of the Musketeer philosophy
This is not a theoretical argument. It is a report from the field. After months of daily AI work across multiple projects, certain patterns became undeniable. This essay names them.
The setup
You have a project. It needs code, decisions, and verification. You open your favorite model. You start a conversation. You explain the context. You ask for code. You get code. You refine it. You ask for more. The conversation grows.
At first, this works. The model understands your project. It remembers your constraints. It produces useful output. You feel productive.
Then something shifts.
The drift
Around message twenty, the model starts to forget. Not catastrophically. Subtly. A constraint you stated at the beginning gets ignored. A naming convention you established gets violated. The model suggests something you explicitly ruled out.
You remind it. It apologizes. It corrects. The correction lasts three messages.
You start repeating yourself. You paste the same constraint multiple times. You bold important words. You feel like you are managing the model more than working with it.
The compaction
This is not the model being bad. This is the system doing what it must do: compacting context. As the conversation outgrows the window, earlier messages are summarized or truncated to make room for newer ones. The compression is lossy. Nuance disappears. Constraints fade.
And even within the window, a model with 128k tokens of context is not a model that remembers 128k tokens perfectly. It is a model that spreads its attention across 128k tokens, with decreasing fidelity for material that is distant or buried.
You are paying for tokens that are being forgotten.
The wrong tool for the job
Here is the insight that took months to crystallize: when you ask a conversational model to generate production code, you are asking it to be something it is not.
Conversational models are optimized for dialogue. They are good at exploring ideas, refining concepts, and maintaining the thread of a discussion. They are not optimized for bounded execution. They are not optimized for following precise instructions. They are not optimized for producing artifacts.
Executor models are optimized for bounded execution. Claude Code, Codex, and similar tools are built to receive instructions and produce output. They are not optimized for long-running conversation. They are not optimized for exploration.
When you use one model for both, you get the worst of both. The conversational model produces code that drifts from your intent. The executor model, if you try to use it for conversation, feels stilted and narrow.
The revelation of three
The pattern that emerged from practice was not two roles, but three.
First: the conversational role. Holding context, exploring options, forming intent. This is what ChatGPT does well. This is what Claude in conversation mode does well.
Second: the execution role. Receiving bounded instructions, producing artifacts, following constraints precisely. This is what Claude Code does well. This is what Codex does well.
Third: the verification role. Looking at output, comparing it to intent, reporting discrepancies. This is what any model can do when prompted correctly, but it works best when the verifier has no stake in the output.
Three roles. Three kinds of thinking. Three models, or three modes of a model.
The handoff
The insight was not just about roles. It was about handoffs.
When you hand off from Originator to Executor, you must crystallize intent. You must extract from a sprawling conversation the precise instruction that matters. This is not easy. It is work. But it is valuable work.
The handoff forces clarity. You cannot hand off ambiguity and expect precision. You must decide what you want before you can ask for it.
When the handoff is good, the execution is good. When the handoff is sloppy, the execution is sloppy. The quality of the handoff predicts the quality of the result.
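One way to make the handoff explicit is to write it as a small structured template before passing anything to the Executor. A minimal sketch in Python; the field names and validation rules here are illustrative choices, not part of any existing tool:

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """A crystallized instruction passed from Originator to Executor."""
    intent: str  # what you want, in one or two sentences
    constraints: list[str] = field(default_factory=list)  # hard rules the output must respect
    acceptance_criteria: list[str] = field(default_factory=list)  # how you will judge the result

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the handoff is ready."""
        problems = []
        if not self.intent.strip():
            problems.append("intent is empty")
        if not self.acceptance_criteria:
            problems.append("no acceptance criteria: you cannot verify what you cannot state")
        return problems


handoff = Handoff(
    intent="Add retry logic to the HTTP client wrapper",
    constraints=["no new dependencies", "keep the public API unchanged"],
    acceptance_criteria=["existing tests pass", "retries are capped at 3"],
)
assert handoff.validate() == []
```

The point of the `validate` step is exactly the point above: a handoff with no stated intent or no acceptance criteria is ambiguity, and ambiguity cannot be executed precisely.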
The cost question
This is not only about quality. It is about cost.
Conversational tokens are cheap. Exploration tokens are cheap. You can think with a model for a long time without spending much.
Execution tokens are expensive. Agentic tools tend to run on pricier models, and they burn more tokens per task: tool calls, file reads, retries. When you mix conversation and execution in a single model, you pay execution prices for conversation.
When you separate roles, you naturally route work to the appropriate price tier. Exploration happens in cheap tokens. Execution happens in expensive tokens, but only when execution is needed. Verification happens in cheap tokens again.
The cost savings are not marginal. They are substantial.
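A back-of-the-envelope sketch makes the routing concrete. The per-token prices and token counts below are hypothetical placeholders, not any vendor's actual pricing; only the shape of the comparison matters:

```python
# Hypothetical price tiers (dollars per million tokens). Real prices vary by vendor.
CHEAP = 1.0       # conversational tier: exploration and verification
EXPENSIVE = 15.0  # execution tier: agentic code generation


def cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a token count at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# Single-model workflow: exploration, execution, and verification
# all run at the execution tier.
single = cost(200_000 + 50_000 + 30_000, EXPENSIVE)

# Separated roles: only the execution step pays execution prices.
separated = (cost(200_000, CHEAP)        # exploration with the Originator
             + cost(50_000, EXPENSIVE)   # bounded execution
             + cost(30_000, CHEAP))      # verification by the Cross-Examiner

print(f"single-model: ${single:.2f}, separated: ${separated:.2f}")
```

With these made-up numbers the separated workflow costs a fraction of the single-model one, because the bulk of the tokens, exploration, never touches the expensive tier.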
The cognitive question
This is also not only about cost. It is about cognitive load.
When a single model drifts, you compensate. You hold more context in your head. You verify more carefully. You double-check everything. You end the day exhausted, not because the work was hard, but because you were fighting your tools.
When each model does what it is good at, you stop compensating. You trust the Originator to remember. You trust the Executor to follow instructions. You trust the Cross-Examiner to catch problems.
The cognitive load shifts from you to the system.
The practice
In practice, this looks like multiple windows. One window has your Originator conversation. You have been talking for an hour. The context is rich.
When you are ready to execute, you do not ask that model to write code. You write a handoff. You extract the intent, the constraints, the acceptance criteria. You pass the handoff to your Executor in another window.
The Executor receives a clean instruction. It does not know about your hour of exploration. It does not need to. It produces output.
If the stakes are high, you pass the output to your Cross-Examiner. The Cross-Examiner reads the original intent and the produced output. It reports discrepancies. It does not fix them; it reports them.
You decide what to do with the report. You remain in control.
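The loop described above can be sketched in a few lines. This is an illustration of the control flow only; the function names are invented for the sketch, and the stubs stand in for calls to whatever models you actually use:

```python
def run_pipeline(handoff, executor, examiner, high_stakes=False):
    """Executor produces; Cross-Examiner reports; the human decides."""
    output = executor(handoff)
    report = examiner(handoff, output) if high_stakes else []
    # Discrepancies are returned, never auto-fixed. You stay in control.
    return output, report


# Stub roles, standing in for real model calls.
def stub_executor(handoff):
    return "artifact produced from: " + handoff


def stub_examiner(handoff, output):
    # A real Cross-Examiner reads intent and output side by side; this stub
    # only checks that the artifact traces back to the handoff at all.
    return [] if handoff in output else ["output does not trace back to the handoff"]


output, report = run_pipeline(
    "add retries, cap at 3", stub_executor, stub_examiner, high_stakes=True
)
```

The design choice worth noticing is that `run_pipeline` returns the report rather than acting on it: the verification role ends at reporting, and the fix-or-ignore decision stays with you.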
The feeling
After adopting this practice, something changed that was hard to name at first. It was a reduction in friction. The tools stopped fighting back. Each model did what it was supposed to do. The outputs matched the expectations.
There was also an increase in trust. Not blind trust; earned trust. Each model proved itself in its role. The Originator really did hold context well. The Executor really did follow instructions. The Cross-Examiner really did catch problems.
The anxiety of drift faded. The exhaustion of compensation lifted. The work became the work, not the management of tools.
The name
This way of working needed a name. Not for marketing. For thinking.
Musketeer captures the trio. Three musketeers. Three roles. All for one, one for all. Each role supports the others. None operates alone.
The name is not precious. What matters is the practice.
The claim
The claim is modest: if you work with AI daily, on serious projects, for real output, you will eventually discover something like this. You may not call it Musketeer. You may not have three explicit roles. But you will stop grinding one model for everything. You will start separating cognitive work.
This essay is not trying to sell you a tool. It is trying to name a practice that you may already be reaching for. If the description fits your experience, good. If it does not, that is also fine. The practice emerged from one person's work. It may not fit everyone's.
But if you have felt the drift, the compaction, the cognitive load, and the cost, and you have wondered if there is a better way, there is.
Separate the roles. Make the handoffs explicit. Let each model do what it is good at.
That is the whole thing.