Buzzed Technology

Engineering · 10 min read

Agents vs. workflows: pick the boring one

Most of what gets called an agent is a workflow with a worse debugger. Here is the decision rule we use, where each one earns its keep, and the hybrid we ship most often.

Every project starts with a temptation: build an agent. Let the model plan, call tools, decide when it is done. It feels modern. It feels powerful. It looks great on a slide. And in most cases, for most operational problems, it is the wrong tool.

We build production AI systems for a living, and the single biggest determinant of whether a project ships on time and keeps running afterwards is the choice between two architectures: the agent and the workflow. Get this right early and the rest of the project is engineering. Get it wrong and the rest of the project is reverse-engineering your own model traces at 11pm.

What we mean by each

A workflow is a sequence of named steps that you wrote down before you started. Each step has an input, an output, a known set of tools it can use, and a clear contract with the next step. Some of those steps may call a model. Most of them do not need to. The control flow is yours.

An agent is a system where the model holds the control flow. You give it a goal, a set of tools, and a budget. It decides which tool to call, in what order, and when it is done. The model is the orchestrator. You wrote the scaffolding; the model wrote the plan.

Both can be built with the same SDK. Both can call the same tools. The difference is not technical - it is a question of who makes the decisions at runtime.
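The distinction can be sketched in a few lines. This is a minimal, illustrative shape, not a real SDK: the tool functions and `call_model` are hypothetical stand-ins, and the agent loop is reduced to its essentials.

```python
def search(query: str) -> str:
    # Hypothetical tool: in a real system this would hit an API.
    return f"results for {query}"

def summarize(text: str) -> str:
    # Hypothetical tool: in a real system this would call a model.
    return f"summary of {text}"

def workflow(topic: str) -> str:
    # Workflow: the steps are written down in code before runtime.
    # Any model calls happen inside a step you named. Control flow is yours.
    results = search(topic)
    return summarize(results)

def agent(goal: str, call_model, budget: int = 5) -> str:
    # Agent: the model picks the next tool each turn until it says done.
    # `call_model` stands in for the model deciding (action, argument).
    tools = {"search": search, "summarize": summarize}
    context = goal
    for _ in range(budget):
        action, arg = call_model(context, list(tools))
        if action == "done":
            return arg
        context = tools[action](arg)
    return context  # budget exhausted: return whatever we have
```

Both functions can produce the same answer from the same tools; the difference is that in `workflow` the next step is fixed in code, while in `agent` it is whatever `call_model` returns at runtime.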

The decision rule we use

We use a simple rule. If you can write down the steps before you start - even messily, even with branches - it is a workflow. If the steps depend on what the model finds along the way, in a way you genuinely cannot enumerate ahead of time, you might need an agent.

The word genuinely is doing a lot of work in that sentence. Most teams think their problem requires an agent because the steps are hard to write down. Hard to write down is not the same as impossible to write down. Spend the afternoon writing them down. You will discover that what you have is a workflow with a few branches, and a workflow with a few branches is much easier to ship, debug, and operate than an agent.

Why workflows win on debuggability

The reason this rule matters is debuggability. A workflow has named stages. When it breaks, you know which stage broke. You can look at that stage in isolation, see its inputs, replay them, and fix the bug. You can write tests for that stage. You can monitor that stage. You can swap out the model on that stage without touching the others.
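That stage-level replay story can be sketched concretely. The stage name and payload here are hypothetical; the point is that a named, deterministic stage can be re-run on a recorded input with no model calls and no trace archaeology.

```python
def extract_fields(document: str) -> dict:
    # A named workflow stage: its output depends only on its input,
    # so it can be tested, monitored, and replayed in isolation.
    lines = document.splitlines()
    return {"title": lines[0], "line_count": len(lines)}

# In production, this input would come from your stage-level logs.
recorded_input = "Master Services Agreement\nTerm: 24 months"

# Replay the failing stage on its own, without running the pipeline.
replayed = extract_fields(recorded_input)
print(replayed["title"])
```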

An agent has a trace. When it breaks, you read 300 lines of tool calls and try to reconstruct what the model was thinking. Sometimes the trace is illuminating. Often it is not - the model made one bad call early, every subsequent call was reasonable given that bad call, and the failure looks like a cascade. You cannot easily test it because the inputs to step seven depend on what the model decided at step three. You cannot easily monitor it because the steps are different every run.

On a six-month operational system, the difference between these two debugging stories is the difference between a team that goes home at 6pm and a team that does not.

Where agents earn their keep

Agents are not always wrong. They are the right tool for a narrow band of problems where the search space is open enough that pre-writing the steps would either miss cases or balloon into a state machine nobody can read. The clearest examples we have shipped or seen ship cleanly:

  • Open-ended research. Pulling together information from sources you did not know about in advance. The model genuinely needs to look at what it finds and decide where to look next.
  • Code editing across an unknown repo. Cursor, Claude Code, and similar tools live here. The structure of the codebase is not knowable up front. You want the model to look around before it acts.
  • Customer conversations where the next step is genuinely contingent. Real triage, where the path forks based on what the customer says, and the fork tree is large enough that hand-coding it would be a worse experience than letting the model handle it.
  • Long-horizon multi-tool tasks where order does not matter. Cases where steps can be retried, parallelized, or skipped, and you do not have strong opinions about how the model gets there.

These are real. They are also the minority of business operations problems. Most of what gets shipped under the banner “AI agent” in a B2B SaaS context is a workflow.

The hybrid we ship most often

Most of what we ship is neither pure workflow nor pure agent. It is a workflow with one or two model-driven branches. The structure is hand-written: stages, inputs, outputs, retries, observability, all defined in code. The judgment calls - the steps that genuinely require open-ended reasoning - are delegated to a model call inside a stage.

A typical example: a contract review pipeline. Stage one ingests the document. Stage two extracts known fields with a deterministic parser. Stage three asks a model to flag clauses that look unusual relative to a reference set. Stage four routes flagged clauses to a human reviewer. The model is one stage in a four-stage workflow, not the orchestrator of the whole thing. It does the part it is good at - judgment under ambiguity - and nothing else.
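The shape of that pipeline looks something like the sketch below. Every name is illustrative: the clause parser is a toy, and `call_model` stands in for the single model-driven stage, not a real contract-review implementation.

```python
def ingest(raw: bytes) -> str:
    # Stage one: turn the uploaded document into text.
    return raw.decode("utf-8")

def extract_clauses(doc: str) -> list[str]:
    # Stage two: deterministic parser. Toy version: one clause per
    # blank-line-separated block.
    return [c.strip() for c in doc.split("\n\n") if c.strip()]

def flag_unusual(clauses: list[str], call_model) -> list[str]:
    # Stage three: the one model-driven stage. `call_model` is a
    # hypothetical predicate for "does this clause look unusual?".
    return [c for c in clauses if call_model(c)]

def route(flags: list[str]) -> dict:
    # Stage four: hand only the flagged clauses to a human reviewer.
    return {"needs_review": flags, "auto_approved": len(flags) == 0}

def review_pipeline(raw: bytes, call_model) -> dict:
    # The orchestration is plain code, not the model.
    return route(flag_unusual(extract_clauses(ingest(raw)), call_model))
```

The model call is boxed inside one stage with a clear contract on either side; everything around it can be tested and monitored like any other pipeline.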

This pattern looks boring on a slide. It runs for months without intervention. It survives team handoffs. It is cheaper to debug, cheaper to monitor, and cheaper to scale. It is what we mean when we say pick the boring one.

The cost difference, concretely

A pure agent doing the same task as a workflow with one model-driven branch will, in our experience, spend somewhere between 3x and 10x more on tokens. The agent has to reason about what to do at every step. It loads the same context repeatedly. It often retries when it gets confused. The workflow loads context once, calls the model once for the decision that needs a model, and moves on.

Latency follows the same curve. A workflow with a single model call has a single model latency. An agent has the sum of all its model calls, plus the planning overhead between them. For a user-facing surface, the difference is the difference between a product that feels fast and a product that feels broken. We dig into this further in our note on the hidden cost of always-on LLM calls.
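A back-of-envelope comparison makes both curves concrete. The numbers below are illustrative assumptions, not measurements: an agent taking eight sequential turns and reloading the same context each turn, versus a workflow that makes one model call.

```python
# Illustrative assumptions, not benchmarks.
CONTEXT_TOKENS = 4_000      # context loaded per model call
OUTPUT_TOKENS = 500         # tokens generated per call
LATENCY_PER_CALL_S = 2.0    # assumed latency of one model call
AGENT_TURNS = 8             # assumed number of agent turns

# Workflow: context loaded once, one model call for the one decision.
workflow_tokens = CONTEXT_TOKENS + OUTPUT_TOKENS
workflow_latency = LATENCY_PER_CALL_S

# Agent: context reloaded and reasoned over on every turn, sequentially.
agent_tokens = AGENT_TURNS * (CONTEXT_TOKENS + OUTPUT_TOKENS)
agent_latency = AGENT_TURNS * LATENCY_PER_CALL_S

print(f"token ratio: {agent_tokens / workflow_tokens:.0f}x")  # 8x
print(f"latency: {workflow_latency:.0f}s vs {agent_latency:.0f}s")
```

Under these assumptions the agent spends 8x the tokens and takes 16 seconds where the workflow takes 2, which is consistent with the 3x to 10x range above; retries after confusion push the real ratio higher.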

Common mistakes

Three failure patterns we see often:

  • Building an agent because the framework offered one. A library's default architecture is not a recommendation for your use case. It is a recommendation for the demo the library wants to show.
  • Promoting a workflow to an agent to handle one edge case. If the workflow is right for 95% of inputs, do not throw the architecture away to handle the 5%. Add a branch. Add a fallback. Keep the structure.
  • Calling something an agent because it uses tools. Tool use is not the same as agentic orchestration. A workflow can call tools deterministically. What makes it an agent is that the model decides which tools and in what order.

A decision framework you can run in a meeting

Sit down with whoever is going to own the system. Answer these questions out loud:

  1. Can you draw the happy path on a whiteboard in under five minutes? If yes, it is a workflow.
  2. Can you list the edge cases? If you can list them, you can branch the workflow. Still a workflow.
  3. Are there genuinely unknown sub-tasks the system might discover at runtime that you cannot enumerate? If yes, you might need an agent for those sub-tasks specifically - not for the whole system.
  4. Will this system be debugged by someone who did not build it? If yes, bias hard toward workflow. The cost of unfamiliar agent traces is enormous.
  5. Does each individual run need to complete in seconds, not minutes? Workflow. Agents are slower, almost always.

Five minutes of this conversation will save you a quarter of engineering work.

When in doubt, ship the workflow first

The good news about workflows is that they are easy to upgrade. If a hand-written workflow is not handling enough of your inputs, you can swap one stage for an agent. The rest of the system stays. You have observability for every other stage. The agent lives inside a stage with clear contracts on either side.
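The swap looks something like this sketch. `run_agent` is a hypothetical stand-in for whatever agent loop you adopt; the point is that both versions honor the same stage contract, so nothing outside the stage changes.

```python
def classify_stage_v1(ticket: str) -> str:
    # Original hand-written stage: string in, category out.
    return "billing" if "invoice" in ticket.lower() else "general"

ALLOWED = {"billing", "general", "abuse"}

def classify_stage_v2(ticket: str, run_agent) -> str:
    # Upgraded stage: an agent now does the work, but behind the same
    # contract. `run_agent` is a hypothetical agent loop.
    category = run_agent(goal=f"classify this ticket: {ticket}")
    # Enforce the contract at the boundary so downstream stages are
    # unaffected by whatever the agent returns.
    return category if category in ALLOWED else "general"
```

Because the boundary did not move, the rest of the pipeline, its tests, and its observability stay exactly as they were; only the inside of one stage changed.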

The bad news about agents is that they are hard to downgrade. Once a system is built around the model holding control flow, peeling that back means rewriting the scaffolding. We have done both kinds of migration. The workflow-to-agent path is a Tuesday. The agent-to-workflow path is a quarter.

Pick the boring one. You can always upgrade later. You almost never need to. And the project that ships in eight weeks instead of six months is the one that gets to keep being built.

Related reading: why most AI pilots stall before production and the hidden cost of always-on LLM calls that surprises teams running their first agentic system in production.