Strategy | June 7, 2026 | 4 min read

Your AI metrics are lying to you

By Jack Schaefer

Most teams measure AI by how much they use it. That tells you what you spent. It does not tell you what you got.

Most enterprises tracking AI adoption right now are measuring the wrong thing. Not slightly off. Wrong in a way that rewards inefficiency and calls it progress.

The scorecard is usually some version of the same numbers: total tokens consumed, number of chats opened, activity graphs that trend up and to the right. On paper it looks like momentum. In practice it is often just noise, because volume is a vanity metric. It tells you that something is happening. It does not tell you whether anything is actually changing.

The cobra problem

As the story goes, during British rule in India the colonial government wanted to reduce the number of cobras in Delhi, so it put a bounty on dead snakes. At first it worked: people killed cobras and collected the reward. Then someone figured out a better approach: breed cobras, kill them, collect the bounty. When the program ended and the government stopped paying, breeders released their now-worthless inventory into the wild. The cobra population went up. The metric had worked perfectly. The outcome was the opposite of the intent.

This is exactly what happens when you measure AI success by tokens. You do not get better work. You get work optimized to inflate the number. Custom dashboards nobody opens. The same bot rebuilt independently by four different teams. Thousands of lines of AI-generated code that no engineer will touch because no engineer wrote it. Presentations rewritten until they say less than the first draft. The usage graph trends up. The work does not get faster.

Lines of code tried this first

Engineering orgs went through the same thing when some of them measured developer productivity by lines of code written. Engineers responded by writing verbose, bloated implementations and avoiding deletions that would make their numbers look worse. The metric improved. The software got harder to maintain, slower, and more fragile.

Tokens are the same trap. A token is what you paid, not what you got. You can burn millions of them on work that changes nothing, and every graph in your dashboard will trend beautifully upward the whole time.

What capacity actually looks like

Angelone Homes ran a lean team of three. The work was good. The question was scale. To grow, they were looking at longer timelines, a headcount hire, or saying no to new projects.

We built software around how they actually operated: an AI front desk for inbound triage, automated subcontractor compliance tracking, and a bid analysis system that did the reading before a project manager opened the email. Nothing about the team changed. What changed was what their time was worth. The portfolio grew. The founder stopped disappearing into triage. No new headcount.

That is not “more AI activity.” That is a real operating change. The difference between those two things is the difference between a vanity metric and a metric that tells you something.

The diagnostic

Three questions worth asking:

If we stopped tracking token usage tomorrow, how would we know if productivity actually went up?
What specific high-value work is being done now that was impossible six months ago?
Can we point to a specific outcome that AI accelerated, not just a higher volume of activity?

If these are hard to answer with concrete numbers, you are not measuring AI as a capacity multiplier yet. You are measuring it as a cost center with better branding.

What the measurement actually requires

Getting from usage to outcomes is not a reporting problem. It is an architecture decision. The systems that do it well trace a line from prompt to workflow to business result. Not just tokens used, but tasks completed, processes shortened, time recovered. That is a different kind of instrumentation than what most teams have built.
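One way to picture that instrumentation is a record that carries the workflow and the result alongside the token count. The sketch below is a minimal illustration under assumptions, not a description of any particular system: the AIEvent fields, the workflow names, and every number are invented for the example, and minutes_saved assumes you already have a pre-AI baseline to compare against.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AIEvent:
    """One AI interaction, tagged with the workflow it served (hypothetical schema)."""
    workflow: str          # illustrative label, e.g. "bid_analysis"
    tokens: int            # what was spent
    task_completed: bool   # did the interaction finish a real task?
    minutes_saved: float   # estimate against an assumed pre-AI baseline


def summarize(events):
    """Roll events up per workflow so spend sits next to outcomes."""
    totals = defaultdict(
        lambda: {"tokens": 0, "tasks_completed": 0, "minutes_saved": 0.0}
    )
    for event in events:
        row = totals[event.workflow]
        row["tokens"] += event.tokens
        # Time only counts as recovered when the task actually finished.
        if event.task_completed:
            row["tasks_completed"] += 1
            row["minutes_saved"] += event.minutes_saved
    return dict(totals)


# Invented sample data for illustration only.
events = [
    AIEvent("bid_analysis", 12_000, True, 45.0),
    AIEvent("bid_analysis", 9_000, True, 30.0),
    AIEvent("deck_rewrite", 80_000, False, 0.0),
]
summary = summarize(events)
for workflow, row in summary.items():
    print(workflow, row)
```

In this made-up sample, the deck rewrite burns the most tokens and recovers nothing, which is exactly what a usage-only graph would hide.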

For leadership, it means seeing where specific workflows have increased effective capacity, and where AI is generating activity that looks identical on the usage graph but does not show up on the income statement.
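A leadership view along those lines could be as simple as sorting workflows into the ones that recovered time and the ones that only generated usage. This is a rough sketch, assuming you already have per-workflow token totals and an estimate of hours recovered; the function name, the one-hour threshold, the workflow names, and the figures are all hypothetical.

```python
def classify_workflows(rows, min_hours_recovered=1.0):
    """Label each workflow by whether its usage turned into recovered time.

    rows maps a workflow name to a (tokens, hours_recovered) pair.
    The threshold is an arbitrary placeholder, not a recommended value.
    """
    report = {}
    for name, (tokens, hours_recovered) in rows.items():
        if hours_recovered >= min_hours_recovered:
            report[name] = "capacity"
        elif tokens > 0:
            # Usage is real, but nothing shows up on the outcome side.
            report[name] = "activity only"
        else:
            report[name] = "unused"
    return report


# Invented figures for illustration only.
rows = {
    "inbound_triage": (150_000, 12.5),
    "dashboard_generation": (900_000, 0.0),
}
report = classify_workflows(rows)
for name, label in report.items():
    print(f"{name}: {label}")
```

On a token chart, the second workflow in this sample would look six times healthier than the first. The label is what tells them apart.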

Vanity metrics made sense in the first phase of experimentation. They proved something new was happening. That phase is over. Leaders are now being asked a harder question: what did it actually do for the business?

The companies that answer it clearly will not be the ones with the biggest token bills. They will be the ones that can point to a simple story: AI investment turned into capacity, and capacity turned into growth.

Everything else is just a lot of cobra breeding.
