Four AIs, One Question — The Data Jester

A few weeks ago, on a podcast I co-host, I said something I've said in some version a hundred times before: AI doesn't really know the difference between a furniture shopper in Cleveland and one in Charlotte. It was a throwaway line. The kind of thing you say to make a point about why local market intelligence still matters.

A listener emailed me to push back. Politely. He'd run the question through an AI engine himself and got a thoughtful, detailed, market-specific answer. He sent me the response and asked, fairly: are you sure?

I was not sure. So I ran an experiment.

I asked the same question — how are Cleveland furniture shoppers different from Charlotte furniture shoppers? — to four different AI engines: Claude, ChatGPT, Perplexity, and Gemini. I broke the answers into eleven factors, lined them up side by side, and went looking for evidence that AI was, as I'd claimed, unreliable for this kind of work.

What I expected was disagreement. Four engines, four different reads, lots of contradictions I could point at and say, "See?" That's the easy version of being right. That's the version where I get to keep my podcast line, add a new chart to my next deck, and quietly congratulate myself for being a careful thinker.

Reader, I did not get to do that.

Of the eleven factors I asked about, seven came back unanimous. All four engines agreed. On three more, three engines agreed and one was a slight outlier. Only one factor produced a real split — and we'll come back to that one, because it's the most interesting row in the whole table.
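For anyone who wants to repeat the tally, here is a minimal sketch of how one row might be scored. The function name, the three labels, and the engine keys are my own placeholders, not anything the engines produced. The sample row uses the two-high, two-low pattern of the one split factor, with the engines left anonymous.

```python
from collections import Counter

def agreement_level(answers):
    """Classify one factor's answers across engines.

    answers maps an engine label to its (normalized) answer.
    Returns "unanimous", "outlier" (all but one agree), or "split".
    """
    counts = Counter(answers.values())
    top = counts.most_common(1)[0][1]  # size of the largest agreeing group
    n = len(answers)
    if top == n:
        return "unanimous"
    if top == n - 1:
        return "outlier"
    return "split"

# Hypothetical engine labels; only the 2-vs-2 pattern is taken from the experiment.
financing = {
    "engine_1": "high",
    "engine_2": "high",
    "engine_3": "low",
    "engine_4": "low",
}
print(agreement_level(financing))  # split
```

The hard part is not the counting. It is normalizing four free-text answers into comparable labels like "high" and "low" before counting, and that step is a judgment call the sketch takes as given.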

Here's a sample of what came back:

Factor	Cleveland	Charlotte
Income / affluence	Lower median income. Working/middle class. Tighter spending capacity.	Higher median income. Banking, finance, tech employment. More discretionary spend.
Style preference	Traditional, transitional, durable. Comfort over trend.	Modern, coordinated, "Instagrammable." Trend-aware.
Purchase trigger	Replacement-driven. Worn-out items, dead mattress, refresh.	Life-stage. New home, growing family, room setup.
Financing receptivity	The engines split here.

Look at the first three rows. That is a tight, consistent, recognizable read of two markets. Cleveland is the lower-income, older-housing, traditional-taste, replacement-driven, value-focused market. Charlotte is the higher-income, newer-housing, modern-taste, lifestyle-driven, aspirational market. Four independent engines. Same picture.

So I was wrong on the podcast. AI absolutely can articulate the difference between Cleveland and Charlotte furniture shoppers. The answer it gives isn't gibberish. It isn't even bad. It's directionally correct, internally consistent, and — for a person trying to get oriented to two markets they don't know — genuinely useful.

That's the part the listener was right about.

Here's the part that started bothering me.

The engines didn't agree because each of them independently understood these markets. They agreed because they were all pattern-matching the same public Census data through the same generic furniture-industry playbook. Lower income plus older housing equals "value-driven replacement buyer." Higher income plus newer housing equals "aspirational lifestyle buyer." Anyone with twenty minutes and Census Reporter could produce the same read. It's a Census summary in a confident voice.

And the tell is in that one row I asked you to remember.

On financing receptivity in Cleveland, two engines said high — lower-income market, more reliance on payment plans and rent-to-own, makes sense. Two engines said low — lower-income market, more debt-cautious, makes sense. Both reads are defensible. Neither is informed. The question requires actual behavioral data — credit product use, BNPL penetration, rent-to-own footprint by DMA — and that data isn't sitting in the demographic dataset the engines pattern-matched on for the other ten rows.

The moment the question required real signal instead of a demographic sketch, the engines split. And here's the thing that made me sit up. They split with the same confidence they had used on the grounded rows. No hedging. No "we're less sure here." Just two confident answers in opposite directions, delivered in the same authoritative voice as the rows that were actually solid.

That's the failure mode I was trying to point at on the podcast, and I described it badly. AI is not bad at understanding local markets. AI is excellent at the demographic-archetype layer of local market analysis, and unreliable below it — and presents both in the same tone.

Convergent confidence on shallow data is more dangerous than visible disagreement, because it doesn't trigger your skepticism. When four AIs agree, the answer feels validated. When they diverge, you go look for a tiebreaker. The first scenario is the one that should worry you more. It's the one where you're most likely to spend real money on a market read that's really just a Census summary in a confident voice.

So the listener was right that I overstated my podcast line. And he was also right that AI is more capable here than I'd given it credit for.

But the experiment landed somewhere I didn't expect. Not on "AI is bad at this." On something more uncomfortable: AI is a confident generalist, and confidence plus shallowness is the dangerous combination.

Which means the job hasn't gone away. It's just changed shape. The work isn't producing the demographic sketch anymore — four engines will hand you that for free. The work is knowing which rows of the table are real signal and which ones are pattern-matching wearing a confident voice. Knowing where the data actually thins out. Knowing when the answer that sounds right is the answer to a different question.

Somebody still has to do that.

The AI is not going to tell you when it stops knowing what it's talking about.