Anthropic Created a Test Marketplace.

The Results Should Surprise You

Jo Lambadjieva
April 29, 2026

Anthropic Created a Test Marketplace. Claude Did All the Buying and Selling.

Sixty-nine Anthropic employees walk into a marketplace. No, this isn't the setup for a Silicon Valley punchline — it's the setup for something that should have every ecommerce operator quietly updating their strategy docs.

Here's what happened: for one week in December 2025, Anthropic told its staff to list personal belongings for sale — a snowboard, a broken folding bike, a bag of nineteen ping-pong balls (someone at Anthropic clearly lost the other two and decided to cut their losses). Each employee was then assigned a Claude AI agent. The agents posted listings, browsed inventory, proposed prices, haggled, and closed deals. No human approval during the process. No "are you sure you want to buy this broken bicycle?" confirmation popups. Just AI agents negotiating with other AI agents while their humans presumably went about their day pretending this was normal.

They called it Project Deal. It produced 186 completed transactions across more than 500 listed items, totalling just over $4,000 in value. Participants rated deals around a 4 out of 7 for fairness. Nearly half said they'd pay for a similar service in the future.

That's the polite summary. The actually interesting part is considerably more unsettling.

The AI You Can Afford Is Costing You Money You Can't See

Image: The 19 ping-pong balls. Source: Anthropic

Anthropic ran four simultaneous versions of the marketplace. In two versions, everyone got Claude Opus 4.5 — their top-tier model. In the other two, participants had a coin-flip chance of getting Claude Haiku 4.5, a smaller, cheaper model. Nobody knew which model was representing them.

The performance gap was consistent and, frankly, a bit rude. Opus agents completed roughly two more deals per participant than Haiku agents. When the same item was sold by Opus in one run and Haiku in another, Opus extracted $3.64 more on average. A lab-grown ruby went for $65 through Opus and $35 through Haiku. The same broken folding bike — the one that was already broken, mind you — fetched $65 under Opus and $38 under Haiku. Across all items that sold in multiple runs, Opus sellers pulled in $2.68 more while Opus buyers paid $2.45 less.

Given that the median item price was $12, those aren't rounding errors. They're meaningful chunks of margin disappearing into the gap between a good model and a less good one.
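
For a rough sense of scale, the dollar gaps quoted above can be set against the $12 median item price. This is a back-of-envelope sketch, not Anthropic's own analysis: it divides average gaps by a median, so the percentages are only indicative. The names `MEDIAN_ITEM_PRICE`, `QUOTED_GAPS`, and `share_of_median` are made up for illustration.

```python
# Back-of-envelope: how large are the quoted Opus-vs-Haiku gaps
# relative to the median item price? All figures are quoted from the text above.

MEDIAN_ITEM_PRICE = 12.00  # dollars

QUOTED_GAPS = {
    "same item sold by Opus vs Haiku": 3.64,
    "Opus seller premium": 2.68,
    "Opus buyer saving": 2.45,
}


def share_of_median(gap: float, median: float = MEDIAN_ITEM_PRICE) -> float:
    """Return a dollar gap as a fraction of the median item price."""
    return gap / median


for label, gap in QUOTED_GAPS.items():
    print(f"{label}: ${gap:.2f} is {share_of_median(gap):.0%} of a ${MEDIAN_ITEM_PRICE:.0f} item")
```

On those figures, each gap works out to somewhere between a fifth and just under a third of a typical item's price.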

But here's where it gets genuinely uncomfortable: the people whose agents were running on the weaker model had absolutely no idea they were getting worse deals. Satisfaction scores were statistically identical between Opus and Haiku users. Perceived fairness was essentially the same — 4.05 for Opus deals, 4.06 for Haiku. People who were objectively disadvantaged reported being just as happy as people who weren't.

Read that again. The losers didn't know they lost.

Your Prompt Doesn't Matter. Your Subscription Does.

There was a second finding that deserves its own uncomfortable silence. Some participants told their agents to negotiate aggressively. Others went for a friendly, collegial approach. The result? The instructions made no statistically significant difference to anything — not sale likelihood, not sale price, not purchase price.

Aggressive sellers didn't sell more. Aggressive buyers didn't pay less. The model running the negotiation mattered. The personality you gave it didn't.

Now think about what that means in a world where every AI platform operates on the same basic architecture: a capable model behind a paywall, a lighter model available for free. ChatGPT has over 900 million weekly active users. Roughly 50 million are paying subscribers. That's a free-to-paid ratio of about 94 to 6.
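
The 94-to-6 split follows directly from the two figures quoted above. A quick check, treating both numbers as round approximations (the variable names are illustrative):

```python
# Free-to-paid split from the two figures quoted in the text.
weekly_active_users = 900_000_000  # "over 900 million"
paying_subscribers = 50_000_000    # "roughly 50 million"

paid_share = paying_subscribers / weekly_active_users
free_share = 1 - paid_share

print(f"free: {free_share:.1%}, paid: {paid_share:.1%}")
# prints "free: 94.4%, paid: 5.6%"
```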

So the vast majority of people who now use AI for product research, price comparison, and purchase decisions are doing so on models that are, by design, less capable. And Anthropic's experiment suggests they won't notice. The deals will feel fair. The recommendations will seem reasonable. The comparison point, what a better model would have found, simply won't be visible.

A consumer on a free tier who carefully crafts the perfect prompt ("find me the best noise-cancelling headphones under £200, prioritising battery life and comfort") may get a meaningfully worse result than a paying user who types "good headphones idk" into a better model. The quality of the question matters less than the quality of the system answering it. This is consumer stratification that doesn't look like a paywall. It looks like a perfectly adequate recommendation.

We've Seen This Infrastructure Movie Before (And We Know How It Ends)

Project Deal was a controlled experiment among colleagues trading personal stuff with company money. Anthropic is transparent about those limitations. But it sits inside a much bigger pattern that anyone reading this newsletter has been watching develop for a while now.

Amazon's Rufus assistant already operates with transactional authority through Auto Buy — monitoring prices every thirty minutes and pulling the trigger on purchases when conditions are met. Their Buy for Me feature extends that logic to third-party websites. Shopify has positioned itself as the infrastructure layer between merchants and AI agents through Shopify Catalog, feeding structured product data into ChatGPT, Perplexity, and Microsoft Copilot. OpenAI's Agentic Commerce Protocol connects catalogue data from Target, Sephora, Nordstrom, Best Buy, and The Home Depot into ChatGPT's discovery layer.

What Project Deal adds to this picture is a data point about what happens when the AI isn't just surfacing products or completing a checkout, but actively negotiating the terms of a transaction. Every system I just listed still positions the AI as a middleman between a human and a product catalogue. Project Deal simulated something further along the curve: a marketplace where the AI is the buyer and the seller, and the human isn't in the room.

Not All Shopping Categories Are Created Equal (And AI Knows It)

If you've been following this newsletter's coverage of how AI will eat through commerce categories, this next bit will feel familiar: the agentic future won't arrive evenly.

Specification-driven categories are the obvious early candidates. Consumer electronics, home appliances, office supplies — anything where buying decisions rest on comparing quantifiable specs. These are also the categories where the model quality gap bites hardest, because the agent is making evaluative judgments on dimensions that can be measured, and a better model will simply measure them better.

Categories where the purchase is more emotional, more identity-driven — fashion, fragrance, luxury goods — will feel this later, if at all. An AI can surface options and filter by price, but the evaluative judgment that drives the actual purchase isn't something even the best models can replicate yet.

The free-tier problem intersects with this category logic in an important way. In specification-heavy categories, the gap between a capable model and a weaker one could be the difference between a genuinely optimised purchase and a merely adequate one. The free-tier consumer gets a reasonable laptop recommendation. The paid-tier consumer gets the one with the better price-to-performance ratio, the longer warranty, the more favourable return policy. Both feel well-served. Only one got the better deal.

In emotional or experiential categories, the model quality gap may matter less — because the model isn't the primary decision-maker in the first place. Nobody's letting Claude pick their perfume. (At least not yet. Give it eighteen months.)

The Part That Should Make Sellers Nervous

Anthropic flagged something in their report that connects directly to a pattern we've been tracking across the AI advertising landscape: as AI agents become buyers, the question of how to capture their attention becomes the central competitive question. Through data quality. Through paid placement. Through adversarial optimisation.

We've seen this story before with search engines. We've seen it with social algorithms. The incentive structure that emerges when platforms monetise attention doesn't fundamentally change because the attention belongs to a machine rather than a person.

They also flagged security risks — jailbreaking (extracting information an agent shouldn't reveal) and prompt injection (hidden instructions in content causing agents to take unintended actions). In a controlled experiment among colleagues, these risks were minimal. In a marketplace where agents are processing listings, product descriptions, and promotional content from unknown sources, they become structurally significant.

So what does this actually mean if you sell things online? Three things worth internalising now rather than later.

First, your product data is now your sales rep. When AI agents are the ones browsing, comparing, and buying, structured, attribute-rich product information isn't a nice-to-have — it's the difference between being recommended and being invisible. The agent doesn't care about your brand story. It cares about whether your listing has the specs, the pricing, and the structured data it needs to make a decision. If it doesn't, the agent moves on. It doesn't even feel bad about it.

Second, the model your customer uses determines your conversion rate — and you can't control it. A free-tier shopper might never see your product's superior warranty or better price-to-performance ratio because the model simply isn't thorough enough to surface it. You could have the better product and still lose the sale to an inferior competitor because the buyer's AI didn't do the homework. That's a competitive dynamic no amount of listing optimisation can fully solve.

Third, first-party customer relationships become your moat. When AI agents mediate the transaction, the merchant who owns the direct relationship with the customer — and the data that comes with it — has the only defensible position left. Everyone else is hoping an algorithm they don't control happens to recommend them. Hope is not a strategy, though plenty of businesses seem to be running on it.
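
To make the first of those three points concrete, here is a minimal sketch of a listing audit: it flags attributes an agent would have no way to compare because they are missing. The field names (`price`, `warranty_months`, `return_window_days`, and so on) and the `missing_attributes` helper are hypothetical, chosen to echo the warranty, price, and return-policy examples in this issue. They do not come from any platform's actual schema.

```python
from typing import Any

# Hypothetical attributes a shopping agent might need; not a real platform schema.
REQUIRED_ATTRIBUTES = (
    "title",
    "price",
    "currency",
    "specs",
    "warranty_months",
    "return_window_days",
)


def missing_attributes(listing: dict[str, Any]) -> list[str]:
    """Return the required attributes that are absent or empty in a listing."""
    return [
        name
        for name in REQUIRED_ATTRIBUTES
        if listing.get(name) in (None, "", [], {})
    ]


# Example listing with no warranty or return information supplied.
listing = {
    "title": "14-inch laptop",
    "price": 899.00,
    "currency": "USD",
    "specs": {"ram_gb": 16, "storage_gb": 512},
}

print(missing_attributes(listing))
# prints ['warranty_months', 'return_window_days']
```

A check like this only catches gaps in your own data. It says nothing about whether a given buyer's agent will actually read the fields you fill in.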

The Bottom Line

Project Deal is a small experiment with large implications. The infrastructure for AI-mediated commerce is being built across every major platform right now, and the gap between where agents are today and where they'd need to be for meaningful commercial deployment is narrower than most of the industry assumes.

The finding that should stick with sellers is this: when AI agents transact on behalf of consumers, the quality of the model determines the quality of the outcome — and the consumer on the losing end doesn't know it. That's not a theoretical concern. That's a market dynamic. And the policy frameworks, legal guardrails, and competitive responses for this world don't exist yet.

The commercial incentive to build it does.

The Quick Read:

Digital PR is shifting from link volume to authority, with sharper stories, tighter journalist targeting and data-led campaigns becoming the new SEO advantage.
AI agents may be getting smarter fast, but their real future depends on hourly cost, not just task length, and peak performance may still be pricey.
Google’s AI Mode side-by-side view keeps users inside the AI experience, turning publisher pages into supporting context rather than the final destination.
Claude’s Live Artifacts turn dashboards and trackers into always-fresh workspaces connected to your apps, with history saved for every future session.
ChatGPT Images 2.0 brings sharper text, richer layouts and stronger design fidelity, pushing AI visuals closer to production-ready marketing assets.
Google will auto-upgrade Dynamic Search Ads to AI Max from September, giving advertisers more AI targeting, asset controls and reporting pressure to manage.
GPT-5.5 raises the bar for agentic coding and knowledge work, promising stronger autonomy, fewer tokens and better tool use across complex workflows.

The Tools List:

🗃️ Magika by Google - AI-powered fast and efficient file type identification.

🖼️ Moodboard Creator - AI-driven branding kickstarter

📧 Inbox Zero - Clean Up Your Inbox In Minutes

🤖 Voice/Style/Tone AI Prompt Snippet Generator: Craft prompts to replicate any text’s voice/style/tone.

🌐 Galileo AI: Generate mobile and desktop UIs quickly with AI.

About The Writer:

Jo Lambadjieva is an entrepreneur and AI expert in the e-commerce industry. She is the founder and CEO of Amazing Wave, an agency specializing in AI-driven solutions for e-commerce businesses. With over 13 years of experience in digital marketing, agency work, and e-commerce, Joanna has established herself as a thought leader in integrating AI technologies for business growth.
