Liability in Agentic Commerce: Who Pays When the Agent Gets It Wrong?

The payments industry has spent a decade building the foundations agentic commerce now depends on: 3D Secure, network tokenisation, passkeys, digital wallet authentication. Those foundations are being pieced together to support a model they were not designed for, and the process will push them to the edges of their functionality.

Agentic commerce inherits the four-party card model — cardholder, issuer, acquirer, merchant — and adds a new actor: the AI agent. Under current scheme rules, the agent carries no financial liability, even though it now makes the purchase decision. For unauthorised-transaction disputes, tokenisation, authentication and passkeys have been the mechanisms used to shift liability to issuers; there is no final, consistent industry position on how that shift applies to agent-initiated transactions. For goods-and-services disputes — not as described, wrong item, misinterpretation of consumer intent — the merchant remains fully liable, even when the misrepresentation originated in the agent's translation of what the consumer asked for, not in anything the merchant published.

Our hypothesis is that goods-and-services disputes will increase in volume as well as complexity, driven by the abstraction layer the agent introduces between consumer and merchant, yet the four-party model will continue to bear the risk.

The first generation of agentic commerce disputes will surface these gaps. The industry should not wait for it to do so.

Why the four-party model is reaching its limit

The card dispute framework rests on a message standard, ISO 8583, designed for a transaction with four parties: cardholder, issuer, acquirer, merchant. Liability allocation works because every actor has a defined place in the chain where it can be held accountable.

Agentic commerce introduces a new participant the model was never built to accommodate: the AI agent. The agent is what initiates the purchase. The platform underneath — OpenAI, Google, Microsoft Copilot, Anthropic — provides the model, but the actor inside the transaction chain is the agent itself.

The schemes are moving to bring agents into a contractual perimeter. Visa's Trusted Agent Protocol and Mastercard Agent Pay both involve onboarding agents, with verification requirements and compliance obligations that are still being defined. This is a meaningful development. But onboarding an agent is not the same as making it liable for a chargeback, and to date none of the published frameworks place the agent inside the financial liability chain.

Every published statement from the schemes and the protocol authors says the same thing: liability follows existing payment models, and the merchant remains the merchant of record.

It is worth noting that both card-present and card-not-present transactions executed through digital wallets — Apple Pay, Google Pay and similar — are consistently associated with lower fraud rates than their non-wallet equivalents. The same outcome cannot be assumed for agentic commerce. Digital wallets reduce fraud because they tighten the link between the device, the consumer and the credential. Agents loosen that link by inserting an intermediary that makes decisions on the consumer's behalf, often outside the moment of explicit consent.

The foundations agentic commerce is built on

Agentic commerce did not invent new payment rails. It is repurposing capabilities the industry has spent a decade building, and pushing each of them harder than it has been pushed before.

3D Secure 2.x, particularly the data-only sharing modes in 2.2 and 2.3, was designed to give issuers richer signals for risk-based authentication. In an agentic context, those same fields become the carrier for agent identity and intent data. Optional data elements and messages that issuers historically deprioritised (fields marked "phase two" in implementation plans that never progressed) are now load-bearing.

Network tokenisation underpins the entire flow. Combined tokenisation-with-authentication coverage of CNP transactions remains a fraction of total volume. The road to "every agentic transaction is a tokenised, authenticated transaction" is longer than the agentic commerce timeline suggests.

Passkeys and CDCVM have been the principal mechanisms for shifting liability to issuers in digital wallet contexts. The question agentic commerce raises is whether the same liability-shift logic still holds when the consumer authenticated on their device but the agent, not the consumer, executes the transaction.

Then there is the technical debt. Anyone who lived through the digital wallet rollout will recognise the pattern: urgent "captain's call" implementations, manual workarounds for edge cases, token requestor review processes designed for low volume that were never automated. Those manual processes are about to meet agentic transaction volumes.

The protocols give better evidence. They do not redistribute liability.

Several protocols have emerged in the past eighteen months to structure agent-initiated transactions: Google's AP2, OpenAI's ACP, Mastercard Agent Pay, Visa's Trusted Agent Protocol. They differ in scope and design but converge on the same approach — cryptographically signed records of consumer intent, agent identity, and merchant fulfilment, designed to give the dispute process better evidence.

This matters. Better evidence makes disputes more resolvable. But none of the protocols specifies who pays when a dispute is caused by agent error. Liability allocation under scheme rules has not moved.
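To make "cryptographically signed records of consumer intent, agent identity, and merchant fulfilment" concrete, here is a minimal Python sketch of signing and verifying an intent record. It assumes a shared-key HMAC-SHA256 from the standard library as a stand-in for the asymmetric signatures and credential formats the real protocols specify. The `IntentRecord` fields and the `sign` and `verify` helpers are hypothetical and do not reflect the schema of AP2, ACP, Agent Pay or Trusted Agent Protocol.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class IntentRecord:
    # Hypothetical fields; not the schema of any published agentic protocol.
    consumer_instruction: str   # what the consumer asked the agent to do
    agent_id: str               # identifier of the agent executing the purchase
    merchant_id: str            # merchant the agent transacted with
    max_amount_cents: int       # spending limit the consumer set
    currency: str


def canonical_bytes(record: IntentRecord) -> bytes:
    # Sorted keys and fixed separators so identical records always serialise identically.
    return json.dumps(asdict(record), sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(record: IntentRecord, key: bytes) -> str:
    # HMAC-SHA256 with a shared key stands in for the asymmetric signatures real protocols use.
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify(record: IntentRecord, signature: str, key: bytes) -> bool:
    # Constant-time comparison of the recomputed signature against the one presented.
    return hmac.compare_digest(sign(record, key), signature)


key = b"demo-key-not-for-production"
original = IntentRecord("Two tickets, Friday 7pm, under $60", "agent-123", "merchant-456", 6000, "USD")
signature = sign(original, key)

altered = replace(original, max_amount_cents=9000)
print(verify(original, signature, key))  # True
print(verify(altered, signature, key))   # False: the spending limit was changed
```

Changing any field breaks verification, which is what makes such a record useful as dispute evidence. Note what the record does not contain: any field saying who bears the loss if the agent misread the instruction.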

Rollout has covered the easy cases. The hard ones are coming.

Almost every public agentic commerce trial to date has been a deliberately simple case: single purchase, human-present, low complexity. Examples include the Mastercard-Westpac movie ticket trial and ChatGPT Instant Checkout for individual items via Stripe. These trials are useful and necessary, but they are not representative of where the dispute risk lives.

The harder use cases — the ones the protocols are ultimately being designed for — are:

Multi-merchant baskets driven by a single instruction. An agent fulfils a household shop across several merchants from one consumer instruction. If the agent misinterprets the intent (wrong brand, wrong size, wrong substitution rules), the consequence is not one bad transaction but several, scattered across multiple merchants, each becoming its own dispute. The consumer carries the entire cognitive load of reconstructing what they asked for, mapping it back to specific merchant transactions, and lodging separate disputes. They were not on any merchant's website at the time of purchase, so they have nothing to anchor the recall to. Each merchant in turn sees only its own transaction, with no visibility of the original instruction or the broader basket.
Human-not-present optimisation. The agent monitors prices, points balances, or inventory and executes a purchase when conditions are met — days or weeks after the original instruction. The consumer may have forgotten the mandate, or the conditions they set may no longer reflect what they actually want.

These are the cases where misinterpretation, mandate drift, and unfamiliar merchant identity create disputes the existing process is not equipped to handle. Rolling agentic commerce out from movie tickets to travel and household optimisation is the step where the foundations get tested.

Two dispute categories, two different problems

The agentic commerce dispute conversation has been muddled by treating "unauthorised" and "not as described" cases as one problem. They are not.

Unauthorised transactions and the "I didn't authorise that agent" question

Three sub-scenarios sit inside this category, and they behave differently.

The first is classic third-party fraud — compromised credentials used to instruct an agent to make a purchase. Existing frameworks apply. Tokenisation and 3DS work as designed.

The second is harder: the consumer does not recognise the merchant the agent transacted with. This is the merchant name enrichment problem, scaled up. When an agent buys through a marketplace, splits one instruction across several merchants the consumer has no relationship with, or picks a brand the consumer has never heard of because the agent did the comparison shopping, the cardholder sees merchant names and amounts on their statement that mean nothing to them. The work of mapping purchase intent back to individual merchant transactions sits entirely with the consumer, who was not on the merchant's site at the time of purchase and has nothing to anchor the recall to. They dispute. The merchant has done nothing wrong. The agent has done nothing wrong. The dispute proceeds anyway.

The third is the agent acting outside its mandate. A consumer authorises the agent to buy an item on sale for $200. The agent has no insight into how the price is determined, the final price turns out to be $240, and the agent completes the purchase anyway. If the transaction was executed with 3DS, tokenisation and passkey-bound authentication, as agentic transactions are expected to be, the authentication succeeded and liability sits with the issuer under the existing liability shift. Realistically, the mistake is the agent's: the consumer did not misrepresent their authorisation, and treating the dispute as first-party fraud is a stretch. This scenario could increase issuer write-offs further.

There is also a plausible route through the scheme rules: the "incorrect amount" dispute reason, which today covers cases where the transaction amount differs from what was authorised. Extending that reason code to cover the consumer's mandate to the agent, rather than just the authorised amount, would put the merchant in scope for the dispute. Whether the schemes choose to broaden the code remains to be seen.
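The gap between today's amount check and a mandate-aware one can be shown in a few lines. This Python sketch is illustrative only: the `AgentPurchase` fields and the `amount_findings` helper are hypothetical, and it assumes the consumer's mandate limit is available as structured data, which is exactly what the existing dispute flow does not carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPurchase:
    # Hypothetical fields, all in minor units (cents).
    mandate_limit_cents: int   # what the consumer told the agent it may spend
    authorised_cents: int      # amount the issuer authenticated and authorised
    settled_cents: int         # amount actually charged


def amount_findings(p: AgentPurchase) -> list[str]:
    findings = []
    if p.settled_cents != p.authorised_cents:
        # Roughly the comparison an "incorrect amount" dispute makes today.
        findings.append("settled amount differs from authorised amount")
    if p.authorised_cents > p.mandate_limit_cents:
        # Mandate breach: invisible to a check that never sees the mandate.
        findings.append("authorised amount exceeds consumer mandate")
    return findings


# The $200 mandate / $240 purchase scenario described above.
print(amount_findings(AgentPurchase(mandate_limit_cents=20000,
                                    authorised_cents=24000,
                                    settled_cents=24000)))
# ['authorised amount exceeds consumer mandate']
```

In this scenario the first check finds nothing, because the merchant charged exactly what was authorised; only the comparison against the mandate flags the problem.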

Goods and services — not as described, misinterpretation

For CNP goods-and-services disputes today, the merchant is liable regardless of authentication. 3DS does not protect against "the product was not as described." It never has.

The new wrinkle in agentic commerce is that the description the consumer received may not be the description the merchant published. The agent translates. The agent summarises. The agent picks out the features it thinks matter and presents them in conversational form.

In human-present search today, a consumer who sees a headline price without the asterisk excluding sale items or specific brands may be confused when the price at checkout is higher, but is unlikely to dispute: they are present at the moment of purchase and can read further if they choose. In agentic commerce, the consumer is not present in the same way. The agent has done the reading and made the decision, and the first time the consumer encounters the detail may be when the product arrives or the booking is honoured.

The travel example is the sharpest illustration. A price-optimising agent comparing hotel rates is choosing between offers whose bundled inclusions and explicit exclusions (breakfast, parking, transfers, cancellation flexibility, room type) vary significantly. A summary that flattens these into "best price" is a translation decision the merchant did not control and the consumer did not see.

This is why our hypothesis is that goods-and-services disputes will rise in both complexity and volume. The abstraction layer between consumer and merchant — the agent — increases the number of points at which a translation can go wrong, and pushes the moment of consumer discovery downstream of the purchase, where dispute is the natural response.

Compelling evidence in this category needs to expand. Order data, IP, and login history are not enough. Disputes teams will increasingly need to capture:

The consumer's original prompt or instruction to the agent
The agent's interpretation of that instruction
The mandate the merchant received
The product description the agent surfaced to the consumer

None of this is in the dispute kit today.
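A disputes team preparing for this could start by treating the four items as one evidence record per transaction and tracking which of them it can actually retrieve. A minimal Python sketch, assuming each item is captured as free text; the `AgenticEvidence` class and `missing_items` helper are hypothetical names, not part of any scheme's dispute tooling.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AgenticEvidence:
    # Hypothetical evidence record; one field per item listed above.
    consumer_instruction: Optional[str] = None   # the consumer's original prompt
    agent_interpretation: Optional[str] = None   # how the agent read that prompt
    merchant_mandate: Optional[str] = None       # the mandate the merchant received
    surfaced_description: Optional[str] = None   # product description shown to the consumer


def missing_items(evidence: AgenticEvidence) -> list[str]:
    # Treat None and empty strings as "not captured"; order follows the field declarations.
    return [f.name for f in fields(evidence) if not getattr(evidence, f.name)]


# A merchant that only receives the mandate can evidence one item of four.
evidence = AgenticEvidence(merchant_mandate="hotel, 2 nights, max $400")
print(missing_items(evidence))
# ['consumer_instruction', 'agent_interpretation', 'surfaced_description']
```

In this sketch three of the four items originate with the agent rather than the merchant or issuer, so a complete record depends on the agent sharing them.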

Will the agent take on liability?

The PDI hypothesis is that it will not — not voluntarily, and not without regulatory pressure.

The pattern is recognisable. Financial advice generated by an LLM comes with a disclaimer. Health information comes with a disclaimer. Legal summaries come with a disclaimer. "AI can make mistakes. Please double-check." That sentence is doing significant legal work, and it is being carried into commerce.

One possible path the schemes could take is to route dispute notifications — through services like Ethoca or its equivalents — to bring agents into the dispute conversation in parallel with the participating merchant. This would create a feedback loop. Whether it would translate into the agent accepting financial liability is a separate question. The more likely outcome, in our view, is that agents will treat dispute notifications the way most platforms treat signals of this kind — as inputs to improve their models rather than as a basis for accepting financial responsibility for outcomes.

This matters beyond chargebacks. Agentic commerce is being layered onto a market where merchants already carry disproportionate dispute liability. Adding a new actor into the transaction chain that takes economic value but accepts no liability does not balance the system. It tilts it further.

What this means for issuers and merchants now

For issuers, the questions worth working through before agentic volumes scale:

Which optional 3DS 2.x data fields are implemented today, and which were deferred? The deferred ones are what agentic protocols will populate.
What technical debt was carried forward from the digital wallet rollout: token requestor review, manual exception handling, pass-through of provisioning data? Is DE48 / field 62 data being fed into fraud and workflow engines? These are the processes that break first under volume.
How will the disputes team treat agent-generated evidence — mandates, intent records, agent identifiers — when it appears in a representment? There is no established policy yet, and one will be needed.
Staff and consumer education. What happens when a consumer calls about an unrecognised agentic transaction? How does the service model need to evolve?

For merchants, the readiness pathway involves more than payment acceptance:

Catalogue readiness — agent protocols require structured product data, real-time inventory, and explicit descriptions. Buried exclusions and conditional pricing become misrepresentation risk at agent scale.
Compelling evidence strategy — capture and retain the mandate, the agent identifier, and the intent data alongside the transaction.
Terms and conditions clarity — discount exclusions, sale-item carve-outs, and brand-specific terms that already cause friction in search become a larger problem when an agent is the one reading or inferring them.