The Evidence Problem: Why Dispute Evidence Is Both Overloaded and Undervalued

Every payment dispute, in theory, turns on evidence. A cardholder says the goods were damaged. A merchant says they were delivered in perfect condition. Somewhere between those two claims sits what actually happened, and evidence is supposed to be the mechanism that gets us there.

In practice, the evidence layer of the dispute process is in serious trouble. It is simultaneously flooded with material that cannot meaningfully be assessed, hollowed out by the ease with which that material can now be fabricated, and structurally blind to the transaction contexts that are becoming most common. None of these are edge cases. Together, they point to an evidence framework that has not kept pace with the commerce it is supposed to adjudicate.

The Inbox Problem: Evidence Without Assessment

When a cardholder raises a goods or services dispute, issuers typically ask them to provide any evidence supporting their claim. In principle, this is the right instinct. In practice, the instruction amounts to "send us anything" — and cardholders do exactly that.

The result is that dispute inboxes fill with photographs, email threads, screenshots of conversations, and scans of documents, often in no particular order, in no consistent format, and sent without any guidance on what the issuer actually needs to make a decision.

The numbers here are not theoretical. For goods and services disputes, some issuers have indicated that submitted consumer evidence is meaningfully reviewed in only around 10% of cases, with more substantive assessment reserved for higher-value transactions where detailed review is more easily justified. For smaller disputes, the volume and inconsistency of submissions make case-by-case review functionally impractical.

This is worth pausing on. The evidence submission step — the step that is supposed to give the dispute process factual grounding — is, for a significant proportion of cases, not functioning as evidence review at all. It is functioning as a formality. Consumers absorb the effort of preparing submissions that are not assessed, and the outcome is determined by other means.

The Sensitivity Problem: What Is Actually Being Sent

Vague requests for "any documentation" produce submissions that are not just high in volume but genuinely risky. A cardholder photographing a damaged item may capture other documents in the background. Someone compiling proof of a conversation may include screenshots containing account details. Identity documents (passports, cards, driver licences) are submitted without being asked for, and without the sender having any awareness of where they will end up.

The more immediate concern is the channel. Many issuers, particularly smaller banks and credit unions, still receive dispute evidence via plain email. Sensitive personal information — including images that may contain identity documents — is being transmitted over a channel with no end-to-end encryption guarantee, no access controls, and no systematic governance over retention or deletion. That material sits in email systems and inboxes of unknown security posture, potentially indefinitely. The privacy exposure is not hypothetical. It is a routine feature of how dispute evidence is currently collected.

The Fabrication Problem: What Evidence Is Worth in 2026

The evidence problem would be serious enough if the issue were only volume and governance. But there is a third dimension the dispute industry has been slow to reckon with directly: the declining evidentiary value of the materials most commonly submitted.

Photographic evidence of damaged goods has historically been treated as meaningful. It is now trivially easy to edit. A photograph showing an item in a condition it was never in can be produced in minutes using commercially available tools, with no technical expertise required. The same applies to receipt data — order confirmations and payment summaries can be generated or modified with comparable ease.

Conversation logs — screenshots of exchanges between a consumer and a merchant — can be fabricated entirely. There is no reliable visual marker distinguishing a genuine messaging thread from a reconstructed one.

This does not mean that evidence has no value. It means the evidentiary weight of image and document-based submissions must be recalibrated against the ease with which those submissions can be manufactured. The most commonly submitted types of evidence are precisely the types that are easiest to fabricate, and the dispute volumes that make detailed review impractical are the same volumes where fabricated evidence is most likely to go undetected.

The Cross-Institution Problem: Why Transaction History Is Not Enough

The response to mounting first-party fraud has, appropriately, included greater use of behavioral and transaction history analysis. Banks are increasingly checking dispute frequency and transaction patterns when evaluating claims.

The problem is that this analysis is bounded by the institution's own data — and consumers do not bank with one institution.

The concept of a "house bank" has been eroding for years. Consumers routinely hold accounts across multiple institutions, use different cards for different spending categories, and maintain relationships with fintechs and neobanks alongside traditional banks. For someone who wishes to exploit the dispute system, this fragmentation is structurally advantageous. Dispute history at one institution is invisible to another. A pattern that would be flagged as anomalous in aggregate is, within any single institution's view, unremarkable.

Bad actors hedge their bets accordingly. Within-institution history checks are a genuine improvement over no checks at all — but they are close to a moot point for anyone who understands how to distribute their activity across institutions. The check catches the careless, not the deliberate.
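
The blind spot described above can be made concrete with a small sketch. Everything in it is an illustrative assumption: the consumer and institution names, the dispute counts, and the flagging threshold are invented for the example and do not reflect any real issuer's rules. It only shows how the same activity can sit under every single-institution threshold while exceeding it in aggregate.

```python
from collections import Counter

# Hypothetical dispute records as (consumer_id, institution) pairs.
# All identifiers and counts are invented for illustration.
DISPUTES = [
    ("c1", "bank_a"), ("c1", "bank_a"),
    ("c1", "bank_b"), ("c1", "bank_b"),
    ("c1", "neobank_c"), ("c1", "neobank_c"),
    ("c2", "bank_a"),
]

# Assumed rule: flag a consumer at 3 or more disputes in the review window.
FLAG_THRESHOLD = 3


def flagged_per_institution(disputes, threshold=FLAG_THRESHOLD):
    """Return (consumer, institution) pairs that an institution would flag
    using only the disputes it can see in its own data."""
    counts = Counter(disputes)
    return {pair for pair, n in counts.items() if n >= threshold}


def flagged_in_aggregate(disputes, threshold=FLAG_THRESHOLD):
    """Return consumers who would be flagged if dispute history were pooled
    across institutions (which, per the text, it currently is not)."""
    counts = Counter(consumer for consumer, _ in disputes)
    return {consumer for consumer, n in counts.items() if n >= threshold}


print(flagged_per_institution(DISPUTES))  # no institution flags anyone
print(flagged_in_aggregate(DISPUTES))     # c1 stands out only when pooled
```

In this toy data, consumer c1 raises two disputes at each of three institutions. No institution sees more than two, so none flags the pattern, yet the pooled count is six.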

The Agentic Commerce Problem: Evidence for Transactions That Were Never Human

Looking ahead, the evidence framework faces a challenge that the current architecture is entirely unprepared for: the growth of agent-initiated transactions.

As AI agents act on behalf of consumers (making purchases, managing subscriptions, executing bookings), the authentication signals that underpin dispute analysis are increasingly difficult to interpret. Under current protocol drafts, issuers receive no indication of whether a transaction involved an AI agent acting under delegated authority. There is no indicator in the settlement data that would allow an issuer to distinguish a consumer's deliberate purchase from a transaction executed autonomously by an agent the consumer may not have fully understood or authorised.

Issuers are therefore applying analysis designed for direct consumer behaviour to transactions that carry a fundamentally different authorisation structure. When a consumer disputes a transaction initiated by an agent — whether because the agent acted outside its intended parameters, because the consumer did not understand the scope of authority they granted, or because they are simply disavowing a transaction they do not remember authorising — the existing evidence framework provides no path to resolution.

This is not a distant problem. Agentic commerce is scaling now. Frameworks that do not account for agent-initiated transactions will be applied to them anyway, producing outcomes that satisfy no one.
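
To illustrate what is missing, here is a minimal sketch of a transaction record that carries an initiator signal. It is purely hypothetical: the field names (`initiator`, `mandate_limit_minor`), the routing labels, and the idea of a spend cap as the delegated scope are assumptions made for the example, not part of any existing scheme message or protocol draft.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Initiator(Enum):
    CONSUMER = "consumer"
    AGENT = "agent"
    UNKNOWN = "unknown"  # no indicator present, as the text describes today


@dataclass(frozen=True)
class TransactionRecord:
    amount_minor: int                           # amount in minor units, e.g. cents
    merchant: str
    initiator: Initiator = Initiator.UNKNOWN    # hypothetical field
    mandate_limit_minor: Optional[int] = None   # hypothetical delegated spend cap


def dispute_route(txn: TransactionRecord) -> str:
    """Pick an assessment path for a disputed transaction (illustrative labels)."""
    if txn.initiator is not Initiator.AGENT:
        # Without an agent signal, the only available path is the one
        # designed for direct consumer behaviour.
        return "assess_as_direct_consumer"
    if txn.mandate_limit_minor is None:
        return "agent_scope_unknown"
    if txn.amount_minor > txn.mandate_limit_minor:
        return "agent_exceeded_mandate"
    return "agent_within_mandate"


legacy = TransactionRecord(amount_minor=12_000, merchant="example-store")
tagged = TransactionRecord(12_000, "example-store", Initiator.AGENT, 10_000)
print(dispute_route(legacy))  # assess_as_direct_consumer
print(dispute_route(tagged))  # agent_exceeded_mandate
```

The two example records describe the same purchase. Only the one carrying the hypothetical initiator and mandate fields gives an assessor any way to separate "the agent went beyond what it was allowed to do" from an ordinary consumer purchase.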

The Questions That Need Answering

The problems above — volume without assessment, sensitive information transmitted over unsecured channels, fabrication undermining evidentiary value, cross-institution blindness, and structural invisibility of agent-initiated transactions — all reflect the same underlying condition: an evidence framework built for a world that no longer exists.

Which leaves open questions that the industry has not yet seriously engaged with.

What does meaningful evidence actually look like for a goods and services dispute in 2026, given that the most commonly submitted materials are among the easiest to fabricate? Is photographic evidence of damage still a legitimate input, and if so, under what conditions?

If the volume of goods and services disputes makes genuine evidence review impractical at scale, what is the purpose of collecting evidence at all — and who bears the cost of the pretence that it is being assessed?

As transaction records and receipts become easier to generate artificially, where does the evidentiary weight shift? What categories of data remain genuinely difficult to manipulate, and are dispute frameworks structured to give those categories appropriate weight?

When a consumer's dispute history is only visible within a single institution, what does that history actually tell an assessor — and what does it conceal?

And as agent-initiated transactions become a normal feature of commerce, what signals need to exist in the transaction record for a dispute assessor to understand what they are actually looking at?

As fabricated evidence becomes easier to produce, should banks now be checking for AI generation markers and metadata on documents and images submitted as part of a dispute? Tools that flag AI-generated or AI-edited content exist, and document metadata can sometimes reveal inconsistencies between a file's stated provenance and its actual creation history. But the viability question is real: detection tools are imperfect, metadata can be stripped, and an arms race between fabrication and detection is not obviously one the dispute process is equipped to run. Is systematic AI marker checking a meaningful line of defence, or do its limitations make it another formality that adds cost without changing outcomes?
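
As a rough sketch of what a metadata consistency check might involve, the function below compares a file's recorded creation time and software tag against the dispute timeline. The metadata keys (`created`, `software`), the flag names, and the list of software hints are assumptions for illustration; a real pipeline would need a metadata extraction step, which is not shown here. The flags are prompts for human review, not findings of fabrication, for exactly the reasons given above.

```python
from datetime import datetime

# Assumed hints that a file passed through editing software. Illustrative only:
# many legitimate photos are edited, and a fabricator can remove the tag.
EDITING_SOFTWARE_HINTS = ("photoshop", "gimp")


def metadata_flags(metadata, delivered_at, uploaded_at):
    """Return review flags for one submitted file.

    `metadata` is a plain dict assumed to come from an upstream extraction
    step, with optional keys "created" (datetime) and "software" (str).
    """
    if not metadata:
        # Metadata is often stripped for benign reasons (for example by
        # messaging apps), so absence alone proves nothing.
        return ["metadata_missing"]

    flags = []
    created = metadata.get("created")
    if created is None:
        flags.append("no_creation_time")
    else:
        if created < delivered_at:
            # A photo of damage that predates delivery is inconsistent.
            flags.append("created_before_delivery")
        if created > uploaded_at:
            flags.append("created_after_upload")

    software = (metadata.get("software") or "").lower()
    if any(hint in software for hint in EDITING_SOFTWARE_HINTS):
        flags.append("editing_software_tag")
    return flags


delivered = datetime(2026, 1, 10)
uploaded = datetime(2026, 1, 15)
print(metadata_flags({"created": datetime(2026, 1, 5),
                      "software": "Adobe Photoshop"}, delivered, uploaded))
```

Note that a file with no metadata at all returns a single flag and nothing more. That is the weakness the question above points at: the least informative result is also the easiest one for a fabricator to produce.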

These are not rhetorical questions. They are the questions that determine whether the evidence layer of the dispute process serves any genuine function — or whether it has become, across large parts of the volume it handles, a process without a purpose.