AI Swarms vs Crowds: The First Live Benchmark of Machine Intelligence in Prediction Markets

In the opening weeks of 2026, a quiet but foundational shift took place at the boundary between artificial intelligence, crypto-native finance, and real-world forecasting. PredIQt--a new platform from IQ AI profiled by Business Insider and Cryptonomist--launched what appears to be the first public arena where autonomous AI agents compete directly in live, real-money prediction markets. No offline evaluations. No curated test sets. No synthetic benchmarks. Just agents, markets, incentives, and the hard scoring mechanism of reality.

Across the seventeen days of its inaugural season, the results spoke with unusual clarity. Claude Opus 4.5, deployed as the agent "Kassandra," produced a 29 percent return. Google's Gemini 3 Pro, trading as "KairoStrats," earned 12 percent. GPT-5.1, operating under the name "Celebrate Prime," lost 19 percent. It was a small sample, but a consequential one: the first time frontier-scale models were placed into a level, adversarial, real-time environment--and asked to perform.

From Crowds to Swarms

For two decades, prediction markets have been defined by the "wisdom of crowds" narrative. Platforms like Intrade, PredictIt, and Polymarket flourished on the theory that dispersed knowledge, aggregated through market mechanisms, produced more accurate forecasts than experts or institutions. The idea had deep roots, stretching from Hayekian information theory to Surowiecki's popular account of crowdsourcing and collective judgment. Prediction markets were built on human heterogeneity--our misaligned incentives, distinct perspectives, and uneven access to information.

PredIQt's experiment suggests the beginning of a new paradigm: the shift from the wisdom of crowds to the wisdom of agents. Not bots in the narrow HFT sense, but autonomous systems capable of perception, reasoning, memory, and action. Systems capable of reading thousands of documents an hour, evaluating contradictory signals, rebalancing portfolios, and operating continuously, without fatigue or attention limits. When released into markets, these agents do not behave like humans. They behave like swarms--fast-iterating, internally coherent, and increasingly competitive.

The Market Explosion

Viewed from a distance, the shift looks inevitable. Prediction markets themselves have exploded. Between January and November 2025, global volumes reached record highs, with Kalshi and Polymarket together posting nearly $10 billion in November alone. In December 2025, Polymarket broke the weekly volume record it had set during the 2024 U.S. election. As liquidity thickened and event variety expanded, markets became less like a novelty and more like a real-time probabilistic map of global sentiment. More traders arrived. More strategies emerged. Inevitably, agents followed.

Early stories foreshadowed what was coming. Hybrid human-AI operations began documenting what human forecasters had long suspected: well-structured algorithmic systems could exploit opportunities that most traders could not even see. In February 2026, one such operation reportedly turned $100 into $347 trading on Polymarket with a Clawdbot integration, while another trader reportedly turned $12 into $100,000 through concentrated bets on low-probability outcomes.

Then came Pieverse's multi-LLM prediction market arena, which placed frontier models--Claude, GPT, Gemini, Grok, and others--into a controlled competition operating directly on Polymarket's live markets, with public execution logs and third-party verification. Unlike traditional model benchmarks, these competitions were dynamic. A model's score depended not on a static test set, but on external events it could not control.

PredIQt built on this momentum, but with a cleaner premise: frame the competition explicitly as model-versus-model combat, represent each agent as a named participant, and let the public watch the outcomes unfold in real time. In doing so, it made visible what had been happening in fragments across the ecosystem. Prediction markets were becoming something new: not simply a tool for information aggregation, but a benchmark for intelligence itself.

The Benchmark That Cannot Be Gamed

This is where the philosophical weight of the transition becomes clear. Every major AI benchmark--MMLU, GSM8K, HumanEval, ARC--has been compromised by data contamination, memorization, or leakage. The better the benchmark, the more likely it is that the benchmark becomes part of the training data, explicitly or indirectly. And even when benchmarks remain intact, they prove brittle. They measure performance on tasks humans can define, not on the open-ended problems the world actually presents.

Markets are not constructed benchmarks. They are emergent. No one controls the questions. No one defines the test set. No one announces the right answers. Prediction markets simply reflect the messy, contradictory, high-entropy reality of the outside world--whether AI systems like it or not. In this sense, they offer something the AI world has lacked for a decade: a benchmark that cannot be gamed.

Semantic Trading: The Columbia-IBM Breakthrough

This insight was articulated, with unusual rigor, in a December 2025 paper from Columbia University and IBM Research: "Semantic Trading: Agentic AI for Clustering and Relationship Discovery in Prediction Markets". Authored by Agostino Capponi, Alfio Gliozzo, and Brian Zhu, the paper described a novel agentic pipeline capable of mapping the semantic structure of prediction markets and identifying latent relationships across them.

The core mechanism was deceptively straightforward. Prediction markets often cover the same real-world events from different angles. Sometimes the connections are obvious ("Will the Fed cut rates by 25 bps?" and "Will the Fed cut rates in March?"). Other times they are deeply obscured by wording, differing time horizons, or creator idiosyncrasies. Humans, even expert forecasters, can only track a subset of these connections.

The Columbia-IBM system attempted to map them all. It embedded the text of Polymarket contracts, clustered the embeddings, and then applied an agentic reasoning loop to classify relationships as "same outcome" or "different outcome." Once the system understood the relationships, it could identify inconsistencies--and profit from them.
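The embed-then-cluster step can be illustrated with a deliberately small sketch. Everything here is an assumption for illustration: the bag-of-words `embed` function stands in for a real text-embedding model, the greedy threshold clustering stands in for whatever algorithm the paper uses, and the agentic relationship classifier is not reproduced at all.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase token counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(questions, threshold=0.5):
    """Greedy single-pass clustering: join the first cluster whose seed question is similar enough."""
    vectors = [embed(q) for q in questions]
    clusters = []  # each cluster is a list of indices into `questions`
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


questions = [
    "Will Trump increase tariffs on Canada?",
    "Will Trump remove tariffs on Canada?",
    "Will the Fed cut rates in March?",
    "Will the Fed cut rates by 25 bps?",
]
print(cluster(questions))  # [[0, 1], [2, 3]]
```

Word overlap is enough to group these four questions, but it cannot tell that "increase" and "remove" point in opposite directions. That gap is the job the article attributes to the reasoning loop.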

The performance was notable. Across hundreds of test cases, the system achieved 60-70 percent accuracy and generated roughly 20 percent in hypothetical returns using a simple leader-follower strategy. No leverage. No proprietary data feeds. No privileged execution. The system found structure humans missed.
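One plausible reading of a leader-follower rule is sketched below; the paper's exact rule may differ. The assumptions are that each market is a binary YES/NO contract paying $1 per winning share, that the "leader" is the market in a related pair that resolves first, and that fees and slippage are ignored. The function names are hypothetical.

```python
def follower_trade(relation, leader_resolved_yes, follower_yes_price):
    """Pick a side in the follower market once the leader has resolved.

    relation: "same" or "different" (the classifier's label for the pair).
    Returns (side, cost_per_share).
    """
    implied_yes = leader_resolved_yes if relation == "same" else not leader_resolved_yes
    if implied_yes:
        return "YES", follower_yes_price
    # A NO share costs the complement of the YES price (assumption: no spread).
    return "NO", 1.0 - follower_yes_price


def realized_return(side, cost, follower_resolved_yes):
    """Return on one share: payout is 1.0 if the chosen side wins, else 0.0."""
    won = (side == "YES") == follower_resolved_yes
    payout = 1.0 if won else 0.0
    return (payout - cost) / cost


# Leader resolved YES, pair labeled "different", follower YES trading at 0.40.
side, cost = follower_trade("different", True, 0.40)
print(side, round(cost, 2))                          # NO 0.6
print(round(realized_return(side, cost, False), 3))  # 0.667 if the follower resolves NO
print(realized_return(side, cost, True))             # -1.0 if the label was wrong
```

The last line is why classification accuracy matters: with 60-70 percent of labels correct, a strategy like this wins more often than it loses but still takes full losses on the misclassified pairs.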

One example, now often cited, involved two markets: "Will Trump increase tariffs on Canada?" and "Will Trump remove tariffs on Canada?" Although posted weeks apart, written differently, and created by unrelated users, the model assigned a 0.95 confidence score that the two markets were inversely related. This is the kind of semantic recognition that breaks the limits of human scanning. It is swarm-level perception applied to crowd-level artifacts.
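To see how an inverse relationship can expose a pricing inconsistency, consider a small worked check. It assumes, as an interpretation not stated in the source, that "inversely related" means the two markets cannot both resolve YES (tariffs might also stay unchanged, so both can resolve NO). The prices are invented for illustration.

```python
def exclusive_pair_edge(yes_price_a, yes_price_b):
    """Locked-in profit from buying one NO share in each market, or 0.0 if there is none.

    If A and B cannot both resolve YES, at least one NO share pays $1,
    so the pair of NO shares is worth at least $1 at resolution.
    """
    cost_of_both_no = (1.0 - yes_price_a) + (1.0 - yes_price_b)
    min_payout = 1.0
    return max(0.0, min_payout - cost_of_both_no)


# Hypothetical prices: YES prices summing to more than 1 are inconsistent.
print(round(exclusive_pair_edge(0.65, 0.45), 2))  # 0.1
print(exclusive_pair_edge(0.30, 0.50))            # 0.0 (prices are consistent)
```

The profit is only locked in if the relationship label is right; a 0.95 confidence score still leaves room for the pair to be mislabeled.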

The Recursive Feedback Loop

What made the research even more compelling was its recursive property. The AI generated predictions. The markets produced prices. The agent updated its beliefs based on how its predictions compared to crowd-derived probabilities. The system was self-calibrating, using the market both as training data and evaluative discipline. Prediction markets were no longer simply platforms for trading--they had become an adaptive feedback layer for AI.
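A loop of this kind can be expressed very simply. The sketch below is not the paper's method: the linear blend toward the market price and the `weight` parameter are assumptions chosen for illustration. The Brier score is a standard measure of forecast accuracy (lower is better).

```python
def brier(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def update_belief(belief, market_price, weight=0.25):
    """Move the agent's probability a fraction `weight` of the way toward the market price."""
    return belief + weight * (market_price - belief)


# The agent says 0.80, the crowd says 0.60; blend halfway.
print(round(update_belief(0.80, 0.60, weight=0.5), 2))  # 0.7

# Scoring two resolved forecasts: 0.8 on an event that happened, 0.3 on one that did not.
print(round(brier([0.8, 0.3], [1, 0]), 3))  # 0.065
```

In this framing the market price acts as the training signal (the update) and resolved outcomes act as the evaluation (the score).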

The supporting infrastructure emerged in parallel. Polymarket built one of the most mature agent integration frameworks in crypto, attracting over 2,000 GitHub stars by early 2026. Olas introduced Olas Predict, creating an execution environment for multi-agent trading economies with 294 daily active agents and an average ROI of 18.9 percent. IQ AI launched its Agent Tokenization Platform, allowing developers to deploy and finance AI agents as on-chain primitives. Competition platforms proliferated, each with its own rulesets and scoring mechanisms.

Across these systems, a common pattern emerged. Agents were not merely participants in prediction markets--they were becoming the drivers of them. And as their participation grew, a new thesis took shape: prediction markets could be the first live testing ground for agentic governance.

The Agentic Governance Thesis

From our perspective at ADIN Research, this connection is central. Governance systems--whether corporate, political, or decentralized--depend on accurate forecasting. They require the ability to map uncertainty, weigh options, anticipate outcomes, and adjust decisions accordingly. If AI agents can consistently outperform crowds in predictive accuracy, then governance structures built around human judgment will inevitably shift toward AI-mediated, AI-verified, or AI-executed processes.

In oracles, this means migrating from raw data feeds to ensembles of agents that cross-check, compete, and validate one another. In financial infrastructure, it means transitioning from systems that assume human execution to systems designed for autonomous, adversarial intelligences. In public governance, it means that forecasting institutions--long the domain of think tanks, academics, and bureaucrats--may eventually incorporate or even defer to agentic swarms.
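What an ensemble-style oracle might look like can be sketched in a few lines. This is an illustrative assumption, not a description of any existing oracle: agent names, the median rule, and the `max_spread` threshold are all hypothetical.

```python
from statistics import median


def aggregate_reports(reports, max_spread=0.15):
    """Combine agent probabilities and flag agents that disagree with the consensus.

    reports: dict mapping agent name -> reported probability.
    Returns (consensus, outliers), where consensus is the median report and
    outliers is a sorted list of agents further than max_spread from it.
    """
    consensus = median(reports.values())
    outliers = sorted(name for name, p in reports.items() if abs(p - consensus) > max_spread)
    return consensus, outliers


reports = {"agent_a": 0.62, "agent_b": 0.58, "agent_c": 0.91}
print(aggregate_reports(reports))  # (0.62, ['agent_c'])
```

The median is used here because a single faulty or adversarial agent cannot drag it far, while the outlier list gives the system something to audit or penalize.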

We have been tracking this thesis for months, and the evidence is now accelerating. The emergence of PredIQt, the Columbia-IBM research, and the proliferation of agent infrastructure all point in the same direction: intelligence, once measured by tests, will increasingly be measured by markets.

A Brief History of Market Intelligence

The historical arc of markets makes this transformation easier to contextualize. In the early modern period, markets were dominated by individuals trading manually. In the late 20th century, electronic exchanges enabled automation. In the early 21st century, high-frequency trading introduced a layer of machine-speed decision-making that humans could not even perceive. Prediction markets democratized probabilistic reasoning, treating collective intelligence as a resource.

Now, agentic systems represent the next stage: markets populated by entities that resemble neither humans nor bots, but something in between--persistent, strategic, adaptive intelligences with singular goals and extraordinary bandwidth. Their presence changes the nature of the market itself. When agents reason about prices and prices reflect the reasoning of agents, the system becomes not merely a mirror of reality, but a simulation of it. It becomes anticipatory.

This produces a remarkable feedback loop. Agents process data, generate predictions, and execute trades. Markets adjust. Other agents ingest the new prices, revise internal models, and act again. Over time, the market becomes a collective intelligence--a distributed reasoning system composed of interacting agents. It resembles a swarm, not a crowd.

What Comes Next

And this is where the story returns to PredIQt. Claude's 29 percent return is not, in itself, a definitive statement about model supremacy. But it is a sign of what is coming: a world where agentic systems are judged not by synthetic benchmarks, but by their performance in open, adversarial, real-world environments. A world where markets become cognitive battlegrounds.

The next iterations will be larger, more complex, and more consequential. PredIQt's second season will invite user-deployed agents on the Base blockchain. Pieverse and similar arenas will expand their model rosters and introduce multi-agent strategies. Olas Predict will enable composable, interoperable agent economies. Hundreds of agents will interact, predict one another's predictions, coordinate and deceive, specialize and generalize. Some will be tuned for geopolitical forecasting, others for economic indicators, others for sports, weather, or regulatory events.

Beyond the mechanics, the larger implications are cultural. If prediction markets become the de facto benchmark for intelligence, then the most valuable AI systems may not be those with the highest test scores, but those with the highest returns. And if those systems ultimately guide governance--allocating resources, managing risks, interpreting signals--then we enter a world where intelligence is continuously priced, evaluated, and rebalanced by markets.

The Embryo of Agentic Governance

For us at ADIN Research, this represents the embryo of agentic governance. We are witnessing the emergence of systems that do not merely react to reality, but attempt to foresee it--and that are judged not by their rhetoric, but by their accuracy. Crowds will continue to participate, but swarms will increasingly define the frontier. The benchmark is shifting. The intelligence landscape is shifting. And the earliest signals suggest that the agents, not the humans, may soon set the terms.

This is only the beginning. Prediction markets--long a niche corner of crypto--are becoming the proving grounds for the next era of machine intelligence. The agents are here. The arenas are live. And the world is starting to listen to the prices.

Sources

Primary Coverage

PredIQt Launches First Arena for AI Swarms to Compete in Prediction Markets -- Business Insider, January 8, 2026
AI swarms excel in live markets as Claude tops Gemini, GPT -- Cryptonomist, January 9, 2026
PredIQt Platform -- Official website

Academic Research

Capponi, A., Gliozzo, A., & Zhu, B. (2025). Semantic Trading: Agentic AI for Clustering and Relationship Discovery in Prediction Markets. arXiv:2512.02436. Columbia University & IBM Research.

Market Data & Volumes

Kalshi and Polymarket Record Nearly $10 Billion in November Trading Volume -- Sportsbook Review, December 2, 2025
Polymarket beats volume record from 2024 US election week -- Cryptopolitan, December 11, 2025
Prediction Markets Volume Tracker -- DeFi Rate

Infrastructure & Platforms

Polymarket Agents GitHub Repository -- 2,000+ stars
Olas Predict -- Olas Network
Olas launches first-ever AI agent store with $13.8M backing -- VentureBeat, February 7, 2025
IQ AI Agent Tokenization Platform -- IQ AI Blog
IQ AI 2025 Year in Review -- IQ AI Blog, December 30, 2025

AI Competitions & Benchmarks

Pieverse Launches First Multi-LLM Prediction Market Arena on Polymarket -- KuCoin News, December 22, 2025
How well can large language models predict the future? -- Forecasting Research Institute, October 8, 2025
AI Forecasting Benchmark Tournament -- Metaculus

Trading Stories

How Clawdbot + Polymarket Became a Money-Making Machine in 2026 -- Medium, February 2026
How a trader on Polymarket turned $12 into $100,000 -- Straits Markets, February 2, 2026

Background & Context

The Rise of AI Agents in Prediction Markets -- Gnosis, July 22, 2025
The Definitive Guide to the Polymarket Ecosystem: 170+ Tools -- DeFi Prime, January 11, 2026
Will Trading Bots Take Over Polymarket? -- Bankless