ORO
SN15 | Subnet in development, details not yet published
AI shopping agents, evaluated on Bittensor. Miners write Python agents that search products, compare prices, apply vouchers, and make purchase recommendations. Validators run each agent in an isolated Docker sandbox against ShoppingBench, a benchmark with 2.5 million real products. The best agent earns the most emissions.
// AI shopping agents, benchmarked at scale.
ORO is a subnet that evaluates AI shopping agents. Miners build agents that can solve real shopping tasks: finding products, comparing options, applying discounts, and recommending the best purchase. These agents are tested against a massive catalog of 2.5 million real products in sandboxed environments.
The simple version: Imagine a competition where AI personal shoppers are tested on their ability to find you the best deal on any product. Each shopper has access to 2.5 million real products and must navigate search, comparison, and recommendation. The shopper that finds the best deals most consistently wins.
Centralized equivalent: Think Google Shopping AI or Amazon's recommendation engine, but the underlying agents are built through open competition and evaluated against a published academic benchmark.
How it works:
- Miners write Python agents that define an `agent_main()` function. Inside the sandbox, agents can use the tools `find_product`, `view_product_information`, and `recommend_product`. Agents are scored on ground-truth accuracy, format compliance, and field matching.
- Validators claim work from the backend, execute miner agents in Docker sandboxes, score results against ground truth, and set on-chain weights. Challengers must exceed a decaying score threshold to claim the top spot, which prevents trivial improvements from churning the leader.
- The problem it solves: Online shopping is overwhelming. Millions of products, inconsistent pricing, complex voucher systems. AI agents that can navigate this complexity save consumers time and money.
- The opportunity: Global e-commerce exceeds $6 trillion annually. AI-assisted shopping is the next interface layer. The agent that can reliably find the best deal across all retailers captures enormous value.
- The Bittensor advantage: ShoppingBench (published on arXiv) provides a rigorous, reproducible evaluation framework. The decaying score threshold prevents gaming: you can't just copy the leader, add a trivial improvement, and take the top spot.
- Traction signals: Published research paper (arXiv:2508.04266). 2.5-million-product benchmark. Dockerized, sandboxed evaluation. Leaderboard live at oroagents.com. Built by Seth Schilbe (That Guy Wade). Newly registered, with a high root proportion (0.668) that reflects its early stage.
Category: Search and Information Retrieval | Centralized Competitor: Google Shopping, Amazon Recommendations, Perplexity Shopping, ShopGPT
ORO takes an academic approach to a commercial problem. The ShoppingBench paper on arXiv signals scientific rigor that most Bittensor subnets lack. The 2.5 million product catalog ensures agents are tested at realistic scale, not on toy datasets.
Mechanism:
The evaluation pipeline is clean. Miners submit agents as Python code. Validators spin up Docker containers with the agent, the product catalog, and the shopping tools. The agent runs, produces recommendations, and validators score against ground truth. The decaying threshold mechanism is elegant: to take the top position, you must meaningfully improve on the current leader, not just make a marginal tweak.
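The validator side of this pipeline (scoring an answer against ground truth, then checking a challenger against a decaying threshold) can be sketched as below. Everything here is an assumption for illustration: the required fields, the 50/50 weighting, treating format compliance as a pass/fail gate, and the exponential decay schedule with `initial_margin` and `half_life` are hypothetical, not ORO's or ShoppingBench's published rules.

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence

# HYPOTHETICAL required fields and weights; ShoppingBench's actual
# scoring rules are not reproduced here.
REQUIRED_FIELDS: Sequence[str] = ("product_id", "price", "shop")


def score_output(output: Any, truth: Dict[str, Any]) -> float:
    """Score one agent answer against its ground-truth record, in [0, 1]."""
    # Format compliance: a malformed answer scores zero outright.
    if not isinstance(output, dict) or any(f not in output for f in REQUIRED_FIELDS):
        return 0.0
    # Ground-truth accuracy: did the agent pick the right product?
    accuracy = 1.0 if output["product_id"] == truth["product_id"] else 0.0
    # Field matching: fraction of required fields equal to the ground truth.
    matched = sum(output[f] == truth[f] for f in REQUIRED_FIELDS)
    field_match = matched / len(REQUIRED_FIELDS)
    return 0.5 * accuracy + 0.5 * field_match


@dataclass
class Leader:
    score: float       # the current leader's benchmark score
    crowned_at: float  # time (hypothetical hours) it took the top spot


def required_score(leader: Leader, now: float,
                   initial_margin: float = 0.05,
                   half_life: float = 72.0) -> float:
    """Score a challenger must exceed: the leader's score plus a margin
    that halves every `half_life` units of the leader's tenure."""
    tenure = max(0.0, now - leader.crowned_at)
    margin = initial_margin * 0.5 ** (tenure / half_life)
    return leader.score + margin


def challenger_wins(leader: Leader, challenger_score: float, now: float) -> bool:
    """True if the challenger clears the (decayed) threshold at time `now`."""
    return challenger_score > required_score(leader, now)
```

Under these assumed parameters, a challenger scoring 0.81 against a 0.80 leader fails right after the leader is crowned (it would need more than 0.85) but succeeds once the leader's tenure exceeds about 167 hours, when the margin has decayed below 0.01. The design trade-off is that a fresh leader is shielded from marginal tweaks while genuine, if small, improvements can still win eventually.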
The codebase has only 6 commits from a single contributor, reflecting a very new subnet. The project is freshly registered, with a root proportion of 0.668 (the highest in our coverage by far), meaning most of its pool comes from protocol subsidy rather than organic demand.
With a 9,868 TAO market cap and 901 holders, ORO is one of the smallest subnets we've covered. However, its 30-day return of 30% and 7-day return of 18% show strong early momentum. The net 7-day inflow of 291 TAO, roughly 3% of its market cap, is significant relative to its size.
The academic foundation (arXiv paper, standardized benchmark) and practical utility (everyone shops online) make this one of the more compelling early-stage subnets. The question is whether the team can scale development beyond a single contributor.
- Very early stage: 6 commits, 1 contributor. The codebase is minimal for a live subnet.
- Highest root proportion: 0.668 means the subnet is heavily subsidized by the protocol. Organic demand has not yet been established.
- Single contributor: With all development coming from one person, bus-factor risk is at its maximum.
- Commercial gap: Building a great shopping agent in a sandbox is different from deploying one that works across real retail websites with dynamic pricing and inventory.