TrajectoryRL
SN11 · Makes AI agents cheaper, safer, and more reliable by optimizing their decision-making paths
A subnet that optimizes how AI agents work. Miners compete to make agents cheaper, safer, and more reliable by optimizing their decision-making policies. One demonstration cut agent operating costs from $12,300/month to $900/month, a 93% reduction, through trajectory optimization alone.
// Making AI agents cheaper, safer, and smarter.
TrajectoryRL is a subnet where miners compete to optimize AI agent policies. When an AI agent performs a task (browsing the web, writing code, managing files), it makes a series of decisions called a "trajectory." TrajectoryRL rewards miners who find better trajectories: sequences of decisions that are faster, cheaper, and more reliable.
The simple version: Imagine an AI assistant that takes 20 steps to book a flight, costing $5 in API calls. A TrajectoryRL miner figures out how to do it in 6 steps for $0.30. The miner who finds the most efficient path wins.
Centralized equivalent: Think of it as automated consulting for AI operations. Companies like McKinsey optimize business processes; TrajectoryRL optimizes AI agent processes, but through competitive benchmarking rather than billable hours.
How it works:
- Miners upload "policy packs" containing optimized agent configurations (prompt engineering, multi-LLM routing, skill injection) to any public HTTP endpoint and commit metadata on-chain. No server required, no uptime needed.
- Validators evaluate policy packs using ClawBench, a deterministic scenario suite with fixed fixtures. Two-phase evaluation: Phase 1 checks pack integrity, Phase 2 scores trajectory quality using LLM-as-judge against natural language criteria. Winner-take-all with first-mover advantage and NCD similarity detection to prevent copying.
- The problem it solves: AI agents are expensive and unreliable. A single GPT-4 agent running 1,000 tasks/day costs $12,300/month. Most of that spend is wasted on inefficient decision paths.
- The opportunity: Every company deploying AI agents (customer support, code generation, data analysis) needs cost optimization. The ROI case is clear: TrajectoryRL demonstrations show 73-93% cost reductions.
- The Bittensor advantage: Winner-take-all with anti-copy protection (SHA256 hashing + NCD similarity + first-mover threshold) creates a genuine innovation race. The content-addressed pack system means no server infrastructure required from miners.
- Traction signals: 400 commits across 10 contributors with 52-88 commits per week. Highest development velocity in this batch. ClawBench evaluation framework. Multi-LLM routing (Qwen 3.5, GLM-5, Gemini 3 Flash, GPT-5.2) demonstrated in examples.
Category: Reinforcement Learning | Centralized Competitor: Consulting firms (McKinsey, BCG for process optimization), LangSmith (agent observability), AgentOps
TrajectoryRL addresses one of the most practical problems in AI: agents are expensive because they make bad decisions. Every unnecessary API call, every wrong tool selection, every redundant step costs money. TrajectoryRL's approach is to crowd-source optimization through competition.
Mechanism:
The two-stage optimization model is compelling. Stage 1 (prompt optimization) involves tuning AGENTS.md configurations and stop rules. The example shows this alone reduces costs from $12,300 to $3,300/month, a 73% cut. Stage 2 (hybrid routing) adds multi-LLM dynamic routing, where different sub-tasks are routed to the cheapest capable model: Qwen 3.5 handles 40% of tasks (tool calls, lookups), while GPT-5.2 handles only 10% (reasoning, drafting). This pushes costs down to $900/month, a 93% total reduction.
ClawBench provides deterministic evaluation. Scenarios have fixed fixtures, meaning every miner faces the same challenge. LLM-as-judge scoring against natural language criteria adds flexibility while maintaining consistency. The winner-take-all structure with first-mover advantage means you can't just copy the leader; you have to commit your innovation first to claim priority.
The codebase is the most active we've seen: 400 commits across 10 contributors, with 52-88 commits per week in the last month. Primary contributor "crabby" drives daily improvements. Recent work includes content truncation handling and ClawBench module fixes.
Market metrics are solid. At 58,197 TAO market cap with 2,136 holders, TrajectoryRL has moderate concentration (Gini 0.673, HHI 0.043). Net 7-day inflow of 858 TAO is healthy. Root proportion of 0.170 confirms organic demand. The 90-day price increase of 67% shows sustained interest.
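The concentration metrics quoted above (Gini, HHI) are standard functions of holder balances. A minimal sketch, using made-up balances rather than real holder data:

```python
def hhi(balances):
    # Herfindahl-Hirschman Index: sum of squared holder shares.
    # 1/n for perfectly equal holdings, 1.0 for a single holder.
    total = sum(balances)
    return sum((b / total) ** 2 for b in balances)

def gini(balances):
    # Gini coefficient from the sorted cumulative distribution:
    # 0 for perfect equality, approaching 1 for full concentration.
    xs = sorted(balances)
    n, total = len(xs), sum(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n
```

Four equal holders give Gini 0 and HHI 0.25; one holder owning everything gives HHI 1.0, so the reported Gini 0.673 / HHI 0.043 reads as a large, moderately unequal holder base.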
Risks:
- Benchmark overfitting: If miners optimize specifically for ClawBench scenarios rather than general agent optimization, the real-world value diminishes.
- Market education: "AI agent trajectory optimization" is a niche concept. Communicating the value proposition to a broad audience requires clear ROI demonstrations.
- Competitive landscape: Agent optimization frameworks are emerging from well-funded startups. LangSmith, AgentOps, and others are building observability and optimization tools.
- Winner-take-all dynamics: 100% to the winner may discourage new miners if the incumbent is difficult to beat.